Try this simple exercise: sit down with a loved one or colleague and ask them about their vacation plans for next summer, but agree that they will wait two to ten seconds in silence after each of your questions before saying anything. The result will be an awkward, slow, artificial conversation; anything but chatty or engaging.

Next, go to ChatGPT and ask about a place you'd like to go on vacation next summer. You’ll get a similar experience: an occasionally unnatural delay between each of your questions and the answers you get, plus the occasional wrong answer. If you’re using ChatGPT or most other AI platforms in business, you’re getting this experience at scale, and you’re paying handsomely for it. According to an Accenture estimate, over half of companies struggle to get their chatbots to recognize personal context, a limitation that can drive away business.

A more naturally human, less artificially intelligent experience will require AI that is far more interactive and engaging, and that doesn’t blow the budget.

Why the wait?

As Nvidia’s stock price will show you, AI is already throwing a huge amount of computing power at the problem. Yes, there will be improvements as Moore’s Law effects and GPU architecture enhancements continue, but they can only solve part of the problem at anything like a reasonable cost. Today’s AI suffers from two fundamental problems:

Proximity - AI services are located in large, remote data centers, leading to propagation, serialization, and routing delays. According to McKinsey, around 70 percent of demand for new data centers is being driven by AI, and not all of them can be located in urban centers.

Processing - To be as correct as possible, AI deep-thinks every answer, scanning databases and the web for key data before drawing conclusions.

How does human intelligence do it?

Our brains face the same problem on a local scale. The split-second reactiveness and adaptability that has made us survivors, taken us to the top of the food chain, and made us good communicators comes from the architecture of the brain, known in behavioral science circles as System 1 and System 2.

In Nobel Prize-winning research with Amos Tversky, and in his subsequent book “Thinking, Fast and Slow,” which revolutionized behavioral economics in 2011, Daniel Kahneman characterized the human mind as fundamentally divided into System 1, a quick-thinking system that uses abbreviated “heuristics” for most decision making, and System 2, a slow, deep-thinking system for reasoning and concept development.

Our senses have lightning-quick access to System 1, and we make more than 95 percent of our decisions there. We use System 2 as little as possible; it’s a finite resource that can only work on one thing at a time. System 1 heuristics include instincts, skills and, to allude to an issue that will be the topic of a separate discussion on the future of AI architecture, biases. The difference between conversational Spanish and fluency is whether the language lives in System 1 or System 2. Remember how hard it was to learn to drive? You were doing it in System 2, which your brain hates using. Know which football team you support? Instantly. You’ve got it stored as a bias in System 1.

In short, one of AI’s fundamental limitations in becoming less artificial is that it’s all System 2.

AI at the Edge

The Edge helps answer one of the problems that AI faces: proximity. Because it sits close to the user’s I/O, the “senses”, it doesn’t introduce latency. As detailed in Gartner’s report on the five drivers of Edge computing, it also offers better reliability than on-premise data centers, among other advantages.

In terms of processing, the Edge is at least as cost-efficient as the Cloud on a limited scale and more dynamic, thanks to Wasm (WebAssembly), a binary instruction format optimized for efficiency that high-level languages such as C, C++, and Rust compile to. The Edge, however, cannot rival the Cloud as the place to handle deep-thinking tasks. In brain terms, the Edge is the place for System 1 and the Cloud is the place for System 2.
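
To make this concrete, here is a minimal, hypothetical sketch of the kind of lightweight logic that could run at the Edge: plain Rust that compiles to Wasm (for example via the wasm32-wasi target). The function, its behavior, and the build commands are illustrative assumptions, not any particular platform’s API.

```rust
// Illustrative only: a cheap query-normalization step written in Rust.
// It can be compiled to Wasm, e.g.:
//   rustup target add wasm32-wasi
//   cargo build --target wasm32-wasi --release
// so an Edge runtime can execute it right next to the user.

/// Normalize a user question so that trivially different phrasings
/// ("Where should I go?", "  where should i go ?") look identical before
/// any further, more expensive processing happens.
fn normalize_question(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let a = normalize_question("Where should I go on vacation next summer?");
    let b = normalize_question("  where should i GO on vacation next summer ");
    println!("{a}\n{b}\nsame after normalization: {}", a == b);
}
```

A small step like this is also a natural front door to the caching idea described next.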

There are early signs of AI architecture moving in this direction. In August 2024, at Fastly we announced a capability called “AI Accelerator,” built on a technique called Semantic Caching. On a basic level, Semantic Caching “remembers” answers to questions put to the AI, repeating the answer to semantically similar questions without re-asking the AI core. In chatbot examples, this can reduce both answer times and core AI costs, which are metered based on usage.
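
As a rough, hypothetical sketch of the idea (not Fastly’s implementation), a semantic cache embeds each question as a vector, compares new questions to previously answered ones by similarity, and only calls the core model on a miss. The bag-of-words embed function below is a toy stand-in for a real embedding model, and the 0.8 threshold is an arbitrary assumption.

```rust
// A toy sketch of semantic caching, not a real product implementation.
// `embed` is a crude bag-of-words stand-in for a real embedding model.

fn embed(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0; dims];
    for word in text.to_lowercase().split_whitespace() {
        let h = word
            .bytes()
            .fold(0usize, |h, b| h.wrapping_mul(31).wrapping_add(b as usize));
        v[h % dims] += 1.0;
    }
    v
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>, // (question embedding, cached answer)
    threshold: f32,                   // how similar counts as "the same question"
}

impl SemanticCache {
    fn answer(&mut self, question: &str, ask_model: impl Fn(&str) -> String) -> String {
        let q = embed(question, 64);
        // Cache hit: a semantically similar question was answered before,
        // so reply instantly without paying for the core model again.
        if let Some((_, cached)) = self
            .entries
            .iter()
            .find(|(e, _)| cosine(e, &q) >= self.threshold)
        {
            return cached.clone();
        }
        // Cache miss: ask the slow, metered core model and remember the answer.
        let fresh = ask_model(question);
        self.entries.push((q, fresh.clone()));
        fresh
    }
}

fn main() {
    let mut cache = SemanticCache { entries: Vec::new(), threshold: 0.8 };
    let ask = |q: &str| format!("(expensive core-model answer to: {q})");
    println!("{}", cache.answer("Where should I go on vacation in Italy?", ask));
    println!("{}", cache.answer("where should i go on vacation in italy", ask));
}
```

The second call in main never reaches the core model: the reworded question is similar enough to the first to be served straight from the cache.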

Where next?

The obvious next steps in AI development are for Edge applications to create more System 1-like capabilities: local skills and local behaviors that resemble instincts. Local approximation? Well, sometimes an about-right answer delivered fast is better than a perfect answer delivered slowly, and AI will need to get better at understanding which is preferable when.
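
What might that choice look like? Below is a deliberately simple, hypothetical router: the keyword list, the eight-word cut-off, and the canned Edge reply are all invented for illustration, not drawn from any real system.

```rust
// Illustrative only: a System 1 / System 2 style router. The escalation rule,
// keyword list, and canned Edge reply are assumptions, not a real product API.

enum Route {
    Edge(String), // fast, "about right", answered locally
    Cloud,        // precise but slow and metered: hand off to the core model
}

/// Decide whether an about-right answer fast beats a perfect answer slow.
/// Short small-talk stays at the Edge; anything asking for figures,
/// bookings, or comparisons escalates to the Cloud.
fn route(question: &str) -> Route {
    let q = question.to_lowercase();
    let needs_precision = ["price", "book", "compare", "exactly", "how many"]
        .iter()
        .any(|k| q.contains(k));
    if !needs_precision && q.split_whitespace().count() <= 8 {
        Route::Edge("Somewhere warm next summer? Happy to narrow it down.".to_string())
    } else {
        Route::Cloud
    }
}

fn main() {
    let questions = [
        "Any ideas for next summer?",
        "Compare exact flight prices to Lisbon for the first week of July",
    ];
    for q in questions {
        match route(q) {
            Route::Edge(reply) => println!("{q:?} -> answered at the Edge: {reply}"),
            Route::Cloud => println!("{q:?} -> escalated to the Cloud model"),
        }
    }
}
```

In practice the escalation rule would itself be learned rather than hard-coded, but the shape of the decision - answer instantly at the Edge or pay for deep thinking in the Cloud - stays the same.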

Regardless, Edge computing will be key to making that next step, so that AI can make better guesses at context and hold smoother, more natural conversations without the awkward, transactional feeling of speaking to a robot. Today’s AI is overthinking many simple tasks. That has to change over time for AI to scale for us all.