
Why AI agents need external memory, not just local sandboxes

The Problem with Sandbox AI: State and Scope Drift

When I first read about AI agents, I was impressed by the demos. They looked so clean: a prompt goes in, a response comes out. But the underlying architecture is surprisingly fragile. Many of these cool, interactive demos run in a local, isolated 'sandbox.' While sandboxes are great for safety (they protect the host system if the agent messes up), they are terrible for long-term memory and complex workflows.

The core issue is state. If an agent's entire 'brain' (its execution loop, memory, and required credentials) is confined to a single, disposable sandbox, the moment that session ends, the state is lost. This is a massive hurdle for building anything durable or multi-user. The agent needs to remember more than just the current prompt; it needs to remember its goals, its required tools, and its history across sessions.
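
To make that concrete, here's a minimal sketch (my own naming, not from any particular article or framework) of what "state lives outside the sandbox" could look like: the agent's goals, tools, and history are serialized to an external store before the disposable sandbox is torn down, and reloaded when a new session starts.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

# Hypothetical external store: a JSON file standing in for a real
# database or key-value service that outlives any single sandbox.
STATE_DIR = Path("agent_state")
STATE_DIR.mkdir(exist_ok=True)

@dataclass
class AgentState:
    """Everything the agent must remember across sandbox sessions."""
    goals: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    history: list[dict] = field(default_factory=list)

def save_state(session_id: str, state: AgentState) -> None:
    # Persist before the disposable sandbox is destroyed.
    (STATE_DIR / f"{session_id}.json").write_text(json.dumps(asdict(state)))

def load_state(session_id: str) -> AgentState:
    # Rehydrate the 'brain' when a fresh sandbox spins up.
    path = STATE_DIR / f"{session_id}.json"
    if path.exists():
        return AgentState(**json.loads(path.read_text()))
    return AgentState()
```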

Moving the Brain Outside: The Agent Harness

The solution, it seems, is to move the core intelligence—the 'agent harness'—outside the sandbox. Think of the sandbox as a disposable workshop where the agent performs a single task. The harness, however, is the persistent factory floor that coordinates the work.

This external layer is crucial because it can maintain critical state information: user credentials, database connections, and long-running context. The sandbox can remain an isolated, safe place for execution, but the *logic* that drives the agent—the loop that sends prompts, gets results, and decides the next step—must run persistently and externally to handle real-world complexity and scalability. This is a fundamental architectural shift.
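
Here's a rough sketch of that separation, under my own assumptions (the function names and the action format are hypothetical, not any specific framework's API): the harness loop runs persistently outside the sandbox, holds the credentials and context, asks the model for the next step, and hands only that single step to a disposable sandbox for execution.

```python
def call_llm(prompt: str, context: dict) -> dict:
    """Placeholder for a real model call; returns the next action to take."""
    raise NotImplementedError  # e.g. an API call in a real harness

def run_in_sandbox(action: dict) -> str:
    """Placeholder: execute one action inside an isolated, disposable sandbox."""
    raise NotImplementedError

def harness_loop(goal: str, credentials: dict, max_steps: int = 10) -> dict:
    # Persistent state lives here, outside the sandbox.
    context = {"goal": goal, "credentials": credentials, "history": []}

    for _ in range(max_steps):
        # 1. The harness (not the sandbox) decides the next step.
        action = call_llm(prompt=goal, context=context)
        if action.get("type") == "done":
            break
        # 2. Only the single action crosses the boundary into the sandbox.
        result = run_in_sandbox(action)
        # 3. The result comes back and is recorded in durable context.
        context["history"].append({"action": action, "result": result})

    return context
```

The design point is that the sandbox can be destroyed after every step without losing anything that matters, because nothing durable ever lived inside it.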

Structured Memory vs. Context Window Limits

It's not just about the execution loop; it's also about the context window itself. LLMs are brilliant, but their working memory is bounded by the token limit. If you want an agent to maintain coherence over days of work, you can't just rely on feeding the entire history back into the prompt. The agent needs structured, external memory (like a knowledge graph or a dedicated database) to store and retrieve specific, actionable requirements; otherwise, the critical ones get buried or lost, leading to what one article called 'AI psychosis.' A rough sketch of such a spec store follows the list below.

  • The agent needs external, structured specifications (specs) to manage requirements, not just the conversational history.
  • Relying solely on the context window is fragile; it limits long-term coherence and persistence.
  • The external harness provides the necessary durability for multi-user, scalable deployments.
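
As promised above, here's a small, hypothetical sketch of a spec store (SQLite here purely for illustration; a real system might use a knowledge graph or a dedicated service): requirements live as structured rows keyed by topic, and the agent retrieves only the specs relevant to the current task instead of replaying the whole conversation.

```python
import sqlite3

# A tiny spec store: requirements are structured rows,
# not sentences buried somewhere in the chat history.
conn = sqlite3.connect("specs.db")
conn.execute("CREATE TABLE IF NOT EXISTS specs (topic TEXT, requirement TEXT)")

def add_spec(topic: str, requirement: str) -> None:
    conn.execute("INSERT INTO specs VALUES (?, ?)", (topic, requirement))
    conn.commit()

def relevant_specs(topic: str) -> list[str]:
    # Retrieve only what the current task needs, keeping the prompt small.
    rows = conn.execute("SELECT requirement FROM specs WHERE topic = ?", (topic,))
    return [row[0] for row in rows]

# Usage: the harness pulls a handful of targeted requirements into the
# prompt instead of the entire conversational history.
add_spec("auth", "All API calls must use the service account token")
print(relevant_specs("auth"))
```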

This whole concept made me think of how much effort goes into making something feel 'smart' in a demo versus making it actually *work* reliably in production. It’s a huge leap from a fun toy to a useful tool.

I also noticed a related observation from outside the agent world: Maryland is considering banning AI-driven dynamic pricing in groceries. This suggests that even when AI is used for optimizing systems (like pricing), the regulatory focus is increasingly on ensuring transparency and preventing unfair, data-surveillance-driven outcomes. It's a powerful reminder that the *application* of the technology carries real-world ethical weight.

What I found most useful was the distinction between the *execution environment* (the sandbox) and the *coordination logic* (the external harness). This separation seems to be the key to building truly robust, production-grade agents. I should spend more time understanding how these external state management systems are actually implemented.

Sources