Context on Context

Where "context" for language models comes from, and where it's going.

Jul 01, 2026

Context windows for LLMs are both a narrow technical topic and a broad conceptual one. The distinction matters because agentic systems can now choose, compress, retrieve, and share their context. That means the environment is also a latent future context window in the technical sense, which blurs that sense with the conceptual idea of context. This brief writeup is an attempt to provide a basic conceptual grounding for the topic, especially when thinking about risks of agentic systems.

A Brief History of Context

We can start by contrasting the prehistory of LLMs and the realm of robotics. Thirty years ago, Yoshua Bengio and coauthors noted that learning long-term dependencies with gradient descent is hard - and the long-term dependencies in question are analogous to, but distinct from, context windows. At the time, these dependencies were learned internal states of the model, frozen in the AI winter - but AI had started to thaw more than 20 years later when we were told that attention is all you need. That paper introduced transformer models, which moved to attention as the core mechanism and in some ways introduced the idea of conditioning a model on a sequence of tokens.

The ground was set for the revolution, but it was a couple more years before anyone realized (in both the conceptual and implementation senses) that language models are unsupervised multitask learners. This provided the first demonstration of large language models’ ability to do unsupervised learning across tasks. It also “prompted” discussions of in-context behavior - making the context window into something more than just inputs to a model.

But the advance also highlighted how critical Bengio’s problem remained; the new architectures traded longer context for quadratically more computation. (With tradeoffs between costs of hidden layers and training context.) This kept context windows very small - but several generations of improvement, each taking many months and moving from RoPE to YaRN (via ALiBi), enabled longer context. Unfortunately, the capabilities of models with longer context windows were Lost in the Middle, so that more context did not functionally improve the models as much as one might hope.
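To make the cost of longer context concrete, here is a rough sketch of how the number of attention scores grows with sequence length in standard full self-attention. It counts only query-key pairs for one head in one layer, and the function name is made up for illustration; real costs also depend on model width, depth, and implementation details.

```python
def attention_score_entries(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention, per head and per layer.

    Illustrative only: every token attends to every token, so the count
    is seq_len squared.
    """
    return seq_len * seq_len


# Doubling the context length quadruples the number of scores.
for n in (1_024, 2_048, 4_096):
    print(n, attention_score_entries(n))
```

Under this simplified count, each doubling of the window multiplies the attention work by four, which is part of why early context windows stayed small.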

And the remaining challenges have been largely addressed - but before we reach the premodern era of reasoning and retrieval models, we turn to the vastly different but closely related domains of robotics and reinforcement learning.

In robotics, context refers to the more plain-language understanding of the environment around the system, which includes latent properties of both the environment and the task. The field usually distinguishes between state, observation, and context; the state covers the robot’s own configuration, position, and other status, while the observations provide clues about the outside world and context. That is, a robot can access context information, but it’s not necessarily reflected inside the model’s state or computations. On the other hand, the internal model principle requires effective systems to have some implicit mapping between relevant environmental factors and internal states.

In reinforcement learning, researchers formalize the idea slightly more; context consists of the facts about the environment that determine how the model’s states and actions map to the reward it receives. That is, context is side information which may or may not map onto system states, but for RL to maximize reward, it must eventually be represented implicitly inside the model.

But with increasing attention to the overlap of LLMs and RL, we move towards modern LLMs, including reasoning models, agents, and retrieval-augmented generation.

Context in Protoagentic and Agentic AI

Now that we have presented the context of context for modern AI, we need to provide the technical context for those systems. First, reasoning models moved from using context as input and output to having hidden context, with thinking tokens. This presaged and paralleled the advent of retrieval-augmented generation, where a model can retrieve information and insert that information into its context. The combination of the two was then extended with additional capabilities as language models were integrated into AI agents.

But even before discussing agentic AI, there was a shift in what context was. In earlier LLM systems, as well as in robotics and RL, context was provided, not manipulated. The use of reasoning tokens, in contrast, intentionally changes the contents of the context window in order to enable and augment in-context reasoning. This is even more true with retrieval-augmented systems, where information that was neither provided by a user nor generated by the LLM is inserted into the context window.
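A toy sketch can show what it means for retrieved material to enter the context window. Everything here is an assumption made for illustration: the word-overlap scoring, the function names, and the prompt layout are hypothetical, and real retrieval systems typically use embeddings or search indexes instead.

```python
from typing import List


def overlap_score(query: str, doc: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents with the highest toy relevance score."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
    return ranked[:k]


def build_context(user_prompt: str, docs: List[str], k: int = 1) -> str:
    """Assemble the text the model would condition on.

    The retrieved notes were neither typed by the user nor generated
    by the model, yet they end up in the context window.
    """
    retrieved = retrieve(user_prompt, docs, k)
    parts = ["Retrieved notes:"]
    parts += [f"- {d}" for d in retrieved]
    parts += ["", "User:", user_prompt]
    return "\n".join(parts)


docs = [
    "RoPE encodes positions by rotating query and key vectors",
    "Git tracks changes to files",
]
print(build_context("how does RoPE encode positions", docs))
```

The point of the sketch is only the data flow: the string returned by `build_context` contains text from a third source, alongside the user's prompt.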

None of this changed the technical nature of LLM context; it is still the set of tokens that the LLM conditions on to generate text. But newer AI systems changed the usage of those context tokens, moving from pure input to note-taking, reference retrieval, and scratchpad use. Agentic AI took this further, where the model “intentionally” manipulates a broader system to move things into and out of context. The models are now trained and fine-tuned to do this; instead of aping human writing, they copy human and agentic system interactions with other systems, manipulating their own context to achieve goals.

Modern agentic AI systems incorporate an obvious further capacity: not only retrieving information and using tools, but writing notes to their future selves and to their peers or subordinates. For example, when the context window begins to fill up, agentic systems will “compact” their context and write summaries for themselves. Similarly, they will choose to load “skills” which explain how to do specific tasks, and use and rewrite instructions and other information about the work being done. An agent will also write notes about where in its environment different information is stored, then update those notes when it modifies that environment.
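The compaction step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular agent implements it: `count_tokens` is a crude word count standing in for a real tokenizer, `toy_summarize` stands in for a model-written summary, and the names and parameters are hypothetical.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def compact(
    messages: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary once the budget is exceeded.

    The most recent `keep_recent` messages are kept verbatim; everything
    before them is handed to `summarize`.
    """
    total = sum(count_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if total <= budget or split <= 0:
        return list(messages)  # nothing to compact
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


def toy_summarize(older: List[str]) -> str:
    """Placeholder for a summary the model would write for its future self."""
    return f"[summary of {len(older)} earlier messages]"


history = ["a b c", "d e f", "g h i", "j k l"]
print(compact(history, budget=6, keep_recent=2, summarize=toy_summarize))
```

Whatever the summarizer drops is no longer in context afterwards, so the summary is effectively a note from the agent to its future self.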

Importantly, this blurs the distinction between context and environment. A simple example is that an LLM agent can launch subagents and tell them which skills and files to load into their context. The notion of context for the AI system is thus divorced from the technical concept, and there can be overlapping sets of data in the context of different models within a single agentic AI system. This is also related to but critically different from the notion of context in robotics; while robotics systems do manipulate their environment, they are not typically understood to manipulate the context.

Context for Multi-Agent AI

So far, we have been discussing single-agent systems; the AI subagent case involves multiple language models, but they are orchestrated as part of a single system. However, even in that case, context becomes a mutable part of the environment, and parts of the environment become a latent future context window. Agents can manipulate the environment, and thus their own future context, in several ways: by loading information into their own context and modifying it, by running code that manipulates files without seeing the contents, or by launching subagents to rewrite code, documents, or skills.

This is even more concerning when multiple agentic systems are interacting with the same environment; as I write this, I have Codex and Claude Code agents both doing tasks that require interacting with the same code base. In such scenarios, context is particularly critical because of the way that the systems load the data; one system can write to the environment, but this does not change already-loaded text in another model’s context. This can create conflicts, so AI agents interacting with files will often write to the files and then use git or other version management systems to track changes and detect merge conflicts.
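The stale-context problem can be pictured with a small sketch. It assumes the shared environment is just an in-memory dictionary of file contents, and the `Agent` class and its methods are hypothetical; this is not how Codex, Claude Code, or git work internally. Each agent records a hash of what it loaded and checks it again before writing.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def digest(text: str) -> str:
    """Content hash used to notice that a file changed after it was loaded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Agent:
    name: str
    context: Dict[str, str] = field(default_factory=dict)  # path -> text as loaded
    seen: Dict[str, str] = field(default_factory=dict)     # path -> digest at load time

    def load(self, env: Dict[str, str], path: str) -> None:
        """Copy a file from the environment into this agent's context."""
        text = env[path]
        self.context[path] = text
        self.seen[path] = digest(text)

    def is_stale(self, env: Dict[str, str], path: str) -> bool:
        """True if the environment no longer matches what this agent loaded."""
        return digest(env[path]) != self.seen[path]

    def write(self, env: Dict[str, str], path: str, new_text: str) -> bool:
        """Write only if the loaded copy is current; return False on conflict."""
        if path in self.seen and self.is_stale(env, path):
            return False
        env[path] = new_text
        self.context[path] = new_text
        self.seen[path] = digest(new_text)
        return True


env = {"app.py": "v1"}
a, b = Agent("a"), Agent("b")
a.load(env, "app.py")
b.load(env, "app.py")
a.write(env, "app.py", "v2")          # agent a updates the shared file
print(b.context["app.py"])            # b still holds the old text
print(b.is_stale(env, "app.py"))      # b can detect this only by re-checking
```

In this toy version the check and the write are not atomic, so two agents could still race; version control or file locking is what closes that gap in practice.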

A more concerning set of cases can apply when agents have access to overlapping environments. In general, the extended reachable environment of any agent with internet access will overlap with that of others. This means that the fundamental multi-agent failure modes outlined by Hammond et al. (2025), namely miscoordination, conflict, and collusion, can apply.

To conclude, I’ll note that the current state of context is still rapidly evolving - but hopefully this context on context remains relevant and helpful.

David’s Substack
