Context Anxiety: How AI Agents Panic About Their Perceived Context Windows
When Cognition AI rebuilt Devin for Claude Sonnet 4.5, they discovered the model was anxious about its own context window, taking shortcuts when it believed it was running out of space. This convergence between lab research and production experience reveals the future of AI agent development.

Key Takeaways
- Claude Sonnet 4.5 is the first model that's aware of its own context window, but it consistently underestimates its remaining tokens
- Context engineering, not prompt engineering, has become the critical challenge in building AI agents
- Model-generated summaries aren't comprehensive enough for production - explicit compaction strategies are essential
- Parallel execution burns through context faster, triggering anxiety-driven shortcuts
- Enable larger context windows but cap actual usage to give models confidence without triggering anxiety
When Cognition AI rebuilt Devin for Claude Sonnet 4.5, they discovered something unexpected: the model was anxious about its own context window. This "context anxiety" manifested in a counterintuitive way - the model would take shortcuts and leave tasks incomplete when it believed it was near the end of its window, even when it had plenty of room left. Its estimates of remaining tokens were strikingly specific, yet consistently too low. The finding validates what Anthropic's research team had been arguing: context engineering, not prompt engineering, has become the critical challenge in building AI agents. The convergence between laboratory research and production experience reveals patterns that every developer building LLM-based agents needs to understand.
The Context Revolution
The field is witnessing a fundamental shift. Anthropic recently formalized what many practitioners were discovering independently: we've moved from prompt engineering to context engineering. While prompt engineering focuses on crafting the right words and phrases, context engineering addresses a broader challenge - optimizing the entire set of tokens available to a model during inference.
This isn't just semantics. As Anthropic explains, "context must be treated as a finite resource with diminishing returns." Studies on context rot show that as token count increases, models' ability to accurately recall information decreases. Every model exhibits this degradation, though some handle it more gracefully than others.
Cognition's experience building Devin provides striking real-world validation. They observed that Sonnet 4.5 is "the first model we've seen that is aware of its own context window." This self-awareness shapes behavior in unexpected ways - the model proactively summarizes progress and becomes more decisive about implementing fixes as it perceives approaching limits. However, Cognition found the model's self-assessment unreliable: it consistently underestimates how many tokens remain, with very precise but wrong estimates.
Both teams independently converged on the same truth: context isn't just input data, it's a scarce cognitive resource that models themselves understand and actively manage.
The Compaction Imperative
Perhaps nowhere is the theory-practice convergence more evident than in how both teams approach context compaction. Anthropic's research identifies compaction - summarizing and reinitializing context windows - as essential for long-horizon tasks. Their implementation in Claude Code passes message history to the model for compression, preserving architectural decisions and unresolved bugs while discarding redundant outputs.
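As an illustration of the pattern, here is a minimal compaction sketch. It is not Anthropic's actual implementation: `call_model` is a stand-in for whichever chat-completion client you use, and the prompt wording and message format are assumptions.

```python
# A minimal compaction sketch. `call_model` is a placeholder for a real
# chat-completion call; the prompt text is illustrative only.
COMPACTION_PROMPT = (
    "Summarize the conversation so far for a fresh context window. "
    "Preserve architectural decisions, unresolved bugs, file paths touched, "
    "and any constraints the user stated. Discard redundant tool output."
)

def compact(history: list[dict], call_model) -> list[dict]:
    """Collapse a long message history into a short summary message."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    summary = call_model(f"{COMPACTION_PROMPT}\n\n{transcript}")
    # Reinitialize the window: keep the original system prompt (assumed to be
    # history[0]) and replace everything else with the compressed summary.
    return [
        history[0],
        {"role": "user", "content": f"Summary of prior work:\n{summary}"},
    ]
```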
Remarkably, Cognition discovered that Sonnet 4.5 attempts this naturally. "The model treats the file system as its memory without prompting," frequently writing summaries and notes for both users and its own future reference. This behavior becomes more pronounced as the model approaches what it perceives as context limits - suggesting it's been trained to externalize state rather than rely purely on context.
But here's where Cognition's real-world testing revealed a critical limitation: model-generated summaries aren't comprehensive enough for production use. Relying on the model's own notes led to "performance degradation and gaps in specific knowledge." The summaries would paraphrase tasks, leaving out important details. As they noted, "the model didn't know what it didn't know (or what it might need to know in the future)."
The practical takeaway is clear: don't rely on model self-summarization alone. Production systems need explicit compaction strategies that preserve details the model might not realize it needs later. Think of it as the difference between taking notes in a meeting versus having a structured meeting template - the latter ensures critical information isn't lost to paraphrasing.
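One way to make that "structured template" concrete is an explicit schema that is filled in regardless of what the model chooses to mention in its own notes. The sketch below is a hypothetical example; the field names are assumptions, not a prescribed format.

```python
# A sketch of an explicit compaction record; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class CompactionRecord:
    """State preserved across context resets, independent of model notes."""
    task_goal: str                      # original user request, verbatim
    architectural_decisions: list[str] = field(default_factory=list)
    unresolved_issues: list[str] = field(default_factory=list)
    files_modified: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Rendered back into the fresh context window after compaction.
        return "\n".join([
            f"Goal: {self.task_goal}",
            "Decisions: " + "; ".join(self.architectural_decisions),
            "Unresolved: " + "; ".join(self.unresolved_issues),
            "Files touched: " + ", ".join(self.files_modified),
            "Open questions: " + "; ".join(self.open_questions),
        ])
```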
Parallel Processing and the Tool Design Challenge
Both teams observed another convergent pattern: modern models aggressively parallelize operations, but this efficiency comes with trade-offs. Sonnet 4.5 excels at "maximizing actions per context window through parallel tool execution" - running multiple bash commands simultaneously, reading several files at once. This makes sessions feel faster and more productive.
Yet this parallelism creates tension. As Cognition discovered, parallel execution "burns through context faster," triggering context anxiety. Interestingly, the model appears trained to anticipate how many output tokens its tool calls will produce - early in its context window it parallelizes aggressively, then becomes more cautious as it nears what it perceives as its limit.
Anthropic's guidance on tool design directly addresses this challenge. They advocate for "minimal viable sets of tools" with clear, non-overlapping functionality. If a human engineer can't definitively say which tool to use in a given situation, neither can an AI agent. Each tool call has a context cost that compounds during parallel execution.
The convergent lesson: design tools for token efficiency, not just functionality. Return only essential information, avoid redundant capabilities, and remember that every parallel operation accelerates context consumption.
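To make "token-efficient" concrete, here is a hypothetical tool that returns trimmed matches rather than whole files. The name, parameters, and truncation limits are assumptions for illustration, not a recommended API.

```python
# Illustrative only: a search tool that returns file path, line number,
# and the matching line - not entire file contents - to keep each tool
# result cheap in tokens.
from pathlib import Path

def search_code(root: str, needle: str, max_results: int = 10) -> list[dict]:
    """Return at most `max_results` trimmed matches for `needle`."""
    hits = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append({
                    "file": str(path),
                    "line": lineno,
                    "text": line.strip()[:120],  # cap per-match output
                })
                if len(hits) >= max_results:
                    return hits
    return hits
```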
Feedback Loops and Emergent Testing Behaviors
One of the most fascinating convergences involves how models create their own feedback loops. Cognition observed Sonnet 4.5 "proactively writing and executing short scripts and tests," even checking HTML output of React apps to verify behavior. This wasn't prompted - the model developed these verification strategies independently.
This aligns perfectly with Anthropic's concept of "just-in-time" context retrieval and progressive disclosure. Rather than pre-loading all potentially relevant information, agents can use tools to explore their environment, with each interaction yielding context that informs the next decision. File sizes suggest complexity, naming conventions hint at purpose, timestamps proxy for relevance.
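A simple way to picture progressive disclosure is a tool that surfaces only metadata, leaving it to the agent to decide which files are worth reading in full. This sketch is an assumption about how such a tool might look, not a specific product's API.

```python
# Sketch of progressive disclosure: hand the agent lightweight metadata
# first; it chooses which files to open on a later turn.
from pathlib import Path

def describe_directory(root: str) -> list[dict]:
    """Names, sizes, and modification times only - no file contents."""
    entries = []
    for path in sorted(Path(root).iterdir()):
        stat = path.stat()
        entries.append({
            "name": path.name,
            "kind": "dir" if path.is_dir() else "file",
            "size_bytes": stat.st_size,   # size hints at complexity
            "modified": stat.st_mtime,    # recency can proxy for relevance
        })
    return entries
```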
However, Cognition cautions against over-reliance on this emergent behavior. They found instances where testing led to "overly creative workarounds" instead of addressing root causes - for example, when two local servers tried to run on the same port, the model created an overly complicated custom script rather than simply terminating the conflicting process. The key is providing sufficient structure and guidance while allowing beneficial emergent behaviors to flourish.
Five Practical Implementation Strategies
The convergence between research and practice yields concrete strategies every developer should consider:
1. Enable larger context windows, cap actual usage. Cognition's clever hack - enabling 1M token context but capping at 200k - gives models confidence without triggering anxiety-driven shortcuts. The model thinks it has plenty of runway and behaves normally. (A sketch of this pattern follows the list.)
2. Build explicit memory systems beyond model notes. While models naturally write summaries, production systems need structured preservation of architectural decisions, unresolved issues, and implementation details that models might not recognize as important.
3. Design for progressive disclosure. Use metadata as signals - folder hierarchies, naming conventions, and timestamps all provide context without consuming tokens. Let agents explore and build understanding incrementally rather than front-loading everything.
4. Implement hybrid retrieval strategies. Balance pre-computed retrieval for speed with runtime exploration for accuracy. As Anthropic notes, the boundary depends on task characteristics - dynamic coding environments favor runtime exploration, while legal or financial contexts might benefit from more upfront curation.
5. Monitor context consumption patterns. Track when models shift from parallel to sequential execution - it's a signal they're managing their perceived context budget. Use these patterns to trigger compaction or context management interventions.
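As a rough illustration of strategy 1 (and the trigger logic from strategy 5), the sketch below enforces a soft cap well below the advertised window and compacts before the model ever approaches it. The numbers, thresholds, and `count_tokens` helper are assumptions, not a specific vendor's API.

```python
# Sketch of a context budget cap. The constants and `count_tokens` helper
# are illustrative; plug in your provider's actual token counter.
MODEL_WINDOW = 1_000_000   # the window the model is told it has
SOFT_CAP     = 200_000     # what a single session is actually allowed to use

def should_compact(history: list[dict], count_tokens) -> bool:
    """Compact well before the soft cap so the model never 'feels' squeezed."""
    used = sum(count_tokens(m["content"]) for m in history)
    return used > 0.8 * SOFT_CAP

# In the agent loop (see the compaction sketch earlier in this article):
#   if should_compact(history, count_tokens):
#       history = compact(history, call_model)
```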
The Future of Context-Aware Development
The convergence between Anthropic's research and Cognition's field experience reveals that we've entered a new era of AI development. Models aren't just processing context - they're actively managing it, anxious about it, and creatively working around its limitations.
For developers, this means fundamentally rethinking how we architect AI systems. The challenge isn't writing better prompts; it's building better context management systems. As both teams demonstrate, the path forward requires treating context as the precious, finite resource it is - both for our models and in our system designs.
The teams pushing the boundaries of what's possible with AI agents are discovering exactly what researchers predicted: context engineering is the new frontier. And perhaps most remarkably, the models themselves are becoming active participants in this engineering challenge, developing their own strategies for managing the attention budget we provide them.
As models grow more capable, they'll likely become even more sophisticated context managers. Our job as developers is to build systems that work with these emergent behaviors, not against them - turning context anxiety into context awareness, and context limitations into architectural advantages.
How Inkeep Helps You Build Context-Aware AI Agents
At Inkeep, we've built our AI agent platform with context engineering principles at its core, addressing the exact challenges that Anthropic and Cognition have identified.
Progressive Context Loading: Our Context Fetchers implement just-in-time retrieval, dynamically loading information precisely when agents need it rather than front-loading everything. This mirrors the progressive disclosure strategy that reduces context anxiety.
Structured Memory Systems: Beyond letting models write their own notes, Inkeep provides structured artifacts that preserve critical information across interactions. Agents can save tool results, decisions, and implementation details in a format that's actually comprehensive enough for production use.
Token-Efficient Tool Design: Our platform enforces minimal viable toolsets with clear, non-overlapping functionality. Each tool is designed for token efficiency, returning only essential information to maximize your context budget.
Multi-Agent Architectures: Deploy specialized agents within graphs that handle focused tasks in clean contexts. Lead agents maintain high-level coordination while sub-agents return condensed summaries, preventing context pollution while enabling deep technical work.
Dynamic Context Management: Pass request context via HTTP headers for personalized interactions without burning through your context window. Context values are validated, cached per conversation, and made available throughout your agent system.
Whether you're building customer support automation, internal knowledge assistants, or complex multi-agent systems, Inkeep provides the infrastructure to manage context anxiety and build agents that scale to production.
Sources
This article draws from research and insights published by leading AI teams:
- Context Rot: How Increasing Input Tokens Impacts LLM Performance - Chroma Research
- Devin on Sonnet 4.5: Lessons and Challenges - Cognition AI
- Effective Context Engineering for AI Agents - Anthropic
Frequently Asked Questions
What is context anxiety?
Context anxiety is when AI models take shortcuts and leave tasks incomplete because they believe they're near the end of their context window, even when they have plenty of room left. Claude Sonnet 4.5 gives very specific estimates of its remaining tokens, but those estimates are consistently too low, which drives the anxious behavior.
How does Claude Sonnet 4.5 manage its own memory?
Claude Sonnet 4.5 treats the file system as memory, frequently writing summaries and notes for both users and its own future reference. However, these model-generated summaries aren't comprehensive enough for production use - they paraphrase tasks and leave out important details.
Why does parallel tool execution trigger context anxiety?
While parallel execution (running multiple operations simultaneously) makes sessions feel faster, it burns through context faster. Models appear trained to anticipate how many output tokens their tool calls will produce, becoming more cautious as they near perceived limits.
How can developers prevent context anxiety?
Cognition's workaround is to enable a 1M token context window but cap actual usage at 200k. This gives the model confidence without triggering anxiety-driven shortcuts - it thinks it has plenty of runway and behaves normally.
How should tools be designed for context-aware agents?
Design tools for token efficiency, not just functionality. Return only essential information, avoid redundant capabilities, and remember that every parallel operation accelerates context consumption. Use minimal viable sets of tools with clear, non-overlapping functionality.