Efficient Memory Management: Context Compaction in the Google ADK
Discover how Context Compaction in the Google Agent Development Kit (ADK) helps manage long-running agent sessions by intelligently summarizing history to maintain performance and accuracy.
Posted on: 2026-02-28 by AI Assistant

Introduction
As AI agents engage in longer, more complex workflows, the history of their interactions—the “context”—can grow significantly. While large context windows in modern LLMs like Gemini are impressive, processing thousands of historical tokens for every new request can eventually lead to increased latency, higher costs, and a “noisy” context that might degrade the agent’s focus.
To address this, the Google Agent Development Kit (ADK) provides a powerful feature called Context Compaction. This mechanism ensures that your agents remain efficient and performant, even during multi-turn sessions that span dozens of interactions.
What is Context Compaction?
Context Compaction is a memory management technique that reduces the number of workflow events passed to the LLM by summarizing older interactions. Instead of simply truncating the history (which would cause the agent to “forget” earlier steps), the ADK uses a sliding window approach to condense past events into a concise summary.
This summary is then prepended to the active context, allowing the agent to retain the essential details of the past while keeping the immediate context window focused on the most recent and relevant events.
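As a rough sketch of what this assembly looks like, consider the function below. The event shape and the `buildCompactedContext` helper are illustrative assumptions for this post, not the ADK's actual internals:

```typescript
// Illustrative only: the real ADK event shape and context assembly differ.
interface AgentEvent {
  author: string;
  text: string;
}

function buildCompactedContext(summary: string, recentEvents: AgentEvent[]): string {
  // The summary of older events is prepended ahead of the raw recent events,
  // so the model sees condensed long-term history followed by full short-term history.
  const lines = [
    '[Summary of earlier conversation]',
    summary,
    ...recentEvents.map((e) => `${e.author}: ${e.text}`),
  ];
  return lines.join('\n');
}
```

The key design point is ordering: the condensed summary always comes first, so the most recent raw events sit closest to the new user turn.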
How it Works: The Sliding Window
The ADK manages context using two primary parameters:
- Compaction Interval: The event-count threshold that triggers a compaction cycle. History is kept in full, raw form until it exceeds this limit.
- Overlap Size: The number of recent events that are excluded from the summary to ensure the agent maintains immediate “short-term memory” continuity.
For example, if you have a compaction interval of 20 and an overlap of 5, the ADK will wait until there are more than 20 events. It will then take all events except the most recent 5 and summarize them.
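The split described above can be sketched as a small pure function. This is a hypothetical illustration of the sliding-window logic, not the ADK's internal implementation:

```typescript
// Hypothetical sketch of the sliding-window split; the real ADK internals may differ.
interface Event {
  id: number;
  text: string;
}

function splitForCompaction(
  events: Event[],
  compactionInterval: number,
  overlapSize: number,
): { toSummarize: Event[]; keepRaw: Event[] } {
  // No compaction cycle until the history exceeds the interval.
  if (events.length <= compactionInterval) {
    return { toSummarize: [], keepRaw: events };
  }
  // Everything except the most recent `overlapSize` events is summarized;
  // the overlap stays raw to preserve short-term continuity.
  const cut = events.length - overlapSize;
  return {
    toSummarize: events.slice(0, cut),
    keepRaw: events.slice(cut),
  };
}
```

With 21 events, an interval of 20, and an overlap of 5, this splits the history into 16 events to summarize and the 5 most recent events kept raw.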
Configuring Compaction in the ADK
In the ADK, you configure compaction via the EventsCompactionConfig when setting up your application.
```typescript
const app = new App({
  // ... other config
  eventsCompactionConfig: {
    compactionInterval: 20,
    overlapSize: 5,
  },
});
```
By default, the ADK uses a standard summarization prompt, but developers have full control over how this compression happens.
Customizing Summarization
For complex domains—such as legal analysis or specialized coding tasks—you might want a specific summarization style. The ADK allows you to define a custom LlmEventSummarizer.
You can specify:
- The Model: Use a smaller, faster model (like Gemini Flash) specifically for the summarization task to save costs.
- The Prompt: Provide a custom template that instructs the LLM on which details are critical to preserve (e.g., “always keep track of the user’s specific budget constraints”).
```typescript
const customSummarizer = new LlmEventSummarizer({
  model: 'gemini-1.5-flash',
  promptTemplate:
    'Summarize the following interaction history, focusing on the user goals and technical constraints: {{events}}',
});

const app = new App({
  // ...
  eventsCompactionConfig: {
    compactionInterval: 15,
    overlapSize: 3,
    summarizer: customSummarizer,
  },
});
```
Benefits of Context Compaction
- Reduced Latency: Smaller prompts result in faster Time-To-First-Token (TTFT).
- Cost Efficiency: By reducing the total token count in each request, you significantly lower operational costs.
- Maintained Accuracy: Unlike simple truncation, compaction preserves the “semantic essence” of the entire conversation.
- Long-Running Workflows: Enables agents to run for hundreds of turns without hitting model context limits.
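To make the cost effect concrete, here is a back-of-the-envelope sketch. The per-event and summary token counts below are assumptions chosen for illustration, not ADK measurements:

```typescript
// Assumed figures for illustration only; real sessions will vary.
const historyEvents = 200;   // total events in a long session
const tokensPerEvent = 150;  // assumed average tokens per raw event
const summaryTokens = 500;   // assumed size of the compacted summary
const overlapEvents = 5;     // recent events kept in raw form

// Without compaction, every request carries the full raw history.
const rawTokens = historyEvents * tokensPerEvent; // 30,000 tokens

// With compaction: one summary plus the raw overlap window.
const compactedTokens = summaryTokens + overlapEvents * tokensPerEvent; // 1,250 tokens

const savings = 1 - compactedTokens / rawTokens; // ≈ 0.96, i.e. ~96% fewer history tokens
```

Under these assumptions, each request carries roughly 96% fewer history tokens, which translates directly into lower latency and cost.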
Conclusion
Context Compaction is an essential tool for any developer building production-grade agents with the Google ADK. By balancing “long-term memory” via summarization with “short-term focus” via the sliding window, you can create agents that are both highly capable and remarkably efficient.
Whether you are building a persistent coding assistant or a long-running research agent, mastering context compaction will ensure your AI remains sharp and responsive from the first message to the last.