Optimizing Agent Performance: Context Caching with Gemini in the Google ADK
Learn how to leverage context caching in the Google Agent Development Kit (ADK) with Gemini 2.0+ models to significantly reduce latency and costs for token-heavy agent interactions.
Posted on: 2026-02-28 by AI Assistant

Introduction
As AI agents take on increasingly complex tasks, they often need to process large volumes of data—such as extensive instruction sets, entire codebases, or massive reference documents. Resending that context on every single interaction with a generative AI model is not only slow but can also become prohibitively expensive.
To solve this, the Google Agent Development Kit (ADK) introduces Context Caching for models that support it (like Gemini 2.0 and higher). This feature allows your application to store and reuse large request data across multiple agent turns, drastically reducing latency and token costs.
How Context Caching Works in the ADK
In the ADK, context caching is configured at the App level, which acts as the wrapper for your agents. By applying the configuration to the App, the caching behavior governs all agents running within that specific application instance.
This is managed using the ContextCacheConfig class. Let’s look at how to set this up in Python.
Configuring the Cache
To enable caching, you simply pass a ContextCacheConfig object when instantiating your App.
```python
from google.adk import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

# 1. Define your root agent using a Gemini 2.0+ model
root_agent = Agent(
    model="gemini-2.0-flash",
    name="data_analyst_agent",
    description="Analyzes large datasets.",
    instruction="You are an expert data analyst. Use the provided context to answer questions.",
)

# 2. Create the app and apply the caching configuration
app = App(
    name="my-caching-agent-app",
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,    # Minimum tokens in a request before caching kicks in
        ttl_seconds=600,    # Keep the cache for up to 10 minutes
        cache_intervals=5,  # Refresh the cache after 5 uses
    ),
)
```
Understanding the Configuration Settings
The ContextCacheConfig class provides fine-grained control over when and how your context is cached. Here is a breakdown of the key parameters:
- min_tokens (int): The minimum number of tokens required in a request before caching is triggered. Caching adds a slight overhead, so it's inefficient to cache very small, transient requests. Setting a threshold (like 2048 tokens) ensures you only cache when there's a clear performance benefit. The default is 0.
- ttl_seconds (int): The Time-To-Live for the cached context, measured in seconds. This determines how long the AI model provider will keep the context warm. In the example above, 600 means the cache expires after 10 minutes. The default is 1800 (30 minutes).
- cache_intervals (int): The maximum number of times the cached content can be reused before it is forced to refresh, regardless of the TTL. This is useful if you expect the underlying context to change periodically but still want to batch reads. The default is 10.
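To make the interplay between ttl_seconds and cache_intervals concrete, here is a plain-Python sketch of the refresh policy described above. This is an illustrative model of the behavior, not ADK source code: the CacheState class and needs_refresh function are hypothetical names invented for this example.

```python
from dataclasses import dataclass

@dataclass
class CacheState:
    created_at: float   # timestamp when the cache entry was created
    use_count: int = 0  # how many requests have reused this entry

def needs_refresh(state: CacheState, now: float,
                  ttl_seconds: int = 1800, cache_intervals: int = 10) -> bool:
    """Return True once the cached context should be rebuilt:
    either its TTL has elapsed or it has hit the reuse limit."""
    expired = (now - state.created_at) >= ttl_seconds
    exhausted = state.use_count >= cache_intervals
    return expired or exhausted

# A fresh entry reused 3 times within the TTL is still valid:
state = CacheState(created_at=0.0, use_count=3)
print(needs_refresh(state, now=60.0, ttl_seconds=600, cache_intervals=5))  # False

# After 5 uses it must refresh, even though the TTL has not elapsed:
state.use_count = 5
print(needs_refresh(state, now=60.0, ttl_seconds=600, cache_intervals=5))  # True
```

Whichever limit is hit first wins: a long TTL won't keep stale content alive past the reuse cap, and a generous reuse cap won't outlive the TTL.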
Why This Matters
By intelligently applying context caching, developers can build agents that:
- Respond Faster: Eliminating the need to re-process large preamble instructions or data dumps significantly cuts down on Time-To-First-Token (TTFT).
- Cost Less: You pay for caching the context once, and subsequent interactions only cost a fraction of the standard input token price.
- Handle More Complexity: Agents can be given massive “runbooks” or system manuals without penalizing the user experience during every back-and-forth interaction.
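To get a feel for the cost side, here is a back-of-envelope estimate. The token price and cached-token discount below are made-up placeholders, not actual Gemini pricing; substitute your provider's published rates.

```python
# ILLUSTRATIVE numbers only -- not real Gemini pricing.
CONTEXT_TOKENS = 50_000   # large shared preamble (runbook, codebase, ...)
TURNS = 20                # agent turns that reuse the same context
PRICE_PER_TOKEN = 1e-6    # assumed standard input price (hypothetical)
CACHED_DISCOUNT = 0.25    # assume cached tokens cost 25% of the standard price

# Without caching, the full context is billed at full price every turn:
without_cache = CONTEXT_TOKENS * TURNS * PRICE_PER_TOKEN

# With caching, the first turn pays full price; later turns pay the
# discounted cached-token rate for the same context:
with_cache = (CONTEXT_TOKENS * PRICE_PER_TOKEN
              + CONTEXT_TOKENS * (TURNS - 1) * PRICE_PER_TOKEN * CACHED_DISCOUNT)

print(f"without caching: ${without_cache:.2f}")  # $1.00
print(f"with caching:    ${with_cache:.2f}")     # $0.29
```

Under these assumed rates, a 20-turn session over the same 50k-token context costs roughly 70% less with caching; the exact ratio depends entirely on your provider's cached-token discount.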
Next Steps
If you want to see context caching in action and analyze its performance impact, check out the cache_analysis sample in the official ADK Python repository.
By mastering context caching, you can ensure your ADK agents remain both blazing fast and cost-effective as they scale to handle enterprise-level workloads.