
Optimizing Agent Performance: Context Caching with Gemini in the Google ADK

Learn how to leverage context caching in the Google Agent Development Kit (ADK) with Gemini 2.0+ models to significantly reduce latency and costs for token-heavy agent interactions.

Posted on: 2026-02-28 by AI Assistant


Introduction

As AI agents take on increasingly complex tasks, they often need to process large volumes of data, such as extensive instruction sets, entire codebases, or massive reference documents. Resending that context in full on every interaction with a generative AI model is slow, and it can quickly become prohibitively expensive.

To solve this, the Google Agent Development Kit (ADK) introduces Context Caching for models that support it (like Gemini 2.0 and higher). This feature allows your application to store and reuse large request data across multiple agent turns, drastically reducing latency and token costs.

How Context Caching Works in the ADK

In the ADK, context caching is configured at the App level, which acts as the wrapper for your agents. By applying the configuration to the App, the caching behavior governs all agents running within that specific application instance.

This is managed using the ContextCacheConfig class. Let’s look at how to set this up in Python.

Configuring the Cache

To enable caching, you simply pass a ContextCacheConfig object when instantiating your App.

from google.adk import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

# 1. Define your root agent using a Gemini 2.0+ model
root_agent = Agent(
    model="gemini-2.0-flash",
    name="data_analyst_agent",
    description="Analyzes large datasets.",
    instruction="You are an expert data analyst. Use the provided context to answer questions."
)

# 2. Create the app and apply the caching configuration
app = App(
    name='my-caching-agent-app',
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,    # Minimum tokens to trigger caching
        ttl_seconds=600,    # Store the cache for up to 10 minutes
        cache_intervals=5,  # Refresh the cache after 5 uses
    ),
)
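Note that caching only engages once a request crosses the min_tokens threshold, so it helps to check that your static context is actually large enough to benefit. A quick back-of-the-envelope estimator (this is our own heuristic of roughly four characters per token for English text, not an ADK API):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    A planning heuristic only; use a real tokenizer (e.g. the model's
    token-counting endpoint) for billing-accurate numbers.
    """
    return max(1, len(text) // 4)


MIN_TOKENS = 2048  # matches the ContextCacheConfig above

runbook = "step " * 3000  # stand-in for a large instruction document
if estimate_tokens(runbook) >= MIN_TOKENS:
    print("Context is large enough to trigger caching.")
```

If your instruction block falls well below the threshold, caching adds bookkeeping without saving anything, so leaving it uncached is the right call.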

Understanding the Configuration Settings

The ContextCacheConfig class provides fine-grained control over when and how your context is cached. Here is a breakdown of the key parameters, as used in the example above:

  - min_tokens: The minimum number of tokens a request must contain before caching is triggered. Small requests gain little from caching, so this threshold lets you skip the overhead for them.
  - ttl_seconds: How long the cached content remains valid (its time to live). The example keeps the cache for up to 10 minutes (600 seconds).
  - cache_intervals: How many times the same cached content can be reused before it is refreshed. The example refreshes the cache after 5 uses.
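To build intuition for how the TTL and reuse limits interact, here is a small plain-Python sketch of the refresh policy. This is a conceptual illustration only; the class below is invented for this post and is not the ADK's internal implementation:

```python
class CacheRefreshPolicy:
    """Conceptual model: a cache entry is refreshed when it expires (TTL)
    or when it has been reused cache_intervals times."""

    def __init__(self, ttl_seconds: float, cache_intervals: int):
        self.ttl_seconds = ttl_seconds
        self.cache_intervals = cache_intervals
        self.created_at = None  # timestamp of the current cache entry
        self.uses = 0           # uses of the current cache entry

    def needs_refresh(self, now: float) -> bool:
        if self.created_at is None:
            return True  # no cache exists yet
        expired = (now - self.created_at) > self.ttl_seconds
        exhausted = self.uses >= self.cache_intervals
        return expired or exhausted

    def use(self, now: float) -> bool:
        """Record one request; returns True if the cache was (re)created."""
        refreshed = self.needs_refresh(now)
        if refreshed:
            self.created_at = now
            self.uses = 0
        self.uses += 1
        return refreshed


policy = CacheRefreshPolicy(ttl_seconds=600, cache_intervals=5)
# Seven requests one second apart: the cache is created on the first
# request and refreshed once more after its 5 allowed uses.
refreshes = sum(policy.use(now=t) for t in range(7))
print(refreshes)
```

Tuning is then a trade-off: a longer TTL and more intervals mean fewer refreshes (cheaper), while shorter values keep the cached context fresher.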

Why This Matters

By intelligently applying context caching, developers can build agents that:

  1. Respond Faster: Eliminating the need to re-process large preamble instructions or data dumps significantly cuts down on Time-To-First-Token (TTFT).
  2. Cost Less: You pay for caching the context once, and subsequent interactions only cost a fraction of the standard input token price.
  3. Handle More Complexity: Agents can be given massive “runbooks” or system manuals without penalizing the user experience during every back-and-forth interaction.

Next Steps

If you want to see context caching in action and analyze its performance impact, check out the cache_analysis sample in the official ADK Python repository.

By mastering context caching, you can ensure your ADK agents remain both blazing fast and cost-effective as they scale to handle enterprise-level workloads.