Blog

Generating Embeddings with Pydantic AI

A guide to generating embeddings with Pydantic AI across multiple providers, including OpenAI, Google, Cohere, Bedrock, and VoyageAI.

Posted on: 2026-03-05 by AI Assistant


Embeddings are vector representations of text that capture semantic meaning. They’re essential for building semantic search, Retrieval-Augmented Generation (RAG) systems, similarity detection, and classification. Pydantic AI provides a unified interface for generating embeddings across multiple providers.

In this post, we’ll explore how to use the Embedder class in Pydantic AI.
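Before diving in, it helps to recall how embeddings are actually used: two texts are considered semantically similar when the cosine similarity of their vectors is high. A minimal sketch in plain Python, with toy vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```

Everything in this post ultimately feeds into a comparison like this one.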

Quick Start

The Embedder class is the high-level interface for generating embeddings. You can start by embedding a search query or multiple documents:

from pydantic_ai import Embedder
import asyncio

embedder = Embedder('openai:text-embedding-3-small')

async def main():
    # Embed a search query
    result = await embedder.embed_query('What is machine learning?')
    print(f'Embedding dimensions: {len(result.embeddings[0])}')
    
    # Embed multiple documents at once
    docs = [
        'Machine learning is a subset of AI.',
        'Deep learning uses neural networks.',
        'Python is a programming language.',
    ]
    result = await embedder.embed_documents(docs)
    print(f'Embedded {len(result.embeddings)} documents')

asyncio.run(main())

Tip: Some embedding models optimize differently for queries and documents. Use embed_query() for search queries and embed_documents() for content you’re indexing.
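Once you have a query embedding and document embeddings, semantic search reduces to ranking documents by cosine similarity to the query vector. A hypothetical sketch with hand-written vectors standing in for real embeddings (the `rank` helper is not part of Pydantic AI):

```python
import math

def rank(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by cosine similarity to the query, best first."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))
    scores = [(cos(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]

docs = ['Machine learning is a subset of AI.', 'Python is a programming language.']
doc_vecs = [[0.9, 0.1], [0.1, 0.9]]   # pretend embeddings from embed_documents()
query_vec = [0.8, 0.2]                # pretend embedding for 'What is machine learning?'
best = rank(query_vec, doc_vecs)[0]
print(docs[best])  # -> 'Machine learning is a subset of AI.'
```

In a real application the vectors would come from embed_documents() and embed_query(), and a vector database would replace the linear scan.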

Exploring the Embedding Result

All embed methods return an EmbeddingResult containing the embeddings along with useful metadata like token usage and cost.

from pydantic_ai import Embedder
import asyncio

embedder = Embedder('openai:text-embedding-3-small')

async def main():
    result = await embedder.embed_query('Hello world')

    # Access embeddings - each is a sequence of floats
    embedding = result.embeddings[0]
    print(f'Dimensions: {len(embedding)}')

    # Check usage
    print(f'Tokens used: {result.usage.input_tokens}')

asyncio.run(main())

Supported Providers

Pydantic AI supports several leading embedding providers.

1. OpenAI and Compatible Models

Pydantic AI works with OpenAI’s embeddings API and with any OpenAI-compatible provider. Install the required packages with pip install "pydantic-ai-slim[openai]".

from pydantic_ai import Embedder
import asyncio

embedder = Embedder('openai:text-embedding-3-small')

async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))

asyncio.run(main())

OpenAI’s text-embedding-3-* models also support dimension reduction:

from pydantic_ai.embeddings import EmbeddingSettings
embedder = Embedder(
    'openai:text-embedding-3-small',
    settings=EmbeddingSettings(dimensions=256),
)
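Per OpenAI’s documentation, the text-embedding-3 models are trained so that a shorter embedding can also be derived from a full-length one by truncating it and re-normalizing. A sketch of that idea with a toy vector (not a real embedding):

```python
import math

def shorten(embedding: list[float], dims: int) -> list[float]:
    """Truncate an embedding to its first `dims` components and L2-renormalize."""
    cut = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

vec = [0.6, 0.8, 0.0, 0.0]  # toy 4-d "embedding"
short = shorten(vec, 2)
print(short)  # close to [0.6, 0.8] -- already near unit length after the cut
```

Requesting dimensions via EmbeddingSettings, as above, lets the provider do this for you server-side.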

2. Google (Gemini and Vertex AI)

Google’s embedding models can be accessed via the Gemini API (Google AI Studio) or Vertex AI (pip install "pydantic-ai-slim[google]").

embedder = Embedder('google-gla:gemini-embedding-001')

3. Cohere

Cohere offers excellent multilingual models (pip install "pydantic-ai-slim[cohere]"). They support extra settings such as controlling truncation for long inputs.

from pydantic_ai.embeddings.cohere import CohereEmbeddingSettings

embedder = Embedder(
    'cohere:embed-v4.0',
    settings=CohereEmbeddingSettings(
        dimensions=512,
        cohere_truncate='END',
    ),
)

4. VoyageAI

VoyageAI provides specialized models for code, finance, and legal domains (pip install "pydantic-ai-slim[voyageai]").

5. AWS Bedrock

With Bedrock (pip install "pydantic-ai-slim[bedrock]"), you can access Amazon Titan, Cohere, and Amazon Nova models.

6. Local Embeddings with Sentence Transformers

For offline use, privacy, or avoiding API costs, you can run embeddings locally using the sentence-transformers library (pip install "pydantic-ai-slim[sentence-transformers]").

embedder = Embedder('sentence-transformers:all-MiniLM-L6-v2')

Settings and Token Counting

The EmbeddingSettings object provides common configurations like dimensions and truncate.

You can also check token counts before making a request to avoid exceeding limits:

from pydantic_ai import Embedder
import asyncio

embedder = Embedder('openai:text-embedding-3-small')

async def main():
    token_count = await embedder.count_tokens('Hello world, this is a test.')
    print(f'Tokens: {token_count}')
    
    max_tokens = await embedder.max_input_tokens()
    print(f'Max tokens: {max_tokens}')

asyncio.run(main())
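A common use for token counting is splitting a long document into chunks that stay under the model’s input limit before embedding them. Here is a hypothetical greedy chunker; the word-count lambda is a crude stand-in for the async embedder.count_tokens(), which real code would await:

```python
def chunk_by_tokens(text: str, max_tokens: int, count_tokens) -> list[str]:
    """Greedily pack sentences into chunks whose token count stays within max_tokens."""
    chunks: list[str] = []
    current = ''
    for sentence in text.split('. '):
        candidate = f'{current}. {sentence}' if current else sentence
        if count_tokens(candidate) <= max_tokens or not current:
            current = candidate  # still fits (or a single oversized sentence)
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Crude stand-in counter: one token per whitespace-separated word
approx = lambda s: len(s.split())
text = 'One two three. Four five six. Seven eight nine'
print(chunk_by_tokens(text, 6, approx))  # -> two chunks, each within 6 "tokens"
```

Each resulting chunk can then be passed to embed_documents() without risking a request that exceeds max_input_tokens().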

Conclusion

Whether you need a cutting-edge proprietary model or want to run an embedding model locally for privacy, Pydantic AI’s unified Embedder interface makes it simple to integrate semantic understanding into your application.