Designing a Production-Ready RAG System with Flutter, Dart, Chroma, and dartantic_ai
Learn how to build a Retrieval-Augmented Generation (RAG) application in Flutter using Dart, the Gemini API, and a vector database for semantic search.
Posted on: 2026-03-27 by AI Assistant

Building a Retrieval-Augmented Generation (RAG) system is no longer just about connecting an LLM to a database. In production, it becomes a question of architecture—how components communicate, scale, and remain secure.
This guide walks through a full project structure for a modern AI application using:
- Flutter (frontend)
- Dart backend with dartantic_ai (orchestration layer)
- Chroma (vector storage)
- Gemini (LLM and embeddings)
The objective is to design a system that is clean, scalable, and maintainable.
System Architecture Overview
A production-ready RAG system should be layered:
```
[ Flutter App ]
       ↓
[ Dart Backend (API + Agents) ]
       ↓
[ Chroma (Vector DB) ]
       ↓
[ LLM Provider (Gemini) ]
```
Why this matters
- Security: API keys remain on the backend
- Performance: centralized caching and retrieval
- Flexibility: components can be replaced independently
Project Structure (Monorepo Style)
A well-organized repository improves long-term maintainability:
```
rag-flutter-app/
│
├── apps/
│   ├── mobile_app/   # Flutter application
│   └── backend/      # Dart backend (API + RAG logic)
│
├── packages/
│   ├── rag_core/     # Shared RAG logic (optional)
│   └── models/       # Data models (DTOs)
│
├── infra/
│   ├── chroma/       # Chroma setup (Docker / scripts)
│   └── scripts/      # Indexing / ingestion scripts
│
└── README.md
```
Flutter Application Structure
```
apps/mobile_app/
├── lib/
│   ├── features/
│   │   └── chat/
│   │       ├── chat_screen.dart
│   │       ├── chat_controller.dart
│   │       └── chat_service.dart
│   │
│   ├── core/
│   │   ├── api_client.dart
│   │   └── config.dart
│   │
│   └── main.dart
```
Responsibilities
- Rendering user interface
- Handling user input
- Calling backend APIs
Example API Call
```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

Future<String> askQuestion(String query) async {
  final response = await http.post(
    Uri.parse('$baseUrl/ask'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({'query': query}),
  );
  return jsonDecode(response.body)['answer'] as String;
}
```
The mobile app should not handle embeddings, vector search, or API keys.
Dart Backend (RAG Core)
```
apps/backend/
├── lib/
│   ├── main.dart
│   ├── routes/
│   │   └── rag_routes.dart
│   │
│   ├── services/
│   │   ├── rag_service.dart
│   │   ├── embedding_service.dart
│   │   ├── vector_service.dart
│   │   └── llm_service.dart
│   │
│   ├── agents/
│   │   └── rag_agent.dart
│   │
│   └── config/
│       └── env.dart
```
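As a sketch of what `routes/rag_routes.dart` could look like, here is a minimal HTTP handler using the `shelf` and `shelf_router` packages. Note this is an assumption about the routing layer, not code from the article; the `RagService` interface is a hypothetical stand-in for the service defined below.

```dart
import 'dart:convert';

import 'package:shelf/shelf.dart';
import 'package:shelf_router/shelf_router.dart';

// Hypothetical interface; wire in the real RagService implementation.
abstract class RagService {
  Future<String> askQuestion(String query);
}

Router buildRagRoutes(RagService ragService) {
  final router = Router();

  // POST /ask  { "query": "..." }  →  { "answer": "..." }
  router.post('/ask', (Request request) async {
    final body = jsonDecode(await request.readAsString());
    final answer = await ragService.askQuestion(body['query'] as String);
    return Response.ok(
      jsonEncode({'answer': answer}),
      headers: {'Content-Type': 'application/json'},
    );
  });

  return router;
}
```

Keeping the route layer this thin means all RAG logic stays testable behind the service interface.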
Core RAG Service
```dart
Future<String> askQuestion(String query) async {
  // 1. Embed the user query.
  final embedding = await embeddingService.embed(query);
  // 2. Retrieve the most relevant chunks from the vector store.
  final chunks = await vectorService.search(embedding, topK: 5);
  // 3. Build a grounded prompt and generate the answer.
  final prompt = buildPrompt(query, chunks);
  return llmService.generate(prompt);
}
```
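The `buildPrompt` helper is left undefined above. One common shape, sketched here as an assumption rather than the article's implementation, numbers the retrieved chunks and instructs the model to stay grounded in them:

```dart
String buildPrompt(String query, List<String> chunks) {
  // Number each chunk so the model (and logs) can reference sources.
  final context = chunks
      .asMap()
      .entries
      .map((e) => '[${e.key + 1}] ${e.value}')
      .join('\n\n');

  return '''
Answer the question using only the context below.
If the context is insufficient, say so.

Context:
$context

Question: $query
''';
}
```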
Agent Layer with dartantic_ai
The agent orchestrates the workflow:
```dart
final agent = RagAgent(
  embeddingService,
  vectorService,
  llmService,
);

final answer = await agent.run(query);
```
This abstraction allows extensibility for tools, memory, and more advanced workflows.
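A minimal `RagAgent` matching the constructor above might look like the following. This is a sketch: the service types are assumed abstractions matching the backend layout, and dartantic_ai's actual agent API should be consulted for the real orchestration primitives.

```dart
class RagAgent {
  RagAgent(this.embeddingService, this.vectorService, this.llmService);

  final EmbeddingService embeddingService;
  final VectorService vectorService;
  final LlmService llmService;

  /// Runs the retrieve-then-generate pipeline for a single query.
  Future<String> run(String query) async {
    final embedding = await embeddingService.embed(query);
    final chunks = await vectorService.search(embedding, topK: 5);
    final prompt = buildPrompt(query, chunks);
    return llmService.generate(prompt);
  }
}
```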
Vector Layer with Chroma
Run Chroma as a service:
```sh
docker run -p 8000:8000 chromadb/chroma
```
Vector Service Example
```dart
Future<List<String>> search(List<double> embedding, {int topK = 5}) async {
  // Note: recent Chroma versions expose querying per collection, e.g.
  // /api/v1/collections/<collection_id>/query — adjust to your deployment.
  final response = await http.post(
    Uri.parse('$chromaUrl/api/query'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'query_embeddings': [embedding],
      'n_results': topK,
    }),
  );
  return parseChunks(response);
}
```
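`parseChunks` depends on Chroma's response shape. Here is a sketch under the assumption that matching documents come back under a `documents` key as a list of lists, one inner list per query embedding:

```dart
List<String> parseChunks(http.Response response) {
  final decoded = jsonDecode(response.body) as Map<String, dynamic>;
  // One list of documents per query embedding; we sent a single embedding.
  final documents = decoded['documents'] as List<dynamic>;
  if (documents.isEmpty) return const [];
  return (documents.first as List<dynamic>).cast<String>();
}
```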
Data Ingestion Pipeline
RAG requires preprocessing before it can function effectively:
```
infra/scripts/
├── ingest.dart
├── chunker.dart
└── embedder.dart
```
Pipeline Flow
Documents → Chunking → Embeddings → Chroma
Recommended Strategy
- Chunk size: 300–800 tokens
- Overlap: 10–20%
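A character-based approximation of this strategy is sketched below; token-accurate chunking needs a tokenizer, so this assumes the rough heuristic of ~4 characters per token:

```dart
import 'dart:math' as math;

/// Splits [text] into overlapping chunks. Sizes are in characters:
/// assuming ~4 chars/token, 2000 chars ≈ 500 tokens, and a
/// 300-char overlap ≈ 15%.
List<String> chunkText(
  String text, {
  int chunkSize = 2000,
  int overlap = 300,
}) {
  final chunks = <String>[];
  final step = chunkSize - overlap;
  for (var start = 0; start < text.length; start += step) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
  }
  return chunks;
}
```

Splitting on sentence or paragraph boundaries rather than raw character offsets usually improves retrieval quality further.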
Configuration and Secrets
```
GEMINI_API_KEY=your_key
CHROMA_URL=http://localhost:8000
```
Sensitive data must remain on the backend.
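On the backend, `config/env.dart` can read these values from the process environment so keys never ship with the client. A minimal sketch:

```dart
import 'dart:io';

/// Reads configuration from environment variables,
/// failing fast at startup if a required key is missing.
class Env {
  static String get geminiApiKey => _require('GEMINI_API_KEY');
  static String get chromaUrl =>
      Platform.environment['CHROMA_URL'] ?? 'http://localhost:8000';

  static String _require(String name) {
    final value = Platform.environment[name];
    if (value == null || value.isEmpty) {
      throw StateError('Missing required environment variable: $name');
    }
    return value;
  }
}
```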
Optional Shared Package
```
packages/rag_core/
```
This package can include:
- Prompt templates
- Shared utilities
- Common logic
Deployment Strategy
Backend
- Containerize with Docker
- Deploy to platforms such as Cloud Run, Fly.io, or AWS ECS
Chroma
- Run as a container
- Attach persistent storage
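For example, mounting a host volume keeps the index across container restarts. The data directory shown is an assumption; check the image documentation for your Chroma version:

```
docker run -p 8000:8000 \
  -v ./chroma-data:/chroma/chroma \
  chromadb/chroma
```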
Flutter
- Build for iOS and Android
Scaling Considerations
To support growth:
- Introduce caching (embeddings and responses)
- Improve retrieval with hybrid search
- Add re-ranking mechanisms
- Monitor accuracy and system performance
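Caching is the cheapest of these wins. Here is a sketch of an in-memory embedding cache wrapped around the embedding service (the `EmbeddingService` interface is an assumption based on the backend layout above):

```dart
/// Caches embeddings by query text so repeated questions
/// skip the embedding API call entirely.
class CachingEmbeddingService implements EmbeddingService {
  CachingEmbeddingService(this._inner);

  final EmbeddingService _inner;
  final _cache = <String, List<double>>{};

  @override
  Future<List<double>> embed(String text) async {
    final cached = _cache[text];
    if (cached != null) return cached;
    final embedding = await _inner.embed(text);
    _cache[text] = embedding;
    return embedding;
  }
}
```

In production, an unbounded map should be replaced with an LRU or TTL cache to cap memory use.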
Common Mistakes
- Placing vector database logic in Flutter
- Skipping document chunking
- Passing excessive context to the LLM
- Exposing API keys
- Treating RAG as a single function rather than a pipeline
Conclusion
A well-designed RAG system is not a single feature but a structured pipeline.
By combining:
- Flutter for user experience
- Dart with dartantic_ai for orchestration
- Chroma for retrieval
- Gemini for generation
you can build applications that are not only AI-powered but also grounded in real data and ready for production use.
Next Steps
- Implement streaming responses using SSE or WebSockets
- Add multi-turn conversational memory
- Introduce evaluation pipelines
- Explore agent-based workflows on top of RAG