
Designing a Production-Ready RAG System with Flutter, Dart, Chroma, and dartantic_ai

Learn how to build a production-ready Retrieval-Augmented Generation (RAG) application in Flutter using Dart, the Gemini API, and the Chroma vector database for semantic search.

Posted on: 2026-03-27 by AI Assistant


Building a Retrieval-Augmented Generation (RAG) system is no longer just about connecting an LLM to a database. In production, it becomes a question of architecture—how components communicate, scale, and remain secure.

This guide walks through a full project structure for a modern AI application using:

- Flutter for the client application
- Dart for the backend API and RAG logic
- Chroma as the vector database
- dartantic_ai for agent orchestration
- Gemini as the LLM provider

The objective is to design a system that is clean, scalable, and maintainable.

System Architecture Overview

A production-ready RAG system should be layered:

[ Flutter App ]
       ↓
[ Dart Backend (API + Agents) ]
       ↓
[ Chroma (Vector DB) ]
       ↓
[ LLM Provider (Gemini) ]

Why this matters

Only the Dart backend talks to Chroma and the LLM provider. The Flutter client never handles embeddings, retrieval, or API keys, which keeps secrets off user devices and lets each layer be scaled, swapped, or secured independently.

Project Structure (Monorepo Style)

A well-organized repository improves long-term maintainability:

rag-flutter-app/
├── apps/
│   ├── mobile_app/        # Flutter application
│   └── backend/           # Dart backend (API + RAG logic)
├── packages/
│   ├── rag_core/          # Shared RAG logic (optional)
│   └── models/            # Data models (DTOs)
├── infra/
│   ├── chroma/            # Chroma setup (Docker / scripts)
│   └── scripts/           # Indexing / ingestion scripts
└── README.md
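The packages/models package can hold the DTOs shared between the app and the backend. A hypothetical sketch for the /ask endpoint (names are illustrative, not part of the source project):

```dart
// Hypothetical shared DTOs for the /ask endpoint.
class AskRequest {
  AskRequest(this.query);

  final String query;

  Map<String, dynamic> toJson() => {'query': query};
}

class AskResponse {
  AskResponse(this.answer);

  factory AskResponse.fromJson(Map<String, dynamic> json) =>
      AskResponse(json['answer'] as String);

  final String answer;
}
```

Sharing these types keeps the client and server serialization in sync from a single source of truth.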

Flutter Application Structure

apps/mobile_app/
├── lib/
│   ├── features/
│   │   └── chat/
│   │       ├── chat_screen.dart
│   │       ├── chat_controller.dart
│   │       └── chat_service.dart
│   │
│   ├── core/
│   │   ├── api_client.dart
│   │   └── config.dart
│   │
│   └── main.dart

Responsibilities

- chat_screen.dart renders the conversation UI
- chat_controller.dart manages chat state
- chat_service.dart calls the backend through api_client.dart
- config.dart holds the backend base URL (no secrets)

Example API Call

Future<String> askQuestion(String query) async {
  final response = await http.post(
    Uri.parse('$baseUrl/ask'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({'query': query}),
  );

  if (response.statusCode != 200) {
    throw Exception('Request failed: ${response.statusCode}');
  }

  return jsonDecode(response.body)['answer'] as String;
}

The mobile app should not handle embeddings, vector search, or API keys.

Dart Backend (RAG Core)

apps/backend/
├── lib/
│   ├── main.dart
│   ├── routes/
│   │   └── rag_routes.dart
│   ├── services/
│   │   ├── rag_service.dart
│   │   ├── embedding_service.dart
│   │   ├── vector_service.dart
│   │   └── llm_service.dart
│   ├── agents/
│   │   └── rag_agent.dart
│   └── config/
│       └── env.dart
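As an illustration, rag_routes.dart might wire POST /ask to the RAG service using the shelf and shelf_router packages (an assumption; any Dart HTTP server works, and RagService here is the hypothetical service class from the tree above):

```dart
import 'dart:convert';

import 'package:shelf/shelf.dart';
import 'package:shelf_router/shelf_router.dart';

// Hypothetical sketch: wires POST /ask to the RAG service.
Router buildRagRoutes(RagService ragService) {
  final router = Router();

  router.post('/ask', (Request request) async {
    final payload = jsonDecode(await request.readAsString());
    final answer = await ragService.askQuestion(payload['query'] as String);
    return Response.ok(
      jsonEncode({'answer': answer}),
      headers: {'Content-Type': 'application/json'},
    );
  });

  return router;
}
```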

Core RAG Service

Future<String> askQuestion(String query) async {
  // 1. Embed the user's question.
  final embedding = await embeddingService.embed(query);
  // 2. Retrieve the most relevant chunks from Chroma.
  final chunks = await vectorService.search(embedding, topK: 5);
  // 3. Ground the LLM with the retrieved context.
  final prompt = buildPrompt(query, chunks);
  return llmService.generate(prompt);
}
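buildPrompt is not shown above; a minimal version might interpolate the retrieved chunks into a grounding template (the exact wording below is an assumption, not the project's actual prompt):

```dart
// Hypothetical prompt builder: joins retrieved chunks into a grounded prompt.
String buildPrompt(String query, List<String> chunks) {
  final context = chunks.map((c) => '- $c').join('\n');
  return '''
Answer the question using only the context below.
If the context is insufficient, say so.

Context:
$context

Question: $query
''';
}
```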

Agent Layer with dartantic_ai

The agent orchestrates the workflow:

final agent = RagAgent(
  embeddingService,
  vectorService,
  llmService,
);

final answer = await agent.run(query);

This abstraction allows extensibility for tools, memory, and more advanced workflows.
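The RagAgent used above could be a thin class composing the three services (a sketch; dartantic_ai's actual agent API may differ, and the service types are the hypothetical ones from the backend tree):

```dart
// Hypothetical orchestrator composing the three backend services.
class RagAgent {
  RagAgent(this.embeddingService, this.vectorService, this.llmService);

  final EmbeddingService embeddingService;
  final VectorService vectorService;
  final LlmService llmService;

  Future<String> run(String query) async {
    final embedding = await embeddingService.embed(query);
    final chunks = await vectorService.search(embedding, topK: 5);
    return llmService.generate(buildPrompt(query, chunks));
  }
}
```

Because the agent owns the pipeline, tools and memory can later be added here without touching the route handlers.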

Vector Layer with Chroma

Run Chroma as a service:

docker run -p 8000:8000 chromadb/chroma

Vector Service Example

Future<List<String>> search(List<double> embedding, {int topK = 5}) async {
  // Note: the exact path depends on your Chroma version and collection
  // setup, e.g. /api/v1/collections/<id>/query in the v1 REST API.
  final response = await http.post(
    Uri.parse('$chromaUrl/api/query'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'query_embeddings': [embedding],
      'n_results': topK,
    }),
  );

  return parseChunks(response);
}

Data Ingestion Pipeline

RAG requires preprocessing before it can function effectively:

infra/scripts/
├── ingest.dart
├── chunker.dart
└── embedder.dart
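chunker.dart can start as simple fixed-size splitting with overlap (a naive sketch; production systems usually chunk on sentence or section boundaries instead of raw character offsets):

```dart
// Naive fixed-size chunker with overlap; boundaries are character-based.
List<String> chunkText(String text, {int chunkSize = 500, int overlap = 50}) {
  final chunks = <String>[];
  var start = 0;
  while (start < text.length) {
    final end =
        start + chunkSize < text.length ? start + chunkSize : text.length;
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
    // Step back by `overlap` so adjacent chunks share context.
    start = end - overlap;
  }
  return chunks;
}
```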

Pipeline Flow

Documents → Chunking → Embeddings → Chroma
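For the embedding step, embedder.dart could call the Gemini REST API directly (the endpoint and model name below follow the public API but should be verified against current documentation):

```dart
import 'dart:convert';

import 'package:http/http.dart' as http;

// Calls the Gemini embedding endpoint for a single chunk of text.
Future<List<double>> embed(String text, String apiKey) async {
  final uri = Uri.parse(
    'https://generativelanguage.googleapis.com/v1beta/'
    'models/text-embedding-004:embedContent?key=$apiKey',
  );

  final response = await http.post(
    uri,
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'content': {
        'parts': [
          {'text': text}
        ]
      }
    }),
  );

  final body = jsonDecode(response.body) as Map<String, dynamic>;
  return (body['embedding']['values'] as List).cast<double>();
}
```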

Configuration and Secrets

GEMINI_API_KEY=your_key
CHROMA_URL=http://localhost:8000
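A small env.dart in the backend can surface these values from the process environment (a sketch using dart:io; the class shape is an assumption):

```dart
import 'dart:io';

// Reads required configuration from environment variables at startup.
class Env {
  static String get geminiApiKey => _require('GEMINI_API_KEY');

  static String get chromaUrl =>
      Platform.environment['CHROMA_URL'] ?? 'http://localhost:8000';

  static String _require(String name) {
    final value = Platform.environment[name];
    if (value == null || value.isEmpty) {
      throw StateError('Missing required environment variable: $name');
    }
    return value;
  }
}
```

Failing fast on a missing key at startup is preferable to discovering it on the first user request.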

Sensitive data must remain on the backend.

Optional Shared Package

packages/rag_core/

This package can include:

- shared data models (requests, responses, chunk metadata)
- prompt templates
- chunking and text-processing utilities

Deployment Strategy

Backend

- Containerize the Dart server and deploy it behind HTTPS (e.g. Cloud Run, Fly.io, or a VM).

Chroma

- Run Chroma with a persistent volume, reachable only from the backend's private network.

Flutter

- Ship through the app stores or the web; the client needs only the backend URL, never API keys.
Scaling Considerations

To support growth:

- Scale the stateless Dart backend horizontally behind a load balancer
- Cache embeddings for repeated or similar queries
- Run ingestion as a batched offline job, not in the request path
- Stream LLM responses to keep the UI responsive
Common Mistakes

- Calling the LLM or Chroma directly from the Flutter app
- Shipping API keys inside the client
- Chunking documents too coarsely, or with no overlap
- Skipping error handling and timeouts in the RAG pipeline

Conclusion

A well-designed RAG system is not a single feature but a structured pipeline.

By combining:

- a Flutter client for the UI,
- a Dart backend with dartantic_ai agents for orchestration,
- Chroma for semantic retrieval, and
- Gemini for generation,

you can build applications that are not only AI-powered but also grounded in real data and ready for production use.

Next Steps

- Add streaming responses from the backend to the Flutter UI
- Extend the agent with tools and conversation memory
- Build an evaluation harness for retrieval quality