Designing a Production-Ready RAG System with Flutter, Dart, Chroma, and dartantic_ai
Learn how to build a Retrieval-Augmented Generation (RAG) application in Flutter using Dart, the Gemini API, and a vector database for semantic search.
Posted on: 2026-03-27 by AI Assistant

Building a Retrieval-Augmented Generation (RAG) system is no longer just about connecting an LLM to a database. In production, it becomes a question of architecture—how components communicate, scale, and remain secure.
This guide walks through a full project structure for a modern AI application using:
- Flutter (frontend)
- Dart backend with dartantic_ai (orchestration layer)
- Chroma (vector storage)
- Gemini (LLM and embeddings)
The objective is to design a system that is clean, scalable, and maintainable.
System Architecture Overview
A production-ready RAG system should be layered:
```
[ Flutter App ]
       ↓
[ Dart Backend (API + Agents) ]
       ↓
[ Chroma (Vector DB) ]
       ↓
[ LLM Provider (Gemini) ]
```
Why this matters
- Security: API keys remain on the backend
- Performance: centralized caching and retrieval
- Flexibility: components can be replaced independently
Project Structure (Monorepo Style)
A well-organized repository improves long-term maintainability:
```
rag-flutter-app/
│
├── apps/
│   ├── mobile_app/   # Flutter application
│   └── backend/      # Dart backend (API + RAG logic)
│
├── packages/
│   ├── rag_core/     # Shared RAG logic (optional)
│   └── models/       # Data models (DTOs)
│
├── infra/
│   ├── chroma/       # Chroma setup (Docker / scripts)
│   └── scripts/      # Indexing / ingestion scripts
│
└── README.md
```
Flutter Application Structure
```
apps/mobile_app/
├── lib/
│   ├── features/
│   │   └── chat/
│   │       ├── chat_screen.dart
│   │       ├── chat_controller.dart
│   │       └── chat_service.dart
│   │
│   ├── core/
│   │   ├── api_client.dart
│   │   └── config.dart
│   │
│   └── main.dart
```
Responsibilities
- Rendering user interface
- Handling user input
- Calling backend APIs
Example API Call
```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

Future<String> askQuestion(String query) async {
  final response = await http.post(
    Uri.parse('$baseUrl/ask'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({'query': query}),
  );
  return jsonDecode(response.body)['answer'] as String;
}
```
The mobile app should not handle embeddings, vector search, or API keys.
Dart Backend (RAG Core)
```
apps/backend/
├── lib/
│   ├── main.dart
│   ├── routes/
│   │   └── rag_routes.dart
│   │
│   ├── services/
│   │   ├── rag_service.dart
│   │   ├── embedding_service.dart
│   │   ├── vector_service.dart
│   │   └── llm_service.dart
│   │
│   ├── agents/
│   │   └── rag_agent.dart
│   │
│   └── config/
│       └── env.dart
```
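As a sketch of what `routes/rag_routes.dart` could look like, here is a minimal HTTP handler using the `shelf` and `shelf_router` packages. Note this is an assumption about the routing layer, not code from the article; the `RagService` interface is a hypothetical stand-in for the service defined below.

```dart
import 'dart:convert';

import 'package:shelf/shelf.dart';
import 'package:shelf_router/shelf_router.dart';

// Hypothetical interface; wire in the real RagService implementation.
abstract class RagService {
  Future<String> askQuestion(String query);
}

Router buildRagRoutes(RagService ragService) {
  final router = Router();

  // POST /ask  { "query": "..." }  →  { "answer": "..." }
  router.post('/ask', (Request request) async {
    final body = jsonDecode(await request.readAsString());
    final answer = await ragService.askQuestion(body['query'] as String);
    return Response.ok(
      jsonEncode({'answer': answer}),
      headers: {'Content-Type': 'application/json'},
    );
  });

  return router;
}
```

Keeping the route layer this thin means all RAG logic stays testable behind the service interface.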
Core RAG Service
```dart
Future<String> askQuestion(String query) async {
  // 1. Embed the user query.
  final embedding = await embeddingService.embed(query);
  // 2. Retrieve the most relevant chunks from the vector store.
  final chunks = await vectorService.search(embedding, topK: 5);
  // 3. Build a grounded prompt and generate the answer.
  final prompt = buildPrompt(query, chunks);
  return llmService.generate(prompt);
}
```
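The `buildPrompt` helper is left undefined above. One common shape, sketched here as an assumption rather than the article's implementation, numbers the retrieved chunks and instructs the model to stay grounded in them:

```dart
String buildPrompt(String query, List<String> chunks) {
  // Number each chunk so the model (and logs) can reference sources.
  final context = chunks
      .asMap()
      .entries
      .map((e) => '[${e.key + 1}] ${e.value}')
      .join('\n\n');

  return '''
Answer the question using only the context below.
If the context is insufficient, say so.

Context:
$context

Question: $query
''';
}
```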
Agent Layer with dartantic_ai
The agent orchestrates the workflow:
```dart
final agent = RagAgent(
  embeddingService,
  vectorService,
  llmService,
);

final answer = await agent.run(query);
```
This abstraction allows extensibility for tools, memory, and more advanced workflows.
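A minimal `RagAgent` matching the constructor above might look like the following. This is a sketch: the service types are assumed abstractions matching the backend layout, and dartantic_ai's actual agent API should be consulted for the real orchestration primitives.

```dart
class RagAgent {
  RagAgent(this.embeddingService, this.vectorService, this.llmService);

  final EmbeddingService embeddingService;
  final VectorService vectorService;
  final LlmService llmService;

  /// Runs the retrieve-then-generate pipeline for a single query.
  Future<String> run(String query) async {
    final embedding = await embeddingService.embed(query);
    final chunks = await vectorService.search(embedding, topK: 5);
    final prompt = buildPrompt(query, chunks);
    return llmService.generate(prompt);
  }
}
```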
Vector Layer with Chroma
Run Chroma as a service:
```sh
docker run -p 8000:8000 chromadb/chroma
```
Vector Service Example
```dart
Future<List<String>> search(List<double> embedding, {int topK = 5}) async {
  // Note: recent Chroma versions expose querying per collection, e.g.
  // /api/v1/collections/<collection_id>/query — adjust to your deployment.
  final response = await http.post(
    Uri.parse('$chromaUrl/api/query'),
    headers: {'Content-Type': 'application/json'},
    body: jsonEncode({
      'query_embeddings': [embedding],
      'n_results': topK,
    }),
  );
  return parseChunks(response);
}
```
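`parseChunks` depends on Chroma's response shape. Here is a sketch under the assumption that matching documents come back under a `documents` key as a list of lists, one inner list per query embedding:

```dart
List<String> parseChunks(http.Response response) {
  final decoded = jsonDecode(response.body) as Map<String, dynamic>;
  // One list of documents per query embedding; we sent a single embedding.
  final documents = decoded['documents'] as List<dynamic>;
  if (documents.isEmpty) return const [];
  return (documents.first as List<dynamic>).cast<String>();
}
```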
Data Ingestion Pipeline
RAG requires preprocessing before it can function effectively:
```
infra/scripts/
├── ingest.dart
├── chunker.dart
└── embedder.dart
```
Pipeline Flow
Documents → Chunking → Embeddings → Chroma
Recommended Strategy
- Chunk size: 300–800 tokens
- Overlap: 10–20%
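A character-based approximation of this strategy is sketched below; token-accurate chunking needs a tokenizer, so this assumes the rough heuristic of ~4 characters per token:

```dart
import 'dart:math' as math;

/// Splits [text] into overlapping chunks. Sizes are in characters:
/// assuming ~4 chars/token, 2000 chars ≈ 500 tokens, and a
/// 300-char overlap ≈ 15%.
List<String> chunkText(
  String text, {
  int chunkSize = 2000,
  int overlap = 300,
}) {
  final chunks = <String>[];
  final step = chunkSize - overlap;
  for (var start = 0; start < text.length; start += step) {
    final end = math.min(start + chunkSize, text.length);
    chunks.add(text.substring(start, end));
    if (end == text.length) break;
  }
  return chunks;
}
```

Splitting on sentence or paragraph boundaries rather than raw character offsets usually improves retrieval quality further.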
Configuration and Secrets
```
GEMINI_API_KEY=your_key
CHROMA_URL=http://localhost:8000
```
Sensitive data must remain on the backend.
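On the backend, `config/env.dart` can read these values from the process environment so keys never ship with the client. A minimal sketch:

```dart
import 'dart:io';

/// Reads configuration from environment variables,
/// failing fast at startup if a required key is missing.
class Env {
  static String get geminiApiKey => _require('GEMINI_API_KEY');
  static String get chromaUrl =>
      Platform.environment['CHROMA_URL'] ?? 'http://localhost:8000';

  static String _require(String name) {
    final value = Platform.environment[name];
    if (value == null || value.isEmpty) {
      throw StateError('Missing required environment variable: $name');
    }
    return value;
  }
}
```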
Optional Shared Package
```
packages/rag_core/
```
This package can include:
- Prompt templates
- Shared utilities
- Common logic
Deployment Strategy
Backend
- Containerize with Docker
- Deploy to platforms such as Cloud Run, Fly.io, or AWS ECS
Chroma
- Run as a container
- Attach persistent storage
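For example, mounting a host volume keeps the index across container restarts. The data directory shown is an assumption; check the image documentation for your Chroma version:

```
docker run -p 8000:8000 \
  -v ./chroma-data:/chroma/chroma \
  chromadb/chroma
```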
Flutter
- Build for iOS and Android
Scaling Considerations
To support growth:
- Introduce caching (embeddings and responses)
- Improve retrieval with hybrid search
- Add re-ranking mechanisms
- Monitor accuracy and system performance
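Caching is the cheapest of these wins. Here is a sketch of an in-memory embedding cache wrapped around the embedding service (the `EmbeddingService` interface is an assumption based on the backend layout above):

```dart
/// Caches embeddings by query text so repeated questions
/// skip the embedding API call entirely.
class CachingEmbeddingService implements EmbeddingService {
  CachingEmbeddingService(this._inner);

  final EmbeddingService _inner;
  final _cache = <String, List<double>>{};

  @override
  Future<List<double>> embed(String text) async {
    final cached = _cache[text];
    if (cached != null) return cached;
    final embedding = await _inner.embed(text);
    _cache[text] = embedding;
    return embedding;
  }
}
```

In production, an unbounded map should be replaced with an LRU or TTL cache to cap memory use.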
Common Mistakes
- Placing vector database logic in Flutter
- Skipping document chunking
- Passing excessive context to the LLM
- Exposing API keys
- Treating RAG as a single function rather than a pipeline
Conclusion
A well-designed RAG system is not a single feature but a structured pipeline.
By combining:
- Flutter for user experience
- Dart with dartantic_ai for orchestration
- Chroma for retrieval
- Gemini for generation
you can build applications that are not only AI-powered but also grounded in real data and ready for production use.
Next Steps
- Implement streaming responses using SSE or WebSockets
- Add multi-turn conversational memory
- Introduce evaluation pipelines
- Explore agent-based workflows on top of RAG