Blog

Documentation Best Practices for AI Codebases: Beyond the Docstring

Explore modern documentation strategies for AI-driven projects, including prompt versioning, llms.txt, model dependency tracking, and agentic architecture diagrams.

Posted on: 2026-04-13 by AI Assistant


In traditional software engineering, a good docstring and a README are often enough. But AI-driven applications introduce a new dimension of complexity: non-determinism. When your code’s logic is partially defined by a probabilistic model and a natural language prompt, standard documentation falls short.

Documenting AI codebases requires capturing not just what the code does, but why a specific model was chosen, how a prompt was tuned, and what the expected variance is.

1. Documenting Prompt Versions and Logic

Prompts are effectively “soft code.” If you change a word, the system’s behavior changes. Treat them with the same rigor as source code.

# prompts/v1/summarizer.yaml
version: "1.2.0"
model_id: "gemini-1.5-pro"
temperature: 0.2
system_instruction: |
  You are a technical editor. Summarize the following text in exactly three bullet points.
  # Note: 1.2.0 added the 'three bullet points' constraint to prevent verbosity.
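The YAML above can be mirrored in code as a versioned prompt registry, so every model call carries its prompt version into the logs. This is a minimal stdlib sketch; `PromptSpec`, `SUMMARIZER`, and `render_call_metadata` are illustrative names, not part of any real framework.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptSpec:
    """In-code mirror of a versioned prompt file like prompts/v1/summarizer.yaml."""
    version: str
    model_id: str
    temperature: float
    system_instruction: str

SUMMARIZER = PromptSpec(
    version="1.2.0",
    model_id="gemini-1.5-pro",
    temperature=0.2,
    system_instruction=(
        "You are a technical editor. Summarize the following text "
        "in exactly three bullet points."
    ),
)

def render_call_metadata(spec: PromptSpec) -> dict:
    """Attach the prompt version to every model call so behavior is traceable."""
    return {
        "model": spec.model_id,
        "prompt_version": spec.version,
        "temperature": spec.temperature,
    }
```

Logging `prompt_version` next to every request makes it possible to correlate a behavior regression with the exact prompt change that caused it.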

2. Using llms.txt for AI-to-AI Documentation

As we build agents that build agents, we need documentation specifically for AI consumption. The llms.txt standard (pioneered by AnswerDotAI) provides a concise, LLM-friendly map of your codebase.

# Project: Agentic-Flow
Modular framework for multi-agent orchestration.

## Key Modules
- src/core/agent.ts: Base class for all agents.
- src/tools/search.ts: Tool for Google Search integration.

## Constraints
- Use asynchronous patterns for all tool calls.
- Avoid external dependencies outside of the 'pydantic' ecosystem.
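Because llms.txt is plain text with a predictable shape, it can be generated from the same metadata you already maintain for human-facing docs, keeping the two from drifting apart. A minimal sketch; `build_llms_txt` is a hypothetical helper, and the module descriptions are assumed to be hand-maintained.

```python
def build_llms_txt(project: str, summary: str,
                   modules: list[tuple[str, str]],
                   constraints: list[str]) -> str:
    """Assemble an llms.txt-style document from structured metadata."""
    lines = [f"# Project: {project}", summary, "", "## Key Modules"]
    lines += [f"- {path}: {desc}" for path, desc in modules]
    lines += ["", "## Constraints"]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines) + "\n"
```

Regenerating the file in CI means an agent reading llms.txt always sees the current module map, not a stale snapshot.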

3. Tracking Model Dependencies

AI applications depend on specific model versions (e.g., gpt-4o-2024-08-06). When providers update or deprecate models, your “code” (the model) changes underneath you. Document exact snapshot identifiers, treat them like pinned dependencies, and record why each version was chosen and when it was last evaluated.
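One way to make this concrete is to treat model identifiers like lockfile entries and fail fast on drift. The sketch below assumes a hypothetical `MODEL_LOCK` manifest and `resolve_model` check; neither is a standard API.

```python
# Hypothetical model "lockfile", checked into the repo alongside package locks.
MODEL_LOCK = {
    "summarizer": {"model_id": "gpt-4o-2024-08-06", "pinned": True},
}

def resolve_model(task: str, requested: str) -> str:
    """Refuse to run if code requests a model that drifts from the pinned manifest."""
    entry = MODEL_LOCK[task]
    if entry["pinned"] and requested != entry["model_id"]:
        raise ValueError(
            f"{task}: requested {requested!r} but lockfile pins {entry['model_id']!r}"
        )
    return entry["model_id"]
```

A check like this turns a silent provider-side model change into a loud, reviewable diff in your own repository.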

4. Explaining Non-Deterministic Code Behavior

If a function returns a slightly different JSON object every time, standard unit tests might fail. Documentation must bridge this gap.

/**
 * processUserRequest(input: string)
 * 
 * NOTE: This function uses an LLM. Output is probabilistic.
 * - Retry strategy: 3 attempts with exponential backoff on 429/500 errors.
 * - Validation: Uses Pydantic for schema enforcement; expects 95% pass rate.
 */
async function processUserRequest(input: string) { ... }
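The documented retry and validation strategy can be enforced in code, not just described in comments. A minimal stdlib sketch, using a hand-rolled check where a real codebase might use Pydantic as noted above; `validate_summary` and `call_with_retries` are illustrative names.

```python
import time

def validate_summary(payload: dict) -> dict:
    """Schema gate for probabilistic output: exactly three bullet points."""
    bullets = payload.get("bullets")
    if not isinstance(bullets, list) or len(bullets) != 3:
        raise ValueError("expected exactly three bullet points")
    return payload

def call_with_retries(call, max_attempts: int = 3, base_delay: float = 0.5) -> dict:
    """Retry a probabilistic call, re-validating each attempt's output."""
    for attempt in range(1, max_attempts + 1):
        try:
            return validate_summary(call())
        except ValueError:
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts, matching the documented strategy.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the retry count and backoff in one named function means the docstring and the actual behavior can't silently diverge.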

5. Architectural Diagrams for Agentic Systems

Loops and tool-use make agentic systems hard to follow in text. Visual diagrams are essential for understanding the state transitions.

graph TD
    User -->|Query| Orchestrator
    Orchestrator -->|Analyze| AgentA
    AgentA -->|Search| Tool[Google Search]
    Tool -->|Results| AgentA
    AgentA -->|Synthesis| Orchestrator
    Orchestrator -->|Response| User
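The flow in the diagram can also be sketched as plain functions, which makes the routing logic unit-testable independently of any model. Everything here is illustrative; `orchestrator`, `agent_a`, and `search_tool` simply stand in for the diagram's nodes.

```python
from typing import Callable

def search_tool(query: str) -> str:
    # Stand-in for the Google Search tool node.
    return f"results for: {query}"

def agent_a(query: str, tool: Callable[[str], str]) -> str:
    # AgentA invokes its tool, then synthesizes a response for the Orchestrator.
    results = tool(query)
    return f"synthesis of {results}"

def orchestrator(query: str) -> str:
    # Orchestrator routes the user query through AgentA and returns the response.
    return agent_a(query, search_tool)
```

Keeping the diagram and a skeletal implementation side by side makes it obvious when one has drifted from the other.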

Putting It All Together

A well-documented AI codebase isn’t just for humans; it’s for the AI that will inevitably help you maintain it. By moving beyond simple docstrings and into prompt versioning, llms.txt, and behavioral documentation, you create a system that is resilient, transparent, and ready for the agentic era.

Conclusion & Next Steps

Documentation in the age of AI is a living artifact. Don’t just write it and forget it: revisit your prompt versions, model pins, and diagrams every time the underlying models or agents change.