Defensive AI Programming: Building Robust and Error-Resilient LLM Applications
Learn how to build production-ready LLM applications with defensive programming patterns, validation, and graceful degradation.
Posted on: 2026-04-13 by AI Assistant

In the early days of LLM development, we were all amazed by simple prototypes. A few lines of code and a prompt could generate poetry or summarize text. But as these applications move into production, the "happy path" starts to crumble. LLMs are non-deterministic: they hallucinate, they experience latency spikes, and they hit rate limits.
Moving from a “cool demo” to a production-grade service requires Defensive AI Programming. This means assuming the AI will fail and building a fortress of safety around it.
Prerequisites
- Python 3.9+
- Basic familiarity with OpenAI or Anthropic APIs
- Understanding of JSON and Pydantic
The “How”: Step-by-Step Robustness
1. Structured Output Validation (The Pydantic Shield)
Never trust the raw string output of an LLM. Use Pydantic to enforce a schema. If the LLM returns invalid JSON or missing fields, your validation layer should catch it before it crashes your downstream logic.
```python
from pydantic import BaseModel, Field, ValidationError
from typing import List

class SearchResult(BaseModel):
    title: str = Field(..., description="The title of the found page")
    relevance_score: float = Field(..., ge=0, le=1)
    keywords: List[str]

# Imagine this comes from your LLM
raw_ai_output = '{"title": "Defensive Coding", "relevance_score": 0.95, "keywords": ["ai", "safety"]}'

try:
    validated_data = SearchResult.model_validate_json(raw_ai_output)
    print(f"Validated: {validated_data.title}")
except ValidationError as e:
    print(f"Validation Error: {e.json()}")
    # Trigger a retry or a fallback
```
Visual Tip: A diagram showing “Raw LLM Output” flowing into a “Pydantic Validator” funnel, resulting in either “Clean Object” or “Error/Retry”.
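One common way to "trigger a retry" on a validation failure is to feed the validation error back to the model so it can self-correct. Here is a minimal sketch of that loop; `validate_with_retry` and its re-prompting strategy are illustrative, not a library API:

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Callable, List

class SearchResult(BaseModel):
    title: str
    relevance_score: float = Field(..., ge=0, le=1)
    keywords: List[str]

def validate_with_retry(call_llm: Callable[[str], str], prompt: str,
                        max_attempts: int = 3) -> SearchResult:
    """Re-prompt the model with the validation error until the output parses."""
    last_error: ValidationError | None = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return SearchResult.model_validate_json(raw)
        except ValidationError as e:
            last_error = e
            # Append the error so the model can fix its own output
            prompt = f"{prompt}\n\nYour last reply failed validation:\n{e}\nReturn only valid JSON."
    raise last_error  # give up after max_attempts; caller handles the fallback
```

The same pattern works with any schema: only the Pydantic model changes.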
2. Smart Retries with Exponential Backoff
LLM APIs often fail due to transient network issues or temporary server overload. Instead of a simple loop, use the tenacity library to handle retries with jitter and exponential backoff.
```python
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from openai import OpenAI, RateLimitError

client = OpenAI()

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10),
    retry=retry_if_exception_type(RateLimitError),
)
def get_llm_response(prompt: str):
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
```
3. Handling Rate Limits and Tokens
Rate limits are a fact of life in AI. Use a token-bucket algorithm or a simple semaphore to limit concurrent requests. Also, always count your tokens before sending them to avoid “context overflow” errors.
```python
import tiktoken

def count_tokens(text: str, model="gpt-4"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Check before you send
if count_tokens(long_prompt) > 8000:
    # Use a summarization fallback or truncate
    ...
```
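The semaphore approach mentioned above can be sketched in a few lines with `asyncio.Semaphore`. The `ConcurrencyLimiter` class and the fake LLM call below are illustrative stand-ins, not a library API:

```python
import asyncio

class ConcurrencyLimiter:
    """Cap the number of in-flight LLM requests with a semaphore."""
    def __init__(self, max_concurrent: int = 5):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._active = 0
        self.peak = 0  # highest concurrency observed (for demonstration)

    async def run(self, coro_fn, *args):
        async with self._sem:          # blocks when max_concurrent are in flight
            self._active += 1
            self.peak = max(self.peak, self._active)
            try:
                return await coro_fn(*args)
            finally:
                self._active -= 1

async def fake_llm_call(i):
    await asyncio.sleep(0.01)          # simulate network latency
    return i

async def main():
    limiter = ConcurrencyLimiter(max_concurrent=3)
    results = await asyncio.gather(*(limiter.run(fake_llm_call, i) for i in range(10)))
    return limiter.peak, results
```

A proper token-bucket limiter would additionally track tokens per minute, but the semaphore alone already prevents you from hammering the API with unbounded parallelism.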
4. Monitoring, Logging, and Tracing
You can’t fix what you can’t see. Use structured logging to capture not just errors, but the specific prompts and responses that led to them. Tools like LangSmith, Helicone, or simple OpenTelemetry can save you hours of debugging.
Visual Tip: A screenshot of a tracing dashboard showing a “LLM Call” span with its associated metadata (tokens used, latency, and Pydantic validation status).
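Even without a dedicated tracing tool, you can get most of the benefit from structured logs: one JSON object per LLM call, capturing latency and outcome. A minimal sketch (the `traced_call` wrapper is an assumption for illustration):

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("llm")

def traced_call(prompt: str, call_llm):
    """Wrap an LLM call so every invocation emits one structured log record."""
    record = {"trace_id": str(uuid.uuid4()), "prompt_chars": len(prompt)}
    start = time.perf_counter()
    try:
        response = call_llm(prompt)
        record["status"] = "ok"
        return response
    except Exception as e:
        record["status"] = "error"
        record["error"] = repr(e)
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        log.info(json.dumps(record))  # one JSON line per call, easy to grep/ingest
```

Because each record carries a `trace_id`, you can later correlate a user complaint with the exact prompt, latency, and error that caused it.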
5. Graceful Degradation (The Plan B)
When the AI fails (even after retries), don’t show a generic 500 error. Have a fallback:
- Rule-based fallback: Use regex or simple keyword matching.
- Cheaper/Faster model fallback: Switch from GPT-4 to GPT-3.5 or a local Llama model.
- Human-in-the-loop: Flag the entry for manual review.
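The rule-based fallback in particular is easy to wire up. Here is a sketch where a failing LLM call degrades to a crude first-N-sentences summary; the function names and the simulated outage are illustrative:

```python
import re

def llm_summary(text: str) -> str:
    raise TimeoutError("model unavailable")  # simulate an outage for the demo

def rule_based_summary(text: str, max_sentences: int = 2) -> str:
    """Plan B: return the first N sentences. Crude, but it never fails."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])

def summarize(text: str) -> str:
    try:
        return llm_summary(text)
    except Exception:
        # Degrade instead of surfacing a 500 to the user
        return rule_based_summary(text)
```

The user gets a worse summary, not a broken page, and you can flag the degraded response for later review.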
Putting It All Together
Here is a conceptual “Robust AI Wrapper”:
```python
import logging

log = logging.getLogger(__name__)

class RobustAIAgent:
    def __init__(self, primary_model="gpt-4", fallback_model="gpt-3.5-turbo"):
        self.primary = primary_model
        self.fallback = fallback_model

    async def generate_summary(self, text: str):
        try:
            # 1. Attempt with primary model + retries
            return await self._call_ai(self.primary, text)
        except Exception as primary_error:
            log.warning(f"Primary model failed: {primary_error}")
            # 2. Graceful degradation to secondary model
            try:
                return await self._call_ai(self.fallback, text)
            except Exception as secondary_error:
                # 3. Final fallback: Static summary or error notification
                log.error(f"Fallback model failed: {secondary_error}")
                return "Summary currently unavailable. Please check back later."

    async def _call_ai(self, model, text):
        # Implementation with Pydantic validation and tenacity
        ...
```
Conclusion & Next Steps
Defensive AI programming isn’t about building a perfect prompt; it’s about building a perfect system around the prompt. By validating outputs, managing rate limits, and planning for failure, you transform a brittle AI experiment into a reliable software asset.
Next Steps:
- Integrate Pydantic into your current LLM pipeline.
- Add Tenacity retries to your API calls.
- Set up Tracing to monitor your production performance.
Build like it’s going to fail—because eventually, it will.