Blog

Your First Local LLM: A Developer's Guide to Ollama and Docker

Learn how to run powerful, open-source large language models on your own machine for free, private, and offline AI development.

Posted on: 2026-03-11 by AI Assistant


The world of AI is moving at lightning speed, but relying on third-party APIs for everything isn’t always the best option. What if you want to experiment without racking up costs, ensure your data remains private, or build applications that work offline? The answer is to run a Large Language Model (LLM) locally, on your own machine.

In this guide, you’ll learn how to create your own local AI playground using Ollama and Docker. It’s surprisingly simple and opens up a new world of possibilities for developers.

The “Why”: Why Run an LLM Locally?

  1. Cost-Effective: Experimentation is free. You can run as many queries as you want without worrying about API bills.
  2. Privacy & Security: Your data never leaves your machine. This is critical when working with sensitive or proprietary code.
  3. Offline Capability: Build and run AI-powered applications that don’t need an internet connection.
  4. Customization: It’s the first step towards fine-tuning models on your own data to create highly specialized AI assistants.

Prerequisites: What You Need
To follow along, you'll need:

  1. Docker: installed and running (Docker Desktop on macOS/Windows, or Docker Engine on Linux).
  2. Hardware: any reasonably modern machine works. An NVIDIA GPU with the NVIDIA Container Toolkit speeds things up considerably, but Ollama also runs on CPU.
  3. Disk space: a few gigabytes free; the default llama3 model download is roughly 4.7 GB.

The “How”: Setting Up Your Local LLM

Step 1: Start the Ollama Container

Ollama provides a convenient Docker image that packages everything you need. Open your terminal and run the following command:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Let’s break that down:

  - -d: runs the container in the background (detached mode).
  - --gpus=all: gives the container access to your NVIDIA GPU. This requires the NVIDIA Container Toolkit on the host.
  - -v ollama:/root/.ollama: creates a named volume so downloaded models persist across container restarts.
  - -p 11434:11434: maps Ollama’s API port to your host machine.
  - --name ollama: names the container so later commands can refer to it.
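If you don’t have an NVIDIA GPU (or the NVIDIA Container Toolkit), simply drop the --gpus flag and Ollama will fall back to the CPU:

```shell
# CPU-only variant: same volume and port mapping, no GPU passthrough
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Quick sanity check that the server is up
curl http://localhost:11434
```

Expect slower generation on CPU, but everything else in this guide works the same.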

Step 2: Pull Your First Model

With Ollama running, you can now download and run a model. We’ll start with Llama 3, a powerful and popular model from Meta.

Execute the following command to “exec” into the running container and run the model:

docker exec -it ollama ollama run llama3

This command does two things: it downloads the llama3 model (if you don’t have it already), and then it drops you into an interactive chat session.

>>> Send a message (/? for help)

You’re now talking to an AI running entirely on your machine! Ask it a question, like “write a python function to reverse a string”. Type /bye or press Ctrl+D to exit the session.
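Your model’s exact wording will vary from run to run, but for that prompt the answer typically lands on Python’s slice idiom — something like:

```python
def reverse_string(s: str) -> str:
    """Return s reversed, using Python's negative-step slice."""
    return s[::-1]

print(reverse_string("hello"))  # olleh
```

If the model’s answer looks different, that’s expected: sampling makes local LLM output non-deterministic by default.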

Step 3: Interact via the API

While the command line is great for quick tests, the real power comes from Ollama’s built-in REST API. You can interact with your model programmatically.

Open a new terminal and use curl to send a request:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

You’ll get a series of JSON responses streamed back to you as the model generates the answer.
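Each streamed line is a standalone JSON object carrying a "response" fragment, and the final object has "done": true. As a sketch of how you might stitch those chunks back into the full answer (the sample lines below are illustrative, not real model output):

```python
import json

def join_stream(lines):
    """Concatenate the "response" fields of streamed Ollama JSON chunks.

    Each line is an independent JSON object; the final one has
    "done": true and an empty (or missing) "response" field.
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative chunks in the shape the API streams back:
sample = [
    '{"model":"llama3","response":"The sky ","done":false}',
    '{"model":"llama3","response":"is blue because...","done":false}',
    '{"model":"llama3","response":"","done":true}',
]
print(join_stream(sample))  # The sky is blue because...
```

This line-by-line shape is what makes token-by-token display in a UI straightforward.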

Step 4: Use it with Python

Let’s write a simple Python script to interact with our local LLM.

import requests

def generate(prompt):
    """
    Sends a prompt to the local Ollama server and streams the response.
    """
    url = "http://localhost:11434/api/generate"
    data = {
        "model": "llama3",
        "prompt": prompt,
        "stream": False # Set to False for a single, complete response
    }

    response = requests.post(url, json=data)
    response.raise_for_status() # Raise an exception for bad status codes

    # Parse the single JSON response
    response_data = response.json()
    print(response_data.get("response", "No response found."))

if __name__ == "__main__":
    user_prompt = "Write a short, professional git commit message for a change that adds a README.md file."
    generate(user_prompt)

Save this as local_llm.py and run it with python local_llm.py. You’ll see the AI’s response printed directly in your terminal.
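For a more interactive feel, you can stream the response instead of waiting for it to complete. Here’s a sketch using only the standard library (no requests dependency), assuming the same endpoint and line-delimited streaming format described above:

```python
import json
import urllib.request

def parse_chunk(raw: bytes):
    """Decode one streamed line into (text, done); each line is a JSON object."""
    chunk = json.loads(raw)
    return chunk.get("response", ""), bool(chunk.get("done"))

def stream_generate(prompt, model="llama3",
                    url="http://localhost:11434/api/generate"):
    """Yield pieces of the model's answer as they are generated."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": True}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # the response body streams one JSON object per line
            text, done = parse_chunk(raw_line)
            yield text
            if done:
                break
```

With the container running, `for piece in stream_generate("Why is the sky blue?"): print(piece, end="", flush=True)` prints the answer as it’s generated, much like the interactive chat session.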

What’s Next?

Congratulations! You now have a powerful LLM running on your local machine. This is the foundational step for building an incredible range of AI-powered developer tools.

Running your own models locally is a superpower. You have the freedom to innovate without limits. What will you build first?