Blog

Your First Local LLM: A Developer's Guide to Ollama and Docker

Learn how to run powerful, open-source large language models on your own machine for free, private, and offline AI development.

Posted on: 2026-03-11 by AI Assistant


The world of AI is moving at lightning speed, but relying on third-party APIs for everything isn’t always the best option. What if you want to experiment without racking up costs, ensure your data remains private, or build applications that work offline? The answer is to run a Large Language Model (LLM) locally, on your own machine.

In this guide, you’ll learn how to create your own local AI playground using Ollama and Docker. It’s surprisingly simple and opens up a new world of possibilities for developers.

The “Why”: Why Run an LLM Locally?

  1. Cost-Effective: Experimentation is free. You can run as many queries as you want without worrying about API bills.
  2. Privacy & Security: Your data never leaves your machine. This is critical when working with sensitive or proprietary code.
  3. Offline Capability: Build and run AI-powered applications that don’t need an internet connection.
  4. Customization: It’s the first step towards fine-tuning models on your own data to create highly specialized AI assistants.

Prerequisites: What You Need
To follow along, you'll need:

  1. Docker: installed and running (Docker Desktop on macOS/Windows, or Docker Engine on Linux).
  2. Hardware: any reasonably modern machine works. An NVIDIA GPU with the NVIDIA Container Toolkit speeds things up considerably, but Ollama also runs on CPU.
  3. Disk space: a few gigabytes free; the default llama3 model download is roughly 4.7 GB.

The “How”: Setting Up Your Local LLM

Step 1: Start the Ollama Container

Ollama provides a convenient Docker image that packages everything you need. Open your terminal and run the following command:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Let’s break that down:

  - -d: runs the container in the background (detached mode).
  - --gpus=all: gives the container access to your NVIDIA GPU. This requires the NVIDIA Container Toolkit on the host.
  - -v ollama:/root/.ollama: creates a named volume so downloaded models persist across container restarts.
  - -p 11434:11434: maps Ollama’s API port to your host machine.
  - --name ollama: names the container so later commands can refer to it.
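If you don’t have an NVIDIA GPU (or the NVIDIA Container Toolkit), simply drop the --gpus flag and Ollama will fall back to the CPU:

```shell
# CPU-only variant: same volume and port mapping, no GPU passthrough
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Quick sanity check that the server is up
curl http://localhost:11434
```

Expect slower generation on CPU, but everything else in this guide works the same.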

Step 2: Pull Your First Model

With Ollama running, you can now download and run a model. We’ll start with Llama 3, a powerful and popular model from Meta.

Execute the following command to “exec” into the running container and run the model:

docker exec -it ollama ollama run llama3

This command does two things: it downloads the llama3 model (if you don’t have it already), and then it drops you into an interactive chat session.

>>> Send a message (/? for help)

You’re now talking to an AI running entirely on your machine! Ask it a question, like “write a python function to reverse a string”. Type /bye or press Ctrl+D to exit the session.
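Your model’s exact wording will vary from run to run, but for that prompt the answer typically lands on Python’s slice idiom — something like:

```python
def reverse_string(s: str) -> str:
    """Return s reversed, using Python's negative-step slice."""
    return s[::-1]

print(reverse_string("hello"))  # olleh
```

If the model’s answer looks different, that’s expected: sampling makes local LLM output non-deterministic by default.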

Step 3: Interact via the API

While the command line is great for quick tests, the real power comes from Ollama’s built-in REST API. You can interact with your model programmatically.

Open a new terminal and use curl to send a request:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

You’ll get a series of JSON responses streamed back to you as the model generates the answer.
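Each streamed line is a standalone JSON object carrying a "response" fragment, and the final object has "done": true. As a sketch of how you might stitch those chunks back into the full answer (the sample lines below are illustrative, not real model output):

```python
import json

def join_stream(lines):
    """Concatenate the "response" fields of streamed Ollama JSON chunks.

    Each line is an independent JSON object; the final one has
    "done": true and an empty (or missing) "response" field.
    """
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative chunks in the shape the API streams back:
sample = [
    '{"model":"llama3","response":"The sky ","done":false}',
    '{"model":"llama3","response":"is blue because...","done":false}',
    '{"model":"llama3","response":"","done":true}',
]
print(join_stream(sample))  # The sky is blue because...
```

This line-by-line shape is what makes token-by-token display in a UI straightforward.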

Step 4: Use it with Python

Let’s write a simple Python script to interact with our local LLM.

import requests

def generate(prompt):
    """
    Sends a prompt to the local Ollama server and streams the response.
    """
    url = "http://localhost:11434/api/generate"
    data = {
        "model": "llama3",
        "prompt": prompt,
        "stream": False # Set to False for a single, complete response
    }

    response = requests.post(url, json=data)
    response.raise_for_status() # Raise an exception for bad status codes

    # Parse the single JSON response
    response_data = response.json()
    print(response_data.get("response", "No response found."))

if __name__ == "__main__":
    user_prompt = "Write a short, professional git commit message for a change that adds a README.md file."
    generate(user_prompt)

Save this as local_llm.py and run it with python local_llm.py. You’ll see the AI’s response printed directly in your terminal.
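For a more interactive feel, you can stream the response instead of waiting for it to complete. Here’s a sketch using only the standard library (no requests dependency), assuming the same endpoint and line-delimited streaming format described above:

```python
import json
import urllib.request

def parse_chunk(raw: bytes):
    """Decode one streamed line into (text, done); each line is a JSON object."""
    chunk = json.loads(raw)
    return chunk.get("response", ""), bool(chunk.get("done"))

def stream_generate(prompt, model="llama3",
                    url="http://localhost:11434/api/generate"):
    """Yield pieces of the model's answer as they are generated."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": True}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        for raw_line in resp:  # the response body streams one JSON object per line
            text, done = parse_chunk(raw_line)
            yield text
            if done:
                break
```

With the container running, `for piece in stream_generate("Why is the sky blue?"): print(piece, end="", flush=True)` prints the answer as it’s generated, much like the interactive chat session.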

What’s Next?

Congratulations! You now have a powerful LLM running on your local machine. This is the foundational step for building an incredible range of AI-powered developer tools.

Running your own models locally is a superpower. You have the freedom to innovate without limits. What will you build first?