Fine-Tuning 101: How to Teach a Small LLM New Tricks with Your Own Data
A beginner-friendly guide to fine-tuning smaller Large Language Models on custom datasets for domain-specific tasks.
Posted on: 2026-03-16 by AI Assistant

Prompt engineering and Retrieval-Augmented Generation (RAG) are powerful techniques, but sometimes you need a model to adopt a specific tone, learn a proprietary domain language, or output data in a rigid format that prompts just can’t enforce reliably. This is where fine-tuning comes in.
In this tutorial, you will learn the basics of fine-tuning a small Large Language Model (LLM) on a custom dataset to perform a specialized task.
Prerequisites
- Python 3.10+
- Basic understanding of PyTorch or Hugging Face transformers.
- Access to a GPU (e.g., via Google Colab, RunPod, or a local GPU).
Why Fine-Tune?
Fine-tuning involves taking a pre-trained model (like Llama 3 8B or Mistral 7B) and training it further on your specific dataset. This allows the model to deeply internalize patterns, styles, and facts that are unique to your use case, often resulting in higher accuracy than few-shot prompting, and faster inference too, since the long in-context examples are no longer needed in every prompt.
The Process: Parameter-Efficient Fine-Tuning (PEFT)
Training a full LLM is extremely resource-intensive. Instead, we use techniques like LoRA (Low-Rank Adaptation) to train only a small set of extra parameters, keeping the base model frozen.
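To get a feel for how much LoRA saves, here is a back-of-the-envelope parameter count for a single weight matrix. The hidden size of 4096 is an assumption for illustration (typical of 7B-class models); the rank r=8 matches the training script below. LoRA freezes the original matrix W and learns only a low-rank update B @ A.

```python
# Parameters trained by LoRA vs. full fine-tuning for one weight matrix W
# of shape (d, d). LoRA learns A of shape (r, d) and B of shape (d, r).

d = 4096   # hidden size (assumption; typical for 7B-class models)
r = 8      # LoRA rank, matching the training script below

full_params = d * d                # parameters updated by full fine-tuning
lora_params = (r * d) + (d * r)    # parameters in A and B combined

print(f"full fine-tuning: {full_params:,} params")   # 16,777,216
print(f"LoRA (r=8):       {lora_params:,} params")   # 65,536
print(f"reduction:        {full_params // lora_params}x")  # 256x
```

A 256x reduction per targeted matrix is why LoRA training fits comfortably on a single consumer GPU.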
1. Preparing the Dataset
Your dataset should be formatted as a list of prompt-completion pairs. A common format is JSONL:
```jsonl
{"text": "### Instruction: Summarize this issue.\n### Input: App crashes on login screen.\n### Response: Login screen crash reported."}
{"text": "### Instruction: Summarize this issue.\n### Input: Database connection times out randomly.\n### Response: Intermittent DB connection timeout."}
```
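If your raw data lives elsewhere (a spreadsheet, a ticket system export), a small script can assemble and sanity-check the JSONL file. The helper below is a hypothetical example using only the standard library; the "text" key matches what the training script passes as dataset_text_field.

```python
import json

def to_training_record(instruction, context, response):
    """Build one record in the prompt format shown above."""
    text = (f"### Instruction: {instruction}\n"
            f"### Input: {context}\n"
            f"### Response: {response}")
    return {"text": text}

# Hypothetical raw examples; replace with your own data source.
examples = [
    ("Summarize this issue.", "App crashes on login screen.",
     "Login screen crash reported."),
    ("Summarize this issue.", "Database connection times out randomly.",
     "Intermittent DB connection timeout."),
]

with open("your_data.jsonl", "w") as f:
    for row in examples:
        f.write(json.dumps(to_training_record(*row)) + "\n")

# Sanity check: every line must be valid JSON with a "text" key.
with open("your_data.jsonl") as f:
    for line in f:
        assert "text" in json.loads(line)
```

Catching a malformed line here is much cheaper than discovering it halfway through a training run.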
2. The Training Script
We’ll use the trl (Transformer Reinforcement Learning) library from Hugging Face, which simplifies supervised fine-tuning.
```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

# 1. Load the JSONL dataset prepared earlier
dataset = load_dataset("json", data_files="your_data.jsonl", split="train")

# 2. Load the base model and tokenizer
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# 3. Configure LoRA: train small rank-8 adapters on the attention
#    query/value projections while the base model stays frozen
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# 4. Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_steps=10,
)

# 5. Train! (trl < 0.9 API; newer versions move dataset_text_field
#    and max_seq_length into SFTConfig)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,  # tokenizes the "text" field for training
    args=training_args,
)
trainer.train()

# Saves only the small LoRA adapter weights, not the full base model
trainer.model.save_pretrained("my-finetuned-model")
```
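Once training finishes, you can try the adapter out immediately. The sketch below is a minimal inference example, assuming the same peft and transformers versions used above; it reloads the frozen base model and attaches the saved LoRA weights on top.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Attach the saved LoRA adapter to the frozen base model
model = PeftModel.from_pretrained(base, "my-finetuned-model")
model.eval()

# Use the same prompt template the model was trained on
prompt = ("### Instruction: Summarize this issue.\n"
          "### Input: Payment page returns a 500 error.\n"
          "### Response:")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that the prompt must follow the exact template used during training; deviating from it tends to degrade output quality noticeably.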
Conclusion & Next Steps
You’ve just learned the core workflow for fine-tuning an LLM using LoRA! While the code is relatively short, the real magic happens in data preparation.
For your next steps, experiment with different datasets and try deploying your fine-tuned LoRA adapters using frameworks like vLLM or Ollama.