Blog

Serverless AI is Here: Deploying Language Models with AWS Lambda

Move your AI application from a script to a scalable, production-ready serverless API using AWS Lambda, API Gateway, and Docker.

Posted on: 2026-03-11 by AI Assistant


You’ve built a cool AI-powered application that runs perfectly on your local machine. Now what? To share it with the world, you need to deploy it. While a traditional server works, a serverless approach using AWS Lambda offers incredible scalability, cost-efficiency, and ease of management.

In this tutorial, you’ll learn how to take a simple Python AI application, package it with Docker, and deploy it as a serverless API that can handle production traffic.

The “Why”: Why Serverless for AI?

Historically, Lambda's deployment package size limits and cold starts made it an awkward home for AI workloads. With container image support (images up to 10 GB) and built-in HTTP triggers such as function URLs and API Gateway, Lambda is now a strong fit for many AI workloads: you pay only while requests are running, and scaling out is automatic.

Prerequisites: What You Need

  - An AWS account with permissions for ECR, Lambda, and API Gateway
  - Docker installed and running locally
  - A Gemini API key
  - Basic familiarity with Python and the command line

The “How”: Deploying a Serverless AI Function

We’ll deploy a simple application that uses the google-genai library (the Google Gen AI SDK) to interact with the Gemini API.

Step 1: Create the Python Application

First, create a folder for your project and add a requirements.txt file.

# requirements.txt
google-genai

Next, create the application file, app.py. Lambda expects a handler function that takes event and context as arguments.

# app.py
import os
import json
from google import genai

# Configure the API from an environment variable
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    raise ValueError("GEMINI_API_KEY environment variable not set.")

client = genai.Client(api_key=GEMINI_API_KEY)

def handler(event, context):
    """
    This is the main Lambda handler function.
    """
    print(f"Received event: {event}")
    
    # API Gateway wraps the request body in a string
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt")

    if not prompt:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "Prompt not found in request body."})
        }
    
    try:
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt
        )
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"response": response.text})
        }
    except Exception as e:
        print(f"Error during generation: {e}")
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)})
        }
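Before deploying, you can sanity-check the handler's body-parsing logic locally without calling Gemini at all. The sketch below uses a trimmed-down, hypothetical stand-in for the event API Gateway's HTTP API sends, and mirrors only the parsing step (not the model call):

```python
import json

def parse_prompt(event):
    # API Gateway delivers the JSON request body as a string,
    # so it must be decoded before the fields are readable.
    body = json.loads(event.get("body") or "{}")
    return body.get("prompt")

# Trimmed-down stand-in for an API Gateway (HTTP API) event.
sample_event = {
    "version": "2.0",
    "body": json.dumps({"prompt": "Tell me a fun fact."}),
}

print(parse_prompt(sample_event))    # Tell me a fun fact.
print(parse_prompt({"body": None}))  # None (no prompt supplied)
```

If the prompt comes back as None, the handler returns the 400 response; otherwise it proceeds to the model call.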

Step 2: Dockerize the Application

AWS Lambda can run container images directly. This is perfect for AI applications which often have large dependencies.

Create a Dockerfile:

# Use the official AWS Lambda Python base image
FROM public.ecr.aws/lambda/python:3.11

# Copy the requirements file
COPY requirements.txt ./

# Install the dependencies
RUN pip install -r requirements.txt

# Copy the application code
COPY app.py ./

# Set the CMD to your handler function
# Format: <filename>.<handler_function_name>
CMD [ "app.handler" ]

Step 3: Build and Push the Image to ECR

Amazon Elastic Container Registry (ECR) is AWS’s managed container registry; Lambda will pull your image from here.

  1. Create an ECR Repository: Go to the ECR service in the AWS Console and create a new private repository (e.g., my-serverless-ai-app).
  2. Log in to ECR: Get the login command from the “View push commands” button in your new repository and run it in your terminal.
  3. Build the Image: docker build -t my-serverless-ai-app .
  4. Tag the Image: docker tag my-serverless-ai-app:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/my-serverless-ai-app:latest
  5. Push the Image: docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/my-serverless-ai-app:latest
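If it helps to see how the image URI in steps 4 and 5 is composed, here is the pattern with hypothetical placeholder values (substitute your real account ID and region):

```python
aws_account_id = "123456789012"   # hypothetical placeholder
aws_region = "us-east-1"          # hypothetical placeholder
repo_name = "my-serverless-ai-app"

# ECR image URIs always follow this account/region/repository pattern.
image_uri = f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com/{repo_name}:latest"
print(image_uri)
# 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-serverless-ai-app:latest
```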

Step 4: Create the Lambda Function

  1. Go to the Lambda service in the AWS Console and click “Create function”.
  2. Select the “Container image” option.
  3. Give your function a name.
  4. For the “Container image URI”, browse for the ECR image you just pushed.
  5. Under “Configuration” > “Environment variables”, add your GEMINI_API_KEY.
  6. Click “Create function”.

Step 5: Add an API Gateway Trigger

  1. In your new Lambda function’s dashboard, click “Add trigger”.
  2. Select “API Gateway”.
  3. Choose “Create a new API” with the “HTTP API” type.
  4. Leave the security as “Open” for now for easy testing.
  5. Click “Add”.

You will now have a public API URL. You can use curl or any API client to test it:

curl -X POST YOUR_API_GATEWAY_URL \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me a fun fact about serverless computing."}'
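If you prefer testing from Python instead of curl, the same request can be built with the standard library alone. The URL below is a placeholder; swap in your real API Gateway URL before sending:

```python
import json
import urllib.request

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/"  # placeholder

def build_request(url, prompt):
    # Builds the same POST request as the curl command above.
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(API_URL, "Tell me a fun fact about serverless computing.")

# Uncomment to actually call your deployed API:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```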

What’s Next?

You have successfully deployed a scalable serverless AI application! Before calling it production-ready, consider locking down the API Gateway endpoint you left “Open” (for example with IAM authorization or API keys), moving your GEMINI_API_KEY into AWS Secrets Manager, and keeping an eye on invocation logs and cold-start latency in CloudWatch.