Serverless AI is Here: Deploying Language Models with AWS Lambda
Move your AI application from a script to a scalable, production-ready serverless API using AWS Lambda, API Gateway, and Docker.
Posted on: 2026-03-11 by AI Assistant

You’ve built a cool AI-powered application that runs perfectly on your local machine. Now what? To share it with the world, you need to deploy it. While a traditional server works, a serverless approach using AWS Lambda offers incredible scalability, cost-efficiency, and ease of management.
In this tutorial, you’ll learn how to take a simple Python AI application, package it with Docker, and deploy it as a serverless API that can handle production traffic.
The “Why”: Why Serverless for AI?
- Pay-per-use: You only pay when your function is actually running. For applications with variable traffic, this is far cheaper than an always-on server.
- Elastic Scalability: AWS Lambda handles scaling automatically. If traffic spikes, Lambda spins up additional execution environments to absorb the load, up to your account’s concurrency limit.
- Simplified Management: You don’t have to worry about managing servers, patching operating systems, or configuring network infrastructure.
The main challenge used to be package size and cold starts, but with modern features like container image support and function URLs, Lambda is now a perfect fit for many AI workloads.
Prerequisites: What You Need
- An AWS Account: You’ll need access to the AWS console.
- Docker Desktop: For building and testing our container image.
- AWS CLI: Installed and configured with your AWS credentials.
- Basic knowledge of Python and Docker.
The “How”: Deploying a Serverless AI Function
We’ll deploy a simple application that uses the google-genai library to interact with the Gemini API.
Step 1: Create the Python Application
First, create a folder for our project and add a requirements.txt file.
```
# requirements.txt
google-genai
```
Next, create the application file, app.py. Lambda expects a handler function that takes event and context as arguments.
```python
# app.py
import os
import json

from google import genai

# Configure the API key from an environment variable
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    raise ValueError("GEMINI_API_KEY environment variable not set.")

client = genai.Client(api_key=GEMINI_API_KEY)


def handler(event, context):
    """
    This is the main Lambda handler function.
    """
    print(f"Received event: {event}")

    # API Gateway wraps the request body in a string
    # (the "or" guards against a present-but-None body)
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt")

    if not prompt:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "Prompt not found in request body."}),
        }

    try:
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt,
        )
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"response": response.text}),
        }
    except Exception as e:
        print(f"Error during generation: {e}")
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }
```
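Before wiring anything up in AWS, it helps to see exactly what the handler has to unpack: API Gateway delivers the client’s JSON payload as a string under the event’s "body" key. A minimal sketch of that parsing step (the event below is a hand-written stand-in, not a full API Gateway payload):

```python
import json

# Stand-in for the event API Gateway (HTTP API) passes to the handler:
# the client's JSON payload arrives as a *string* under "body".
event = {"body": json.dumps({"prompt": "Tell me a fun fact."})}

# The same parsing the handler performs
body = json.loads(event.get("body") or "{}")
prompt = body.get("prompt")
print(prompt)  # → Tell me a fun fact.
```

If the client sends no body at all, `body` parses to an empty dict and `prompt` is None, which is what triggers the 400 response.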
Step 2: Dockerize the Application
AWS Lambda can run container images directly. This is perfect for AI applications which often have large dependencies.
Create a Dockerfile:
```dockerfile
# Use the official AWS Lambda Python base image
FROM public.ecr.aws/lambda/python:3.11

# Copy the requirements file
COPY requirements.txt ./

# Install the dependencies
RUN pip install -r requirements.txt

# Copy the application code
COPY app.py ./

# Set the CMD to your handler function
# Format: <filename>.<handler_function_name>
CMD [ "app.handler" ]
```
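The AWS Lambda base images ship with the Runtime Interface Emulator, so you can invoke the function locally before pushing anything to AWS. A quick smoke-test sketch (substitute your real API key):

```shell
# Build the image locally
docker build -t my-serverless-ai-app .

# Run it; the emulator in the base image listens on port 8080
docker run -p 9000:8080 -e GEMINI_API_KEY=your-key my-serverless-ai-app

# In another terminal, invoke the handler with a test event
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body": "{\"prompt\": \"Hello\"}"}'
```

Note that the emulator takes the raw Lambda event, so the payload mimics API Gateway’s shape with the JSON body as an escaped string.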
Step 3: Build and Push the Image to ECR
Elastic Container Registry (ECR) is AWS’s private Docker registry.
- Create an ECR Repository: Go to the ECR service in the AWS Console and create a new private repository (e.g., my-serverless-ai-app).
- Log in to ECR: Get the login command from the “View push commands” button in your new repository and run it in your terminal.
- Build, tag, and push the image:

```shell
docker build -t my-serverless-ai-app .
docker tag my-serverless-ai-app:latest YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/my-serverless-ai-app:latest
docker push YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/my-serverless-ai-app:latest
```
Step 4: Create the Lambda Function
- Go to the Lambda service in the AWS Console and click “Create function”.
- Select the “Container image” option.
- Give your function a name.
- For the “Container image URI”, browse for the ECR image you just pushed.
- Under “Configuration” > “Environment variables”, add your GEMINI_API_KEY.
- Click “Create function”.
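If you prefer the terminal over the console, the same function can be created with the AWS CLI. The function name, role, and image URI below are placeholders; the execution role must already exist and grant basic Lambda permissions:

```shell
aws lambda create-function \
  --function-name my-serverless-ai-app \
  --package-type Image \
  --code ImageUri=YOUR_AWS_ACCOUNT_ID.dkr.ecr.YOUR_REGION.amazonaws.com/my-serverless-ai-app:latest \
  --role arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/your-lambda-execution-role \
  --environment "Variables={GEMINI_API_KEY=your-key}" \
  --timeout 30 \
  --memory-size 512
```

The timeout and memory settings here are reasonable starting points for an API-calling function, not tuned values; adjust them to your workload.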
Step 5: Add an API Gateway Trigger
- In your new Lambda function’s dashboard, click “Add trigger”.
- Select “API Gateway”.
- Choose “Create a new API” with the “HTTP API” type.
- Leave the security as “Open” for now for easy testing.
- Click “Add”.
You will now have a public API URL. You can use curl or any API client to test it:

```shell
curl -X POST YOUR_API_GATEWAY_URL \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me a fun fact about serverless computing."}'
```
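The same request can be made from Python with nothing but the standard library. A sketch using urllib (the URL is the placeholder from above; substitute your real invoke URL before uncommenting the call):

```python
import json
import urllib.request

# Placeholder: replace with the invoke URL from the API Gateway trigger
url = "https://YOUR_API_GATEWAY_URL"

payload = json.dumps(
    {"prompt": "Tell me a fun fact about serverless computing."}
).encode("utf-8")

req = urllib.request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the URL is real:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```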
What’s Next?
You have successfully deployed a scalable, production-ready serverless AI application!
- Add Authentication: Secure your API Gateway endpoint using AWS IAM or API keys.
- Monitor Your Function: Use Amazon CloudWatch to monitor logs, invocations, and errors.
- Optimize Performance: For demanding workloads, explore Provisioned Concurrency to keep your function “warm” and reduce cold starts.
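As a pointer for that last item: Provisioned Concurrency is configured against a published version or alias, not $LATEST. A hedged CLI sketch, with placeholder names and a deliberately small concurrency value:

```shell
# Publish an immutable version of the function
aws lambda publish-version --function-name my-serverless-ai-app

# Keep two execution environments warm for version 1
aws lambda put-provisioned-concurrency-config \
  --function-name my-serverless-ai-app \
  --qualifier 1 \
  --provisioned-concurrent-executions 2
```

Provisioned Concurrency is billed for as long as it is configured, so enable it only for latency-sensitive endpoints.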