Cloud 7: Building an API with Lambda, Docker and CloudFormation

Managing Python dependencies with lambda functions in Amazon Web Services (AWS)
I recently used a serverless approach to build an API for a project. Each API call triggers a lambda function that processes the call and returns a response. The advantage is that there is no constantly running server to maintain and pay for. However, the Python dependencies involved, in this case pandas and numpy (and the associated C compiler), were not completely straightforward to include in the AWS Lambda serverless framework I was using.

One way to deal with this is to install the dependencies in a virtual environment, zip it and upload it to S3 for the lambda to access (this is described in the first reference below). However, there are size limits on what can be packaged this way: a Lambda deployment package can be at most 250 MB unzipped, whereas a container image can be up to 10 GB. I therefore ended up using a Docker image to install the dependencies. The image is then run by a lambda triggered by an API call. For future reference, I am writing this up here with a simple example of installing dependencies for this kind of use case.

The following assumes some knowledge of Docker and AWS, and that the AWS command line interface (CLI) is set up with the permissions to launch the various services.

A schematic of the components used is given below:

[Figure: api_schematic]

The project is set up so that the Dockerfile that builds the container image and the Python script (lambda_function.py) we want the API to run via the lambda are in a folder called container_folder, together with the requirements.txt file listing the dependencies. The CloudFormation yaml file api-lambda-example.yml is used to specify and launch the components of the API. The files are available from https://github.com/johnardavies/lambda_api_example.

Structure of the post:

1. Creating the Docker image with the dependencies
2. The script the API runs
3. The CloudFormation template that creates the API Stack
4. Generating an API key and API rate limits
5. Launching the API
6. Calling the API

1. Creating the Docker image with the dependencies

The first stage is to build a Docker image containing the Python script and its dependencies. The Dockerfile below is a two-stage build. The first stage starts from a full Python base image, copies the script (lambda_function.py, described in the next section) into a folder called function, then copies in requirements.txt and installs the dependencies into the same folder. The second stage copies that folder into a smaller (slim) Python image and installs a C and C++ compiler. As we are not using an AWS base image, the awslambdaric package (the AWS Lambda runtime interface client) has to be installed to allow AWS Lambda to run the image; the Dockerfile shown here has no separate install step for it, so it needs to be listed in requirements.txt. When the API is called, it triggers the lambda, which runs the container image, which in turn runs the Python script. The general form of the Dockerfile is described in more detail in the second reference of this post.

# Define custom function directory
ARG FUNCTION_DIR="/function"

FROM python:3.10 AS build-image

# Include global arg in this stage of the build
ARG FUNCTION_DIR

# Copy function code
RUN mkdir -p ${FUNCTION_DIR}
COPY lambda_function.py ${FUNCTION_DIR}

# Copy the requirements.txt
COPY requirements.txt .

# Install the function's dependencies 
RUN pip install \
    --target ${FUNCTION_DIR} \
        -r requirements.txt

# Use a slim version of the base Python image to reduce the final image size
FROM python:3.10-slim

# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}

# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}

# Install a C and C++ compiler
RUN apt-get update && \
    apt-get install -y zip gcc g++ && \
    rm -rf /var/lib/apt/lists/*


# Set runtime interface client as default command for the container runtime
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
# Pass the name of the function handler as an argument to the runtime
CMD [ "lambda_function.handler" ]

To build the image specified by the Dockerfile, run the following command from the directory that contains container_folder (not from inside it). Here we tag the image as api_stack.

docker build --tag api_stack container_folder

To obtain the id of the resulting Docker image we then run:

docker images

Having built the container image, we then authenticate with the AWS Elastic Container Registry (ECR), where it will be stored. The command below pipes the ECR password to docker login, which logs Docker in to the container registry associated with the account. Here xxxxxx stands for the AWS account ID. We also specify a region (here eu-west-2), which appears in the registry address as well.

aws ecr get-login-password --region eu-west-2 | docker login --username AWS --password-stdin xxxxxx.dkr.ecr.eu-west-2.amazonaws.com

If this is successful, then Login Succeeded will be returned. We then tag the image with the name of the ECR repository it will be pushed to. Here the image is identified by the id 7d990bc03f26 and given the name api_example_container.

docker tag 7d990bc03f26 xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/api_example_container

We then create a repository in ECR to hold the container image:

aws ecr create-repository \
   --repository-name api_example_container \
   --image-scanning-configuration scanOnPush=true \
   --region eu-west-2

and then push the image to the container registry so that CloudFormation can access it:

docker push xxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/api_example_container:latest
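The same image address is used in the tag and push commands and is later passed to CloudFormation, so it can help to build it in one place. The following is a minimal sketch of how the address is assembled; the function name ecr_image_uri and the account ID 123456789012 are made up for illustration and are not part of the project.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image address of the form used by docker tag and docker push."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Hypothetical 12-digit account ID, used purely for illustration
uri = ecr_image_uri("123456789012", "eu-west-2", "api_example_container")
print(uri)
```

If no tag is given in the docker tag command, Docker uses latest, which is why the push command above can refer to :latest.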

2. The script the API runs

In this example the script we want the lambda to run is very simple. It takes an input text string from the API call, imports the pandas and numpy packages, and returns the input text together with the pandas and numpy versions as JSON. The function is called handler and is saved in the Python script lambda_function.py, shown below, which is copied into the image when it is built.

import json

import numpy as np
import pandas as pd


def handler(event, context):
    """Return the input query string parameter with the numpy and pandas versions as JSON."""
    # API Gateway sets queryStringParameters to None when the request has no
    # query string, so fall back to an empty dict before looking up "input"
    query_params = event.get("queryStringParameters") or {}
    # Named input_text to avoid shadowing the built-in input()
    input_text = query_params.get("input", "default")

    # Return the result as JSON
    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "hello": str(input_text),
                "numpy version": str(np.__version__),
                "pandas version": str(pd.__version__),
            }
        ),
    }
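Before building the image, a handler of this kind can be exercised locally by calling it with a hand-built event. The sketch below uses a stripped-down stand-in for the handler (without the numpy and pandas imports, so it runs anywhere), and the two events are hand-written assumptions that contain only a small subset of the fields API Gateway sends. The second event covers a request with no query string, where API Gateway passes queryStringParameters as null.

```python
import json


def handler(event, context):
    """Stand-in for the handler above, without the numpy and pandas imports."""
    # queryStringParameters is None when the request has no query string
    query_params = event.get("queryStringParameters") or {}
    input_text = query_params.get("input", "default")
    return {"statusCode": 200, "body": json.dumps({"hello": str(input_text)})}


# Hand-built events containing a subset of the fields a proxy integration sends
event_with_input = {
    "httpMethod": "GET",
    "path": "/installer",
    "queryStringParameters": {"input": "world"},
}
event_without_query = {
    "httpMethod": "GET",
    "path": "/installer",
    "queryStringParameters": None,
}

for event in (event_with_input, event_without_query):
    response = handler(event, None)
    print(response["statusCode"], json.loads(response["body"]))
```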

3. The CloudFormation template that creates the API Stack

The API is generated by the CloudFormation template api-lambda-example.yml which specifies the services that are used and the relationship between them. We pass two parameters to the template when launching the stack to generate the API. The first ECRRepositoryUri is the ECR Uniform Resource Identifier (URI) of the container image that we want the lambda to run. The second is the StageName which forms part of the endpoint and is typically used to distinguish between different versions of the API, for example a production or development version (prod or dev).

Description: A CloudFormation template that creates an API that runs a container. The API is rate limited and requires an API key.

Parameters:
  ECRRepositoryUri:
    Type: String
    Description: The URI of the Docker image in Amazon ECR (e.g., <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:<tag>).

  StageName:
    Type: String
    Description: Name of API stage.

Resources:
  
  # API Gateway
  ApiGateway:
    Type: 'AWS::ApiGateway::RestApi'
    Properties:
      Name: 'ApiExample'
      ApiKeySourceType: HEADER

  # API Resource
  ApiResource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt ApiGateway.RootResourceId
      PathPart: 'installer'
      RestApiId: !Ref ApiGateway

  # Lambda Execution Role
  LambdaExecutionRole:   
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: 'Allow'
            Principal:
              Service: 'lambda.amazonaws.com'
            Action: 'sts:AssumeRole'
      
      Policies:
        - PolicyName: 'LambdaLogWrite'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: 'Allow'
                Action:
                  - 'logs:CreateLogGroup'
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
              - Effect: 'Allow'
                Action:  
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*:log-stream:*"

  # The Lambda function which is linked to the container image in the Elastic Container Registry
  ExampleAPIDependencies: 
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: 'ExampleAPIDependencies'
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ImageUri: !Ref ECRRepositoryUri
      PackageType: "Image"
      Architectures: 
        - 'arm64'
      Timeout: 30
      MemorySize: 500

  # Gives the API Gateway permission to call the lambda function ExampleAPIDependencies
  LambdaApiGatewayPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref ExampleAPIDependencies
      Principal: 'apigateway.amazonaws.com'
             
  # API Method
  ApiMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: 'GET'
      ResourceId: !Ref ApiResource
      RestApiId: !Ref ApiGateway
      AuthorizationType: 'NONE'
      ApiKeyRequired: true
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: 'POST'
        Uri: !Sub 'arn:aws:apigateway:eu-west-2:lambda:path/2015-03-31/functions/${ExampleAPIDependencies.Arn}/invocations'
  
  # API Deployment linked to a stage
  ApiDeployment:
    Type: 'AWS::ApiGateway::Deployment'
    DependsOn: ApiMethod  # Ensures the deployment happens after the method is created
    Properties:
      RestApiId: !Ref ApiGateway
      StageName: !Sub '${StageName}'

  # Usage Plan
  UsagePlan:
    Type: 'AWS::ApiGateway::UsagePlan'
    DependsOn: ApiDeployment  # Ensures the usage plan is created after the deployment
    Properties:
      UsagePlanName: 'MyUsagePlan'
      ApiStages:
        - ApiId: !Ref ApiGateway
          Stage: !Sub '${StageName}'
      Throttle:
        BurstLimit: 100
        RateLimit: 50
      Quota:
        Limit: 1000
        Period: MONTH

  # API Key
  ApiKey:
    Type: 'AWS::ApiGateway::ApiKey'
    Properties:
      Enabled: true
      Name: 'APIExampleKey'

  # Usage Plan Key
  UsagePlanKey:
    Type: 'AWS::ApiGateway::UsagePlanKey'
    Properties:
      KeyId: !Ref ApiKey
      KeyType: 'API_KEY'
      UsagePlanId: !Ref UsagePlan

We now go through the Resources part of the yaml in turn. In the first Resources stage an API called ApiExample is specified. The API will have its API key passed to it in the header. The template then creates a resource within the API which will be accessed via the path installer in the endpoint.

  # API Gateway
  ApiGateway:
    Type: 'AWS::ApiGateway::RestApi'
    Properties:
      Name: 'ApiExample'
      ApiKeySourceType: HEADER

  # API Resource
  ApiResource:
    Type: 'AWS::ApiGateway::Resource'
    Properties:
      ParentId: !GetAtt ApiGateway.RootResourceId
      PathPart: 'installer'
      RestApiId: !Ref ApiGateway

The lambda is then given permission, via the IAM role LambdaExecutionRole, to write to CloudWatch Logs, which helps monitor what happens when it is called. The role allows the lambda to create a log group to store its logs and, within a log group, to create log streams and write log events to them. A log stream is a sequence of log events that share the same source; for Lambda, each running instance of the function (execution environment) writes to its own log stream, so repeated calls handled by the same instance appear in the same stream.

  # Lambda Execution Role
  LambdaExecutionRole:   
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: 'Allow'
            Principal:
              Service: 'lambda.amazonaws.com'
            Action: 'sts:AssumeRole'
      
      Policies:
        - PolicyName: 'LambdaLogWrite'
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: 'Allow'
                Action:
                  - 'logs:CreateLogGroup'
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
              - Effect: 'Allow'
                Action:  
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*:log-stream:*"

The section below creates the lambda function ExampleAPIDependencies. The function is given the IAM role specified above in LambdaExecutionRole, which allows it to write the logs. It runs the container image at the image URI passed to the CloudFormation stack as a parameter. As the container was built on an Apple Silicon Mac, the architecture is specified as arm64 so that it matches the image. The lambda times out after 30 seconds and has 500 MB of memory. The API Gateway is then given permission to invoke the lambda function, and the API method and deployment are defined.

  # The Lambda function which is linked to the container image in the Elastic Container Registry
  ExampleAPIDependencies: 
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: 'ExampleAPIDependencies'
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ImageUri: !Ref ECRRepositoryUri
      PackageType: "Image"
      Architectures: 
        - 'arm64'
      Timeout: 30
      MemorySize: 500

  # Gives the API Gateway permission to call the lambda function ExampleAPIDependencies
  LambdaApiGatewayPermission:
    Type: 'AWS::Lambda::Permission'
    Properties:
      Action: 'lambda:InvokeFunction'
      FunctionName: !Ref ExampleAPIDependencies
      Principal: 'apigateway.amazonaws.com'
             
  # API Method
  ApiMethod:
    Type: 'AWS::ApiGateway::Method'
    Properties:
      HttpMethod: 'GET'
      ResourceId: !Ref ApiResource
      RestApiId: !Ref ApiGateway
      AuthorizationType: 'NONE'
      ApiKeyRequired: true
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: 'POST'
        Uri: !Sub 'arn:aws:apigateway:eu-west-2:lambda:path/2015-03-31/functions/${ExampleAPIDependencies.Arn}/invocations'
  
  # API Deployment linked to a stage
  ApiDeployment:
    Type: 'AWS::ApiGateway::Deployment'
    DependsOn: ApiMethod  # Ensures the deployment happens after the method is created
    Properties:
      RestApiId: !Ref ApiGateway
      StageName: !Sub '${StageName}'

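The ApiMethod above specifies that the API is called with a GET request and that an API key is required. The integration type AWS_PROXY passes the whole request (including the query string) to the lambda function; the IntegrationHttpMethod is POST because API Gateway always invokes Lambda functions with a POST request, whatever method the client used. ApiDeployment deploys the API to the stage, and its DependsOn attribute ensures that this happens only after ApiMethod has been created.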

4. Generating an API key and API rate limits

To make it harder for someone unwanted to hit the API endpoint a large number of times and rack up a bill, the CloudFormation template includes a usage plan with rate limits and a monthly quota. The template also generates an API key, here called ‘APIExampleKey’, which is then attached to the plan.

  # Usage Plan
  UsagePlan:
    Type: 'AWS::ApiGateway::UsagePlan'
    DependsOn: ApiDeployment  # Ensures the usage plan is created after the deployment
    Properties:
      UsagePlanName: 'MyUsagePlan'
      ApiStages:
        - ApiId: !Ref ApiGateway
          Stage: !Sub '${StageName}'
      Throttle:
        BurstLimit: 100
        RateLimit: 50
      Quota:
        Limit: 1000
        Period: MONTH

  # API Key
  ApiKey:
    Type: 'AWS::ApiGateway::ApiKey'
    Properties:
      Enabled: true
      Name: 'APIExampleKey'

  # Usage Plan Key
  UsagePlanKey:
    Type: 'AWS::ApiGateway::UsagePlanKey'
    Properties:
      KeyId: !Ref ApiKey
      KeyType: 'API_KEY'
      UsagePlanId: !Ref UsagePlan
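API Gateway throttles requests using a token bucket, in which RateLimit is the steady refill rate in requests per second and BurstLimit is the bucket size. The following is a simplified sketch of that idea using the values from the usage plan above; the TokenBucket class is written for illustration only and is not how API Gateway is implemented internally. The monthly Quota of 1000 requests is a separate limit and is not modelled here.

```python
class TokenBucket:
    """Simplified token bucket: refills at `rate` tokens per second up to `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0  # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=50, burst=100)

# 150 requests arriving at the same instant: only the burst of 100 is accepted
accepted_at_once = sum(bucket.allow(now=0.0) for _ in range(150))

# One second later the bucket has refilled by 50 tokens
accepted_after_1s = sum(bucket.allow(now=1.0) for _ in range(150))

print(accepted_at_once, accepted_after_1s)  # prints: 100 50
```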

This leaves two potential security issues that would need further extensions to address:

1. Authentication: An API key does not establish the identity of the caller, so anyone who obtains a key can call the API. Adding authentication would require additional techniques.

2. Filtering: Requiring an API key does not stop someone without a key from hitting the endpoint a large number of times, even though those requests do not trigger the lambda. IP addresses that do this can be blocked using AWS Web Application Firewall (WAF).

5. Launching the API

To create the stack (here called ExampleApiStack) and generate the API, we run the following from the folder containing the CloudFormation template, replacing xxxxxxxxxxx with the AWS account ID:

aws cloudformation create-stack --stack-name ExampleApiStack \
     --template-body file://api-lambda-example.yml \
     --region eu-west-2 \
     --capabilities CAPABILITY_IAM \
     --parameters \
            ParameterKey=ECRRepositoryUri,ParameterValue=xxxxxxxxxxx.dkr.ecr.eu-west-2.amazonaws.com/api_example_container:latest \
            ParameterKey=StageName,ParameterValue='prod'

Here the container image that the lambda runs and the StageName (here prod) are passed to the template as parameters.
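The stack can also be launched from Python rather than the CLI. The sketch below assumes the boto3 package is installed and AWS credentials are configured; boto3 is not used in the original project, and the function names stack_parameters and create_example_stack are made up for illustration. The parameter list mirrors the --parameters arguments of the CLI command above.

```python
def stack_parameters(image_uri: str, stage_name: str) -> list[dict]:
    """Build the parameter list in the shape CloudFormation's create_stack expects."""
    return [
        {"ParameterKey": "ECRRepositoryUri", "ParameterValue": image_uri},
        {"ParameterKey": "StageName", "ParameterValue": stage_name},
    ]


def create_example_stack(
    image_uri: str,
    stage_name: str = "prod",
    template_path: str = "api-lambda-example.yml",
) -> dict:
    """Create ExampleApiStack in eu-west-2 (assumes boto3 and AWS credentials)."""
    import boto3  # imported here so stack_parameters works without boto3 installed

    with open(template_path) as template_file:
        template_body = template_file.read()

    client = boto3.client("cloudformation", region_name="eu-west-2")
    return client.create_stack(
        StackName="ExampleApiStack",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_IAM"],
        Parameters=stack_parameters(image_uri, stage_name),
    )


# Hypothetical account ID, used purely for illustration
print(stack_parameters(
    "123456789012.dkr.ecr.eu-west-2.amazonaws.com/api_example_container:latest", "prod"
))
```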

To obtain the stack info:

aws cloudformation describe-stacks --stack-name ExampleApiStack

An example of the output from this is below. Some account-specific information has been removed.

[Figure: stack_info]
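The describe-stacks command returns JSON, so the stack status can be picked out programmatically, for example to check for CREATE_COMPLETE before calling the API. A minimal sketch follows; the function name stack_status is made up, and the sample is a hand-written, heavily abbreviated stand-in for the real output.

```python
import json


def stack_status(describe_stacks_output: str) -> str:
    """Return the StackStatus of the first stack in describe-stacks JSON output."""
    data = json.loads(describe_stacks_output)
    return data["Stacks"][0]["StackStatus"]


# Hand-written, abbreviated sample of the JSON returned by describe-stacks
sample_output = json.dumps(
    {"Stacks": [{"StackName": "ExampleApiStack", "StackStatus": "CREATE_COMPLETE"}]}
)
print(stack_status(sample_output))  # prints: CREATE_COMPLETE
```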

6. Calling the API

To get the information on the API endpoint that the CloudFormation stack has generated:

aws apigateway get-rest-apis

An example of what the command produces is shown below:

[Figure: api_info]

This gives us the API id, which we insert into the general endpoint form that the stack produces (shown below) to call it.

https://{api_id}.execute-api.eu-west-2.amazonaws.com/prod/installer?input={text_passed_to_api}
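As a sketch of how the endpoint is put together, the code below looks up the API id by name in the JSON returned by get-rest-apis and fills in the endpoint form, URL-encoding the input text. The function names find_api_id and endpoint_url are made up for illustration, and the sample dictionary is an abbreviated, hand-written stand-in for the real command output.

```python
from urllib.parse import urlencode


def find_api_id(rest_apis: dict, name: str) -> str:
    """Return the id of the REST API with the given name from get-rest-apis output."""
    for item in rest_apis.get("items", []):
        if item.get("name") == name:
            return item["id"]
    raise LookupError(f"No REST API named {name!r}")


def endpoint_url(
    api_id: str,
    text: str,
    stage: str = "prod",
    region: str = "eu-west-2",
    resource: str = "installer",
) -> str:
    """Fill in the endpoint form, URL-encoding the text passed as the input parameter."""
    query = urlencode({"input": text})
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{resource}?{query}"


# Abbreviated, hand-written sample of the get-rest-apis output
sample_apis = {"items": [{"id": "8ijgbd265k", "name": "ApiExample"}]}

url = endpoint_url(find_api_id(sample_apis, "ApiExample"), "world")
print(url)
```

URL-encoding matters once the input contains spaces or punctuation; with urlencode, "hello world" becomes input=hello+world.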

In this example the api_id is 8ijgbd265k. The API key value can be obtained from the API Gateway page on the AWS console, shown below.

[Figure: api_gateway]

To test that the API works we curl the endpoint, passing the API key in the header (-H "x-api-key: ...") and the input text in the query string. (Some formatting issues initially threw the curl command on a Mac: curly quotes “ ” rather than straight quotes ", and whitespace after the \.)

curl -X GET "https://8ijgbd265k.execute-api.eu-west-2.amazonaws.com/prod/installer?input=world" \
  -H "x-api-key: pqsJ6ObKmx1SxAN0cTlUCHKubRDxH2d5UPBeRsej"

This produces the return:

{"hello": "world", "numpy version": "2.2.1", "pandas version": "2.2.3"}
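The same call can be made from Python using only the standard library. This is a sketch rather than part of the project: the function names build_request and call_api are made up, and YOUR_API_KEY is a placeholder for the key value from the console.

```python
import json
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that passes the API key in the x-api-key header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def call_api(url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body returned by the lambda."""
    with urllib.request.urlopen(build_request(url, api_key), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; call_api() would send it
request = build_request(
    "https://8ijgbd265k.execute-api.eu-west-2.amazonaws.com/prod/installer?input=world",
    "YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

If the key is missing or wrong, urlopen raises urllib.error.HTTPError, since API Gateway rejects the request before it reaches the lambda.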

An example of the curl call and its response in the terminal:

[Figure: curl_example]

To remove the API and its stack when it is no longer needed:

aws cloudformation delete-stack --stack-name ExampleApiStack

A variant of the usual Cloud warning

At the time of writing, the AWS Lambda free tier in the eu-west-2 region includes one million free requests per month, and beyond that requests cost $0.20 per 1M (see https://aws.amazon.com/lambda/pricing/). This means that, in principle, serverless should cost very little as long as the number of requests is low. In the example here it is probably significantly cheaper than having a virtual machine continuously running to handle the API calls. However, this might not hold in the event of a large number of requests, perhaps from someone maliciously calling the API many times, or if any data processing incurs high storage costs. Until AWS implements the budget cap that everyone wants, the costs of cloud services should, as always, be monitored and services turned off when not in use to avoid incurring unwanted costs.
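To put rough numbers on the request charge quoted above, the sketch below applies the free tier and the $0.20 per 1M price to a few monthly volumes. It covers the per-request charge only and assumes the prices stay as quoted; Lambda also bills for compute duration, and API Gateway has its own charges, neither of which is modelled here. The function name monthly_request_charge is made up for illustration.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000  # free tier quoted above
PRICE_PER_MILLION_USD = 0.20  # request price quoted above


def monthly_request_charge(requests: int) -> float:
    """Request charge in USD for one month, after the free tier is used up."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD


for volume in (1_000, 1_000_000, 50_000_000):
    print(volume, round(monthly_request_charge(volume), 2))
```

With the usage plan quota of 1000 requests per month per key, legitimate use of this example stays far inside the free tier.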