Today, we're going to take a small detour from the usual Snowflake and data warehouse concepts. We'll walk through the creation of your own API that sits on top of data stored in another type of database, DynamoDB, and then build a small web front end to test it.
You've probably coded against numerous APIs in your day-to-day work, but what is it like to create your own? In this article, we'll show you how to create one, giving you some additional exposure to how a cloud service works and, hopefully, some inspiration for building your own in the future.
The following is a diagram of what we're going to build today: an API hosted in AWS Lambda, written entirely in Python, and connected to the super powerful DynamoDB. The API is then exposed through the AWS API Gateway, which allows us to add protections such as rate limiting. We also used Route 53 (AWS's domain management service) for the domain and CloudWatch for logging.
Let's jump in!
As mentioned, you've most certainly used APIs before. But how are they built and how do they work to help you get access to your data? Let's start off with some of the potential advantages.
For this application, I chose DynamoDB. DynamoDB is honestly incredible, and having some exposure to it is a must. It's referred to as a NoSQL, key-value store. There are a lot of different databases like this, but this is one of the most popular. It doesn't look like a series of tables and doesn't support your typical SQL syntax or, generally, the idea of joining tables together. Rather, Dynamo prefers schemas that are denormalized and, if possible, all stored in a single table. Because of this, some use cases are certainly not appropriate for it. Read up a bit on when you should and shouldn't use this type of database before picking it.
To get started, we need to create a new table in the DynamoDB console called Items. The only setting you need to deal with at this time is setting the Partition Key to id, which will represent the unique identifier for our data.
After that, we can use the AWS CLI to add items to our new table. You'll notice that we don't have to define any of the schema up front like we do with SQL; we simply add the key-value pairs and specify their data types when adding them. Simple as that.
aws dynamodb put-item \
    --table-name Items \
    --item '{
        "id": {"S": "6ec6736f-ae36-4091-9cc0-1d8a98babee1"},
        "name": {"S": "Smartphone"},
        "description": {"S": "Latest model with 5G capability"}
    }'
Repeat that process to create four or five additional items to get started.
Note: the unique identifiers are UUID4 values. You can use Python or any number of sites on the internet to create one. This is a wonderful practice for any data identifiers you need, since there is a near-zero chance of collision between generated values, and you do not have to keep track of order or previously created values. Simply generate a new one.
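As a minimal sketch of the note above, Python's standard uuid module can generate the identifier and build the JSON payload for aws dynamodb put-item. The helper name build_item and the sample values are hypothetical and not part of the original project.

```python
import json
import uuid


def build_item(name: str, description: str) -> dict:
    """Return an item in DynamoDB's attribute-value JSON format.

    build_item is a hypothetical helper for illustration only.
    """
    return {
        "id": {"S": str(uuid.uuid4())},  # random UUID4 as the partition key
        "name": {"S": name},
        "description": {"S": description},
    }


if __name__ == "__main__":
    # Print JSON that can be pasted after --item in the CLI command above
    print(json.dumps(build_item("Laptop", "Lightweight 14-inch model"), indent=2))
```

Each call produces a fresh identifier, so the same helper can be reused for every item you add.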
For our API, we're going to use the FastAPI library. FastAPI is a simple yet production-quality framework for building APIs in Python. First off, let's define what a Route is in FastAPI. A Route defines a specific endpoint (or URL path) that a client can access, along with the HTTP method (e.g., GET, POST, PUT, DELETE) used to interact with that endpoint. Let's take a look at a very simple one for a health check:
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health():
    return {"status": "healthy"}
That's it in its most basic form! The @app.get decorator tells FastAPI which function to run when a GET request arrives at <URL>/health; GET is the HTTP method. We can continue this pattern with all of the other endpoints we want to add, such as listing all items, creating a new item, editing an item, and deleting an item, also known as CRUD operations. Let's look at the get all items route.
from typing import Any, Dict, List

from botocore.exceptions import ClientError
from fastapi import Depends

@app.get("/items", response_model=List[Dict[str, Any]])
async def get_items(table: Any = Depends(get_dynamodb)):
    try:
        response = table.scan()
        items = response.get("Items", [])
        return items
    except ClientError as e:
        # --- ERROR HANDLING HERE ---
        raise
There is a little more involved here, of course. The route points to /items and decorates a function that receives the DynamoDB table through the get_dynamodb dependency (not shown). It then performs a table.scan(), which is a DynamoDB API call that reads the entire table. If we were making a more serious API here, we would want to implement filtering or pagination for this call so we do not end up overloading the response payload.
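To make the pagination idea concrete, here is a minimal sketch of a paged scan. It relies on the Limit and ExclusiveStartKey parameters of scan and on the LastEvaluatedKey field DynamoDB returns when more data remains. The helper name scan_page and the default page size are assumptions for illustration; the original project does not include this function.

```python
from typing import Any, Dict, Optional


def scan_page(table: Any, limit: int = 25,
              start_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Fetch one page of items from a boto3 DynamoDB Table resource.

    scan_page is a hypothetical helper, not part of the original API.
    """
    kwargs: Dict[str, Any] = {"Limit": limit}
    if start_key:
        # Resume where the previous page stopped
        kwargs["ExclusiveStartKey"] = start_key
    response = table.scan(**kwargs)
    return {
        "items": response.get("Items", []),
        # Absent on the last page, so next_key becomes None
        "next_key": response.get("LastEvaluatedKey"),
    }
```

A route could return next_key to the client and accept it back on the following request, so no single response has to carry the whole table.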
This process repeats itself for each of the endpoints you want to implement. You simply need to make sure you're picking the correct HTTP method and calling the proper DynamoDB APIs for performing the operation on the data.
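As a rough sketch of that mapping, the helpers below pair each CRUD operation with the boto3 Table method it would likely call. The function names and the GET, POST, and DELETE paths in the docstrings are assumptions for illustration; only GET /items and PUT /items/{id} appear in this article.

```python
from typing import Any, Dict, Optional


def create_item(table: Any, item: Dict[str, Any]) -> Dict[str, Any]:
    """POST /items (assumed path) -> PutItem, which writes the whole item."""
    table.put_item(Item=item)
    return item


def read_item(table: Any, item_id: str) -> Optional[Dict[str, Any]]:
    """GET /items/{id} (assumed path) -> GetItem; None when the id is unknown."""
    return table.get_item(Key={"id": item_id}).get("Item")


def update_item(table: Any, item_id: str, name: str, description: str) -> None:
    """PUT /items/{id} -> UpdateItem on the two editable attributes."""
    table.update_item(
        Key={"id": item_id},
        # "name" is a DynamoDB reserved word, so attribute names are aliased
        UpdateExpression="SET #n = :n, #d = :d",
        ExpressionAttributeNames={"#n": "name", "#d": "description"},
        ExpressionAttributeValues={":n": name, ":d": description},
    )


def delete_item(table: Any, item_id: str) -> None:
    """DELETE /items/{id} (assumed path) -> DeleteItem."""
    table.delete_item(Key={"id": item_id})
```

Each helper takes the table as an argument, so a FastAPI route can pass in whatever the get_dynamodb dependency provides.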
Next, we want to wrap our API in the code needed to deploy it as a Lambda function. Fortunately, there is not much involved in migrating a normal set of Python code to a Lambda. One of the main things we can do is ensure that we have logging implemented. We can use the standard Python logging module; anything logged with it will automatically be written to CloudWatch.
import logging

# Configure logging for CloudWatch
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Lambda automatically captures logs from stdout/stderr and sends to CloudWatch
formatter = logging.Formatter(
    "%(asctime)s | %(levelname)s | %(name)s:%(funcName)s:%(lineno)d - %(message)s"
)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger.addHandler(handler)
The last step we need to do for this application is leverage a package called Mangum. The Mangum package acts as an adapter to bridge an ASGI (Asynchronous Server Gateway Interface) framework (like FastAPI) with AWS Lambda and API Gateway. It essentially allows you to deploy ASGI applications to AWS Lambda as serverless functions.
Normally, an ASGI framework like FastAPI requires a persistent server to run (e.g., Uvicorn). However, AWS Lambda is event-driven and ephemeral, without a persistent server running in the background. Mangum bridges this gap by allowing FastAPI to function as an AWS Lambda handler. This is all done with a few lines of code.
from mangum import Mangum
# Create Lambda handler with API Gateway v2 configuration
lambda_handler = Mangum(app, lifespan="off", api_gateway_base_path="/default")
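To give a feel for what an adapter like this has to do, here is a simplified sketch that turns an API Gateway HTTP API event (payload format 2.0) into a subset of an ASGI HTTP scope. This is an illustration only and not Mangum's actual code; the function name event_to_scope is hypothetical, and the base_path handling only approximates the role api_gateway_base_path plays.

```python
from typing import Any, Dict


def event_to_scope(event: Dict[str, Any], base_path: str = "") -> Dict[str, Any]:
    """Translate an HTTP API v2.0 event into a minimal ASGI HTTP scope.

    Illustration only: this is not how Mangum is implemented.
    """
    path = event["rawPath"]
    if base_path and path.startswith(base_path):
        # Drop the stage prefix so "/default/items" matches the "/items" route
        path = path[len(base_path):] or "/"
    return {
        "type": "http",
        "method": event["requestContext"]["http"]["method"],
        "path": path,
        "query_string": event.get("rawQueryString", "").encode(),
        "headers": [
            (key.lower().encode(), value.encode())
            for key, value in event.get("headers", {}).items()
        ],
    }
```

The real adapter also has to stream the request body in, collect the response, and convert it back into the dictionary Lambda expects, which is why it is worth using the package rather than writing this by hand.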
First, create a new Lambda Function from the AWS Console. Give it an appropriate name and make sure to select the proper runtime environment, Python, and whichever version suits your application. After the function is created, the one setting you probably want to change is the timeout. Its default is 3 seconds, and you can safely change that to 30 seconds. This setting can be found under General Configuration.
You will need to grant the function access to the DynamoDB table you created above by attaching an inline policy to the role that was created along with your Lambda function. You can navigate to that role from the Lambda function's Configuration tab, under the Permissions section.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:*:*:table/Items"
        }
    ]
}
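The policy above uses wildcards for the region and account in the resource ARN. If you would rather scope it to a single table in one region and account, a small sketch like the following can generate the document. The helper name table_policy is hypothetical, and the region and account id shown are placeholders to replace with your own.

```python
import json


def table_policy(region: str, account_id: str, table_name: str = "Items") -> str:
    """Return an inline policy scoped to one table in one region and account.

    table_policy is a hypothetical helper for illustration only.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Scan",
                    "dynamodb:Query",
                    "dynamodb:UpdateItem",
                ],
                # DynamoDB table ARN: arn:aws:dynamodb:<region>:<account>:table/<name>
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder region and account id; substitute your own
    print(table_policy("us-east-1", "123456789012"))
```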
Deploying a Lambda function requires that you package up everything needed for your function in a zip file, including libraries that AWS may not have readily available in the Lambda environment. Since Lambdas have a file size limitation, do NOT package up libraries that are already installed, such as boto3, the Python AWS SDK.
To simplify this, you can create a Bash script that does all of the packaging for you. Simply run this to deploy your script and any updates.
#!/bin/bash
# Remove existing files
rm -rf package lambda.zip lvenv
# Create and activate virtual environment
python3.9 -m venv lvenv
source lvenv/bin/activate
# Create package directory
mkdir -p package
# Install dependencies
pip install -r requirements.txt --target ./package
# Copy lambda function to package directory
cp lambda_function.py ./package/
# Create deployment package
cd package && zip -r ../lambda.zip . && cd ..
# Cleanup
deactivate
rm -rf lvenv package
# Update Lambda function
aws lambda update-function-code --function-name DynamoAPI --zip-file fileb://lambda.zip
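Since package size is the main thing that can go wrong here, a quick check before uploading can help. The sketch below compares lambda.zip against the commonly documented Lambda limits of roughly 50 MB zipped (for direct upload) and 250 MB unzipped. The helper name check_package is hypothetical, and you should confirm the current limits in the AWS documentation.

```python
import os
import zipfile

# Approximate Lambda deployment package limits (verify against AWS docs)
MAX_ZIPPED_BYTES = 50 * 1024 * 1024
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def check_package(zip_path: str) -> dict:
    """Report zipped and unzipped sizes and whether both fit the limits.

    check_package is a hypothetical helper for illustration only.
    """
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size of each archive member
        unzipped = sum(info.file_size for info in archive.infolist())
    return {
        "zipped_bytes": zipped,
        "unzipped_bytes": unzipped,
        "ok": zipped <= MAX_ZIPPED_BYTES and unzipped <= MAX_UNZIPPED_BYTES,
    }


if __name__ == "__main__":
    print(check_package("lambda.zip"))
```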
The final step here is to expose our Lambda function with the AWS API Gateway. We first start by attaching a new API Gateway to the Lambda function from the Lambda function page in the AWS Console. Select Add Trigger and then choose API Gateway. From here, create a New API Gateway and select HTTP API and set the Security to Open.
The next and most critical step of this process is to map the routes to your Lambda function. This is a simple process you can do from the API Gateway console. Navigate to your newly created gateway and select Routes from the sidebar. Create a new route for each of the routes in your Lambda function. You can additionally include ones that are autogenerated by FastAPI, such as /docs.
After you have added all your routes, navigate to the Integrations section via the sidebar and, for each route, attach the Lambda function to the endpoint. This tells API Gateway which function to invoke when it receives a request on that route.
Note: You can also set up custom rate limits for your API from the console under Throttling.
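Once throttling is on, clients that exceed the limit receive an HTTP 429 (Too Many Requests) response. A client can handle that politely by retrying with exponential backoff. The following is a minimal sketch; the helper name call_with_backoff and its default values are assumptions, not part of the original project.

```python
import time
from typing import Any, Callable


def call_with_backoff(send: Callable[[], Any], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    send is any zero-argument callable returning an object with status_code.
    """
    response = send()
    for attempt in range(max_attempts - 1):
        if response.status_code != 429:
            break
        # Wait base_delay, then 2x, 4x, ... before the next attempt
        sleep(base_delay * 2 ** attempt)
        response = send()
    return response
```

With the requests package, this could be used as call_with_backoff(lambda: requests.get(f"{API_URL}/items")).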
If you're a regular reader of this blog, you know that we're huge fans of Streamlit. It's so easy to build a quick UI for data-driven apps. So, of course, we used it to build out a little test harness for our API.
One of the core packages you will use when you work with a RESTful API from Python is requests. This package allows you to make the various web API calls that we set up in our API. Let's take a look at the first one, the GET Items route.
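import requests

# Configure API URL
API_URL = "https://api.dataknowsall.com"

# Fetch items from API
try:
    response = requests.get(f"{API_URL}/items")
    response.raise_for_status()  # Raise an exception for bad status codes
    items = response.json()
except requests.exceptions.RequestException as e:
    # --- ERROR HANDLING HERE ---
    items = []  # Fall back to an empty list if the call fails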
You'll see in the try block that we use requests.get to call the API directly. This returns JSON that contains the results of your query. In our case, we can make a Pandas DataFrame directly from this and display it in Streamlit with the st.dataframe() function.
When it comes to the other verbs for our API (POST, PUT, DELETE), we can follow a similar convention. Let's take a look at the code for EDIT.
# Create the updated item dictionary
updated_item = {
    'id': selected_item['id'],
    'name': name,
    'description': description
}

# Update item using API
response = requests.put(f"{API_URL}/items/{selected_item['id']}", json=updated_item)
To update a record, we use requests.put, pass it the URL per our API specification, and include the JSON needed to update the item. In this case, we only have three fields in our table, so we can simply create a dictionary with the key-value pairs that we want to update.
Note: The Add Item API follows the exact same pattern, except that you use requests.post.
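For completeness, here is a hedged sketch of what the add and delete calls might look like. The POST /items and DELETE /items/{id} paths, the response shape, and the helper names are assumptions based on the PUT example above, so check them against your own routes. The http argument is expected to be the requests module or a requests.Session.

```python
from typing import Any, Dict


def add_item(http: Any, api_url: str, name: str, description: str) -> Dict[str, Any]:
    """POST a new item and return the JSON body of the response.

    Assumes the API exposes POST /items and replies with JSON.
    """
    response = http.post(
        f"{api_url}/items",
        json={"name": name, "description": description},
    )
    response.raise_for_status()  # Raise an exception for bad status codes
    return response.json()


def remove_item(http: Any, api_url: str, item_id: str) -> None:
    """DELETE an item by id. Assumes the API exposes DELETE /items/{id}."""
    response = http.delete(f"{api_url}/items/{item_id}")
    response.raise_for_status()
```

In the Streamlit app, this would be called as add_item(requests, API_URL, name, description) after the user submits a form.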
As always, feel free to check out the full working source code on GitHub. This is a really fun application for you to try on your own! There are some additional setup and configuration steps in the README for the repo, which might be helpful if you're trying to set up your own.
In conclusion, building your own API on top of DynamoDB using AWS services like Lambda and API Gateway is not only a rewarding experience but also a practical skill that can enhance your understanding of cloud-based applications. By following the steps outlined in this article, you have learned how to create a robust and scalable API that can be easily integrated with various platforms. Whether you're looking to streamline data access, improve security, or simply explore new technologies, this project offers a comprehensive introduction to API development in a serverless environment. As you continue to experiment and refine your skills, remember that the possibilities are endless, and the knowledge gained here can be applied to countless other projects. Happy coding!