Transpire has fully embraced serverless technologies ever since AWS first made them available. They’ve proven exceptionally useful in Vodafone Foundation’s DreamLab project, where we have taken advantage of code portability, ease of deployment, wide language support, and all that good stuff.
Almost all of our backend infrastructure is composed of Lambda functions behind API Gateway endpoints. As the number of features in the DreamLab API has grown over the years, we’ve occasionally seen Lambda functions getting a bit too large. Large Lambda functions lead to code that is difficult to test, monitor, debug, and maintain. The obvious solution is to split them up into smaller functions.
AWS Step Functions promises to solve this problem for us!
What Are Step Functions?
According to the AWS documentation:
“AWS Step Functions is a serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services into business-critical applications. Through its visual interface, you can create and run a series of checkpointed and event-driven workflows that maintain the application state. The output of one step acts as input into the next. Each step in your application executes in order and as expected based on your defined business logic.”
Using Step Functions, you can call a set of Lambda functions in an order defined by your state machine, with basic conditions, branching, loops, and error handling. Your state machine is displayed visually as a flowchart, which is fitting: Step Functions maps naturally onto a flowchart level of abstraction.
That’s a fair description, but it can be put even more simply:
AWS Step Functions is a flowchart with a Lambda function execution[1] at each step.
Example
While there are plenty of abstract examples of implementing Step Functions in the official AWS documentation, most of which are really quite good, sometimes a practical example is a better way to learn. Here’s something we did just last week on Vodafone Foundation’s DreamLab project.
Background
We need to calculate statistics every day for each project that DreamLab powers. These statistics include things like how complete a project is, how many users are powering it, and so on.
Until recently, this was implemented as a single Lambda function that goes through the following steps:
- We gather raw data from a number of different sources, including CloudWatch metric data, DynamoDB document data, and table metadata.
- Using this raw data, we do some basic calculations, such as determining the completion percentage of a project, the estimated date of completion, and so on.
- We then save this data out to our database, and record some custom CloudWatch metrics for other systems to read later.
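Concretely, the single-function version looked something like the following sketch. The function and field names here are hypothetical illustrations, not the actual DreamLab code:

```python
# Hypothetical sketch of the original monolithic handler; names, data
# sources and values are illustrative, not the real DreamLab implementation.

def calculate_completion(computed_units, total_units):
    """Basic calculation step: completion percentage of a project."""
    if total_units == 0:
        return 0.0
    return round(100.0 * computed_units / total_units, 2)

def handler(event, context):
    # 1. Gather raw data (CloudWatch metrics, DynamoDB documents, metadata)
    raw = {"computed_units": 7500, "total_units": 10000}  # stand-in for real fetches

    # 2. Do some basic calculations on the raw data
    stats = {
        "completion_pct": calculate_completion(raw["computed_units"],
                                               raw["total_units"]),
    }

    # 3. Save to the database and record custom CloudWatch metrics
    # (omitted: boto3 put_item / put_metric_data calls)
    return stats
```

All three concerns live in one function, which is exactly what made it hard to test in isolation.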
Since all these steps were happening in a single Lambda function, it was becoming difficult to test and debug. For example, if we wanted to make sure we were calculating a percentage correctly, we’d need to modify the actual CloudWatch metrics somehow to construct a test case — there was no way to just pass some test data to our function easily.
Stepping Up
We could have just split the code into multiple Lambda functions, using events to trigger each other in a chain. But Step Functions is far more fit for purpose, and it’s definitely worth the small amount of work to set it up over rolling your own solution.
Serverless Application Model
A quick note about the AWS Serverless Application Model (SAM): on the DreamLab project we’ve only recently started taking advantage of all the features SAM has to offer, and it’s under pretty active development in the AWS world. In this case, SAM makes setting up a Step Functions state machine really simple: it abstracts away much of the boilerplate and greatly reduces the size and complexity of your CloudFormation templates. It’s worth understanding and using, though make sure you understand the basic concepts of Lambda functions, IAM roles, and Step Functions state machines before looking at SAM, so you’ll at least know what is going on behind the scenes.
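As a sketch of what that looks like, a single SAM resource can declare the whole state machine. Everything here — the logical IDs, the definition file path, the policy — is a hypothetical example, not our actual template:

```yaml
# Hypothetical SAM resource for a statistics state machine.
# Logical IDs and the definition path are illustrative assumptions.
StatisticsStateMachine:
  Type: AWS::Serverless::StateMachine
  Properties:
    DefinitionUri: statemachine/statistics.asl.json
    DefinitionSubstitutions:
      # Injected into the ${...} placeholders in the states-language file
      FetchProjectsFunctionArn: !GetAtt FetchProjectsFunction.Arn
      FetchStatisticsFunctionArn: !GetAtt FetchStatisticsFunction.Arn
      ProcessStatisticsFunctionArn: !GetAtt ProcessStatisticsFunction.Arn
      SaveStatisticsCloudWatchFunctionArn: !GetAtt SaveStatisticsCloudWatchFunction.Arn
      SaveStatisticsDynamoDBFunctionArn: !GetAtt SaveStatisticsDynamoDBFunction.Arn
    Policies:
      # Repeat for each function the state machine invokes
      - LambdaInvokePolicy:
          FunctionName: !Ref FetchProjectsFunction
```

SAM expands this into the underlying CloudFormation state machine, IAM role, and policy resources for you.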
We’ll break our single Lambda function into multiple functions at logical points. Data will be passed between them as JSON, with the output of one function being the input of the next.
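In code, each step is then just an ordinary Lambda handler whose event is the previous step’s output — something like this hypothetical “Fetch Statistics” handler (the payload shape and field names are assumptions, not the actual DreamLab data model):

```python
# Illustrative handler for a single step in the workflow.
# The payload shape and field names are assumptions.

def fetch_statistics_handler(event, context):
    """`event` is the JSON output of the previous step ("Fetch Projects");
    the return value becomes the input of the next step."""
    projects = event["projects"]
    # (omitted: real CloudWatch / DynamoDB lookups per project)
    raw_metrics = {p["id"]: {"computed_units": 0} for p in projects}
    return {"projects": projects, "raw_metrics": raw_metrics}
```

There is nothing Step Functions-specific in the handler itself, which is what keeps each piece independently testable.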
Then we describe our workflow in an “Amazon States Language” file. This is a simple JSON document that defines the workflow by referencing your Lambda functions: the order in which they are called, whether they run in parallel, control flow, error handling, retry policies, and so on.
{
  "StartAt": "Fetch Projects",
  "States": {
    "Fetch Projects": {
      "Type": "Task",
      "Resource": "${FetchProjectsFunctionArn}",
      "Next": "Fetch Statistics"
    },
    "Fetch Statistics": {
      "Type": "Task",
      "Resource": "${FetchStatisticsFunctionArn}",
      "Retry": [
        {
          "ErrorEquals": ["States.TaskFailed"],
          "IntervalSeconds": 15,
          "MaxAttempts": 5,
          "BackoffRate": 1.5
        }
      ],
      "Next": "Process Statistics"
    },
    "Process Statistics": {
      "Type": "Task",
      "Resource": "${ProcessStatisticsFunctionArn}",
      "TimeoutSeconds": 5,
      "Next": "Save Statistics"
    },
    "Save Statistics": {
      "Type": "Parallel",
      "End": true,
      "Branches": [
        {
          "StartAt": "Save Statistics to CloudWatch",
          "States": {
            "Save Statistics to CloudWatch": {
              "Type": "Task",
              "Resource": "${SaveStatisticsCloudWatchFunctionArn}",
              "End": true
            }
          }
        },
        {
          "StartAt": "Save Statistics to DynamoDB",
          "States": {
            "Save Statistics to DynamoDB": {
              "Type": "Task",
              "Resource": "${SaveStatisticsDynamoDBFunctionArn}",
              "End": true
            }
          }
        }
      ]
    }
  }
}
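One nice property of the Retry block above is that the wait times are easy to reason about: the interval grows by BackoffRate after each failed attempt. A quick sketch of the intervals our “Fetch Statistics” configuration produces:

```python
# The retry intervals implied by the "Fetch Statistics" Retry config:
# IntervalSeconds=15, BackoffRate=1.5, MaxAttempts=5.

def retry_intervals(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each successive retry attempt."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]

intervals = retry_intervals(15, 1.5, 5)
# → [15.0, 22.5, 33.75, 50.625, 75.9375]
```

So a transient failure gets roughly three minutes of total retrying before the execution is marked as failed.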
Here is what our State Machine and step functions look like now:
And here is what it looks like displayed as a “Visual workflow” in the Step Functions backend:
Now, if we encounter an error in one of our functions, we can easily see where the workflow failed and exactly what data was input to cause the error:
You can even watch in real-time while the workflow executes and inspect it at any point:
Real-World Benefits
With our statistics processor now set up in Step Functions, here are a few of the cool things we have been able to do:
- Execute individual functions outside of Step Functions. Because each step of our workflow is just a simple Lambda function, we can still go to the Lambda section of the console and run it like any other. There is nothing special about these functions — Step Functions simply glues them together for us.
- Watch data go in and out of each function. The AWS console gives us easy access to the data going in and out of each function. This makes it really easy to debug errors and provides tremendous visibility.
- Write unit tests for each function. Because each function is smaller and less tightly coupled we’ve now been able to write simpler unit tests and increase our test coverage. These tests get run as part of our automated test suite.
- Go back and view previous executions. If we notice something has gone wrong during a particular execution, we can view previous executions of the functions in the web console. While we could always log debugging information out to CloudWatch, having it displayed here alongside the actual workflow gives us invaluable context.
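To make the unit-testing point concrete, here is a sketch of a test for a hypothetical “Process Statistics” handler — no CloudWatch access required, just plain test data (names and payload shape are illustrative, not our actual code):

```python
# Hypothetical "Process Statistics"-style handler plus a unit test for it.
# The payload shape is an assumption for illustration.

def process_statistics_handler(event, context):
    raw = event["raw_metrics"]
    pct = 100.0 * raw["computed_units"] / raw["total_units"]
    return {"completion_pct": round(pct, 2)}

def test_completion_percentage():
    # No need to modify real CloudWatch metrics: pass test data straight in.
    event = {"raw_metrics": {"computed_units": 1234, "total_units": 4000}}
    assert process_statistics_handler(event, None)["completion_pct"] == 30.85

test_completion_percentage()
```

Because the handler is a pure function of its input, tests like this run in milliseconds as part of an ordinary automated suite.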
Conclusion
I hope this has shown you what Step Functions are, how they can be useful, and that they really aren’t all that complex.
Any time you have a large Lambda function, or you want to be able to easily inspect the flow of data through a number of functions, Step Functions is a fantastic, simple tool.
Footnotes
[1] Step Functions lets you do more than just deal with Lambda functions, but since most people will have their introduction to serverless with Lambda, it makes sense to just focus on it in this article.