The article, How to Improve Lambda Cold Start Performance, was first published on the Lumigo blog.
One of the great promises of serverless has always been that it would free developers to focus on writing code without having to give too much consideration to the underlying infrastructure. But the advantages presented by the instantly, infinitely scalable nature of serverless come with limitations and unique considerations that you need to take into account.
Among the biggest issues we, a 100% serverless company, have faced are cold starts: the extra time it takes for a function to execute when it hasn’t recently been invoked.
Cold starts can be a killer, especially if you’re developing a customer-facing application that needs to operate in real time. They happen because, if your Lambda is not already running, AWS needs to deploy your code and spin up a new container before the request can be handled. Your Lambda can start running only once the container is ready, so the request can take much longer to execute.
Cold starts are a necessary evil
The fact of the matter is that cold starts are a necessary byproduct of the scalability of serverless.
AWS needs a ready supply of containers to spin up when functions are invoked. That means that functions are kept warm for a limited amount of time (usually 30 to 45 minutes) after executing, before being spun down so that the container is free for any new function that is invoked.
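One way to see this reuse from inside a function is to rely on the fact that module-level code runs once per container, while the handler runs on every invocation. The following is a minimal sketch rather than a definitive implementation; the names `handler`, `_is_cold_start` and `_CONTAINER_STARTED_AT` are hypothetical and chosen only for illustration.

```python
import time

# Module-level code runs once per container, i.e. during the cold start.
_CONTAINER_STARTED_AT = time.time()
_is_cold_start = True


def handler(event, context):
    """Hypothetical handler that reports whether this invocation was cold."""
    global _is_cold_start
    cold = _is_cold_start
    # Any later invocation served by this same container is a warm one.
    _is_cold_start = False
    return {
        "cold_start": cold,
        "container_age_seconds": round(time.time() - _CONTAINER_STARTED_AT, 3),
    }
```

Logging the `cold_start` value alongside each request is one simple way to tell afterwards which slow responses coincided with a new container.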
Cold starts account for less than 0.25% of requests, but the impact can be huge, with the code sometimes taking 5 seconds to execute. This issue is particularly relevant to applications that need to respond in real time, or those that rely on split-second timing.
It’s widely known that AWS has been working consistently to “solve” the cold start issue once and for all, and while improvements have been incremental, a plan to improve the startup performance of Lambdas inside VPCs was announced at re:Invent 2018.
5 ways to improve cold start performance
While cold start times are a real issue, the good news is that there are very useful tools and approaches that can help to mitigate the problem, either by avoiding cold starts altogether or reducing their duration.
Monitor your application
Monitoring can be a challenge in a serverless environment, but it’s a critical first step to being able to address areas of inefficiency in your application.
Both CloudWatch Logs and X-Ray can help you to identify where and when cold starts are occurring in your application, although it requires some deduction on your part. Monitoring will also help you identify which cold starts are truly problematic to the smooth running of your application and the satisfaction of your users, and which can be safely ignored.
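As an illustration of that deduction, the sketch below scans Lambda log lines that have already been exported from CloudWatch Logs. It assumes that the `REPORT` line of a cold invocation carries an `Init Duration` field, while warm invocations omit it. The function name `summarize_cold_starts` is hypothetical, and fetching the log lines is left out.

```python
import re

# Assumption: cold invocations include "Init Duration: <n> ms" in their REPORT line.
INIT_DURATION_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def summarize_cold_starts(log_lines):
    """Count invocations and cold starts in an iterable of Lambda log lines."""
    invocations = 0
    init_durations_ms = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue  # only REPORT lines summarize a finished invocation
        invocations += 1
        match = INIT_DURATION_RE.search(line)
        if match:
            init_durations_ms.append(float(match.group(1)))
    return {
        "invocations": invocations,
        "cold_starts": len(init_durations_ms),
        "cold_start_ratio": len(init_durations_ms) / invocations if invocations else 0.0,
        "max_init_ms": max(init_durations_ms, default=0.0),
    }
```

Running this per function makes it easier to compare how often each one starts cold and how long its initialization takes, so you can decide which ones are worth optimizing.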
There are also several commercial tools, our platform Lumigo among them, that make it much easier to monitor whether - and how often - your application is being affected by cold starts.
Keep Lambdas warm
As we noted above, Lambdas are kept warm by AWS for a limited time after being invoked, “in anticipation of another Lambda function invocation”. Warm Lambdas will always give you faster execution times.
One thing you can do to guard against cold starts is to ensure that your Lambdas do not become inactive. Tools such as the Lambda Warmer by Jeremy Daly or the Serverless Plugin Warmup, created by the team at Fidel, invoke your Lambdas at a given interval to ensure that the containers aren’t destroyed.
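When a function is pinged on a schedule like this, it helps if the handler recognizes the ping and returns early, so that warming invocations stay cheap and do not trigger real work. The sketch below shows the general idea only. The marker fields it checks (`source` and `warmer`) are assumptions about what a warming tool might send, so verify them against the payload your tool actually uses. `do_real_work` is a hypothetical stand-in for your business logic.

```python
def is_warmup_event(event):
    """Return True for pings from a warming tool rather than real traffic."""
    if not isinstance(event, dict):
        return False
    # Assumed marker fields; check the payload your warming tool really sends.
    return event.get("source") == "serverless-plugin-warmup" or bool(event.get("warmer"))


def do_real_work(event):
    """Hypothetical placeholder for the function's actual logic."""
    return {"message": f"hello {event.get('name', 'world')}"}


def handler(event, context):
    if is_warmup_event(event):
        # Exit immediately: the only goal of this invocation is to keep the container alive.
        return {"warmed": True}
    return do_real_work(event)
```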
Reduce the number of packages
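We’ve seen that the biggest impact on AWS Lambda cold start times comes not from the size of the package but from the initialization time when the package is actually loaded for the first time. Loading fewer packages at startup therefore helps more than simply shrinking the deployment bundle.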
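One way to act on this, sketched below under stated assumptions, is to move the import of a dependency out of module scope and into the code path that needs it, so that invocations that never touch that path do not pay for loading it during a cold start. Here the standard library module `statistics` is only a stand-in for a heavier third-party package, and the `action` and `values` event fields are hypothetical.

```python
def handler(event, context):
    """Hypothetical handler that defers a costly import to the path that uses it."""
    if event.get("action") == "report":
        # Imported on first use only. Python keeps loaded modules in sys.modules,
        # so later invocations in the same container reuse the loaded module.
        import statistics  # stand-in for a heavy dependency

        return {"mean": statistics.mean(event["values"])}
    # The common path never loads the extra package.
    return {"ok": True}
```

The trade-off is that the first request to hit the rarely used path absorbs the load time instead, so this fits best when that path is not latency-sensitive.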
Related Research - Web Frameworks Implication on Serverless Cold Start Performance in NodeJS
Choose the right language
There is no doubt that your choice of programming language will have an effect on the length of the cold start times you see.
In one experiment, Nathan Malishev found that Python, Node.js and Go took much less time to initialize than Java or .NET, with Python initializing at least twice as fast as Java, depending on memory allocation.
Get Lambdas out of VPC
Unless it’s really necessary (for instance, to access resources within a VPC), try to get your Lambdas running outside of your VPC, as the attachment of elastic network interfaces (ENIs) can have a huge impact on cold start times.
In experiments run by Alessandro Morandi, a Lambda with 3GB of memory took up to 30x longer to invoke from a cold start when inside a VPC compared to outside a VPC.
And as Yan Cui pointed out in a recent article, while VPCs provide necessary protection to EC2s, they aren’t required to provide that kind of security to Lambdas.
Following the recent announcement by AWS (September 2019), it seems that performance issues affecting Lambdas inside VPCs could soon be a thing of the past. Leveraging Hyperplane, the Network Function Virtualization platform, AWS is claiming “dramatic improvements to [Lambda] function startup performance” inside VPCs. It all looks very promising, but we’ll wait until the update is fully rolled out over the next couple of months before changing our recommendations.
As we’ve seen, there are a variety of ways to reduce the impact of cold starts, but unfortunately none of them can be relied upon to completely solve the problem in every situation.
The best advice is to monitor your application and focus most effort only on the cold starts that have a direct effect on user experience or the smooth running of your application.
Lambda cold start times have become something of an obsession for the first wave of serverless adopters, and that’s not likely to change anytime soon. It’s one of the few niggles affecting a really exciting new way to build software.
But there’s little doubt that Amazon (and the other cloud platform providers) will continue to work away on this problem until it’s no longer an issue. And when they do, you can be sure that we will be raising a frosty beer in their honor!