Choose the Right Storage Solution for your AWS Lambda Function

A few weeks ago AWS Lambda added the capability to increase its ephemeral storage up to 10GB. After that announcement, I got a lot of questions regarding when to use ephemeral storage over Amazon EFS or Amazon S3.

In this post, I want to share with you four different ways to store files when using Lambda and some of the most common use cases for each. If you have other use cases, please let me know in the comments. I hope that after reading this you have more information to make better decisions about when to choose one solution over the other.

Amazon S3

The integration between Amazon S3 and AWS Lambda is simple, both on the operational side and in how the two services communicate.

On the operational side, Amazon S3 scales and is highly available. S3 scales with your functions: no matter how many functions run in parallel, S3 will scale accordingly. S3 is also highly durable, which means that when you store a file in S3, it will be there until you remove it.

On the communication side, S3 and Lambda communicate in both directions. You can store and retrieve S3 objects from your Lambda functions, and you can also trigger functions when objects change - when they are created, modified or deleted.

The integration between these two services is simple to set up because neither service needs a VPC. Simply give the Lambda function permission to access the bucket and the bucket permission to trigger the function, and you are good to go.

In the following example, you can see how you can trigger a Lambda function when a new file is created in an S3 bucket and also how you can write into an S3 bucket from a function.

https://github.com/mavi888/sam-editing-video/blob/main/template.yml
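
To give an idea of what the function side of this integration can look like, here is a minimal Python sketch. It is not the code from the repository above. The names `objects_in_event`, `handler` and the `my-output-bucket` default are made up for illustration, and the `s3` parameter exists only so the logic can be tried without AWS credentials. The event shape and the `get_object` / `put_object` calls follow the S3 notification format and the boto3 S3 client.

```python
from urllib.parse import unquote_plus


def objects_in_event(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (a space becomes "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context, s3=None, output_bucket="my-output-bucket"):
    """Copy every object named in the event into a (hypothetical) output bucket."""
    if s3 is None:
        import boto3  # available in the Lambda Python runtime
        s3 = boto3.client("s3")
    written = []
    for bucket, key in objects_in_event(event):
        # Read the object that triggered the function...
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # ...and write a result to a different bucket.
        new_key = f"processed/{key}"
        s3.put_object(Bucket=output_bucket, Key=new_key, Body=body)
        written.append(new_key)
    return written
```

Writing to a different bucket than the one that triggers the function is a deliberate choice in this sketch: writing back to the same bucket could trigger the function again.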

One other benefit of S3 is that the content is dynamic. You can change the content in the buckets without redeploying the function, and the objects will be available to the function right away. In addition, all data stored in an S3 bucket can be shared across multiple instances of the function, multiple functions, and multiple applications.

S3 and Lambda integration goes one step further with S3 Object Lambda. This feature from S3 allows you to change your data stored in S3 on the fly when it is fetched. This is a great feature when you have multiple applications accessing the same data and you want to reshape, resize or even anonymize the data.

In this video, you can see a demo on how to use S3 Object Lambda to anonymize data stored on S3 when fetching it.

https://youtu.be/EDv9f9A-jck
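
If you prefer to read code, the following is a rough Python sketch of an S3 Object Lambda function that anonymizes text on the fly. It is not the code from the video. The email-masking rule, the `anonymize` and `_fetch` helpers, and the `s3` and `fetch` parameters are assumptions for illustration. The `getObjectContext` fields and the `write_get_object_response` call are the ones S3 Object Lambda uses to hand the original object to the function and to receive the transformed one.

```python
import re
import urllib.request

# A deliberately simple pattern; real anonymization usually needs more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize(text):
    """Mask anything that looks like an email address."""
    return EMAIL.sub("[redacted]", text)


def _fetch(url):
    """Download the original object through the presigned URL S3 provides."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def handler(event, context, s3=None, fetch=_fetch):
    ctx = event["getObjectContext"]
    original = fetch(ctx["inputS3Url"])
    if s3 is None:
        import boto3  # available in the Lambda Python runtime
        s3 = boto3.client("s3")
    # Send the transformed object back to the caller of GetObject.
    s3.write_get_object_response(
        Body=anonymize(original).encode("utf-8"),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

The object stored in the bucket is never modified; only the response to this particular request is changed.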

However, S3 is not a file system; it has a flat organization. From the outside, it might look like you can have folders inside the buckets, but those folders are just prefixes of the object names.
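
To make the idea of prefixes more concrete, here is a small Python sketch that imitates a folder-style listing over a flat list of object keys. It runs locally and does not call S3. The function name `list_folder` and the sample keys are made up, and the logic is only an approximation of what listing with a prefix and a delimiter does.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Imitate a folder listing over a flat collection of object keys."""
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(key)
    return sorted(files), sorted(folders)


keys = ["videos/raw/a.mp4", "videos/raw/b.mp4", "videos/index.json", "readme.txt"]
print(list_folder(keys, prefix="videos/"))
# (['videos/index.json'], ['videos/raw/'])
```

Notice that there is no `videos/raw/` object anywhere in the list. The "folder" only exists because two keys happen to share that prefix.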

Also, S3 speed for retrieving objects is the slowest of all the storage options available for Lambda. This is because the objects are not inside the function, and they need to be transferred from the S3 service to the Lambda service.

Example use cases for Amazon S3

When you are working with data lakes and you have multiple Lambda functions consuming and creating objects.
When you need to share objects between multiple functions. For example, you want to process a video: the first function resizes it, the second transcribes it and the third adds the subtitles. These functions are all sharing the video object.
When you need to have shared objects between multiple applications. For example, after one application processes the videos, another application takes the videos and uploads them to the system so they can be played and another application grabs the video metadata and adds it to the administrative portal.
When you need to work with huge files. Keep in mind, however, that with Lambda you need to transfer the file to the function, which can be slow, and you might also need to do some processing locally.
When you want to trigger Lambda functions from the lifecycle of an object, for example to do something when a file is created, modified or deleted. This is a powerful integration for event-driven architectures.

Ephemeral storage

Ephemeral storage refers to the storage capability inside the execution environment of your Lambda function. Until a few weeks ago it was limited to 512MB, and now you can configure it up to 10GB.

When you deploy your function, you set the storage space and you cannot change it without redeploying the function. In addition, this storage is not shared between different functions.

However, different invocations that run in the same warm execution environment share the storage space. This means that files from a previous invocation can still be in the /tmp directory if you don't clean it up before the function ends its execution. But when the execution environment is destroyed, the files inside the ephemeral storage are deleted. That is why, when building applications that use ephemeral storage, you cannot rely on anything being there.
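
One way to avoid leaving files behind for the next invocation is to do the work inside a scratch directory that is removed automatically. The following Python sketch shows the idea. The function name `process_in_scratch` and the `work.bin` file name are made up, and it assumes the default temporary directory is the ephemeral storage (normally /tmp inside Lambda).

```python
import os
import tempfile


def process_in_scratch(data, base_dir=None):
    """Write data to a per-invocation scratch directory and return its size in bytes."""
    base_dir = base_dir or tempfile.gettempdir()  # normally /tmp inside Lambda
    with tempfile.TemporaryDirectory(dir=base_dir) as scratch:
        path = os.path.join(scratch, "work.bin")
        with open(path, "wb") as f:
            f.write(data)
        size = os.path.getsize(path)
    # Leaving the "with" block deletes the scratch directory and its contents.
    return size
```

Anything you do want to keep between warm invocations should live outside the scratch directory.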

The integration between Lambda and the ephemeral storage is native, and you can work with this storage as you would use a local hard drive. This is the fastest of all the storage options, as the ephemeral storage is inside the instance running the function.

In this video, you can see how to configure the ephemeral storage with AWS SAM and a simple demo that showcases how to use it.

https://youtu.be/39qK-kih9p0

Example use cases for ephemeral storage

When you need a cache between invocations. If you need to download big files from S3 or from the internet, you can use the ephemeral storage as a cache. Download the files before your function handler executes, and then, whenever the handler starts, check if the files are available in the temporary storage. If they are, use them; if not, download them again. These files will last until the function's execution environment is destroyed.
When you need to process big files, for example videos, PDFs, ZIP files or even machine learning models. You can download them to the function and then use them from there.
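
The caching use case above can be sketched in a few lines of Python. This is only an illustration under some assumptions: `cached_file` is a made-up helper, `download` is any function you provide that writes the file to the path it receives, and the cache directory defaults to the temporary directory (normally /tmp inside Lambda).

```python
import os
import tempfile


def cached_file(name, download, cache_dir=None):
    """Return a local path for name, calling download(path) only on a cache miss."""
    cache_dir = cache_dir or tempfile.gettempdir()
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        # Download to a temporary name first so a failed download
        # never leaves a half-written file in the cache.
        partial = path + ".part"
        download(partial)
        os.replace(partial, path)
    return path
```

In a real function, `download` could wrap something like the boto3 `download_file` call. On a warm invocation the file is already there and the download is skipped; on a cold start the cache is empty and the file is fetched again.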

Amazon EFS

Amazon EFS is a fully managed, elastic, shared file system and it integrates with other AWS services. It offers high availability. Many applications, functions and invocations of the functions can access the EFS filesystem and also change the content dynamically.

There are no size limits for an EFS file system, but the number of concurrent connections is limited to 25,000 per file system. During initialization, each instance of a function creates a single connection to its file system that persists across invocations. This means that you can reach a combined concurrency of 25,000 across one or more functions connected to a file system.
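
As a quick back-of-the-envelope check, you can add up the concurrency you expect from every function that mounts the same file system and compare it with the limit. The sketch below does just that. The helper `connections_left` and the function names in the example are made up, and it assumes one connection per concurrent function instance, as described above.

```python
EFS_CONNECTION_LIMIT = 25_000  # per file system, as described above


def connections_left(concurrency_by_function, limit=EFS_CONNECTION_LIMIT):
    """Return how many connections remain; a negative number means over budget."""
    return limit - sum(concurrency_by_function.values())


expected = {"resize-video": 10_000, "transcribe-video": 12_000}
print(connections_left(expected))  # 3000
```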

The connection between Lambda and EFS is quick, but not as fast as ephemeral storage.

EFS and Lambda don't integrate natively. To access your file system, you need to put the Lambda function inside a VPC. But after you mount your EFS file system, you can use the normal tools of your programming language to access the file system.
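
For example, once the file system is mounted, appending to a shared file is plain file I/O. The following Python sketch assumes the file system is mounted at `/mnt/shared` (a hypothetical path; Lambda mount paths start with /mnt/). The helper `append_record` and the `events.log` file name are also made up.

```python
import os


def append_record(line, mount_path="/mnt/shared", filename="events.log"):
    """Append one line to a file on the mounted file system and return the line count."""
    path = os.path.join(mount_path, filename)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)
```

Nothing in this code is specific to EFS or Lambda. If several functions write to the same file at the same time, you would probably want some form of locking, which this sketch leaves out.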

In the following video, you can learn how to create an EFS and use it from a Lambda function.

https://youtu.be/DId1cFqo5ww

Example use cases for EFS

When you need dependency management. You can save all your latest dependencies in your EFS file system and share them between all your Lambda functions. Whenever you want to update a dependency, you just update it in EFS and all the functions will pick it up without the need to redeploy them.
When you need to change files or append to files in a file system.
When you want to share packages with your functions that exceed the limits of your Lambda layers or the temporary storage.

Lambda Layers

I imagine you don't consider Lambda Layers to be a storage solution for your Lambda function, but they are.

Lambda Layers are a storage solution, where you can package whatever you want to share with your functions. Lambda Layers are part of your deployment package and they cannot be changed without re-deploying the function.

Layers can be used across multiple functions, but functions cannot change them; they are read-only. Layers have a version number, so you can track what is inside a layer at all times. One cool thing about layers is that they can be shared across accounts, so you can use layers from a third-party provider, or layers deployed in another AWS account within your organization. However, layers cannot be shared across regions.

Accessing the data inside the layers is as fast as with ephemeral storage, because the layers are extracted inside the running instance.
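
Because the contents of a layer end up as regular files in the execution environment (under /opt), reading them is ordinary file access. Here is a minimal Python sketch. The helper `load_layer_json` and the `config/settings.json` path in the usage note are made up for illustration.

```python
import json
import os


def load_layer_json(relative_path, layer_root="/opt"):
    """Load a JSON file that was packaged inside a layer."""
    path = os.path.join(layer_root, relative_path)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A function could call `load_layer_json("config/settings.json")` to read a settings file shipped in a layer. The `layer_root` parameter is only there so the helper can be tried outside Lambda.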

The size of your layers counts toward the deployment size of the Lambda package, which is limited to 50MB when zipped (and 250MB unzipped, function and layers combined). Also, Lambda functions can have a maximum of 5 layers. Therefore, you need to be careful with what you put in the layers and how you create them.

In this video, you will learn how to create layers for your own dependencies as well as for third-party dependencies.

https://youtu.be/KYZ1Dop7DYg

Example use cases for Lambda Layers

When you need dependency management. Layers are mostly used for sharing third-party dependencies and a company's own dependencies between different functions. Because of the version number, they are great for auditing the code and making sure that the latest dependencies are enforced across the whole organization.
When you need to share compiled libraries. Sometimes you need to use some compiled library in your function.
When you need to use Lambda Extensions. Lambda extensions are distributed as layers, because layers can be easily shared between different accounts and providers.

Conclusion

After reading this post, you have learned the four ways that you can store files and objects when using Lambda functions - Amazon S3, ephemeral storage, Amazon EFS and Lambda layers. Each of the solutions has different benefits and tradeoffs that you need to evaluate when picking what you need in your function.

You can also mix several solutions in one application. For example, the code I shared when talking about Amazon S3 uses three of the four storage solutions - Amazon S3, ephemeral storage and Layers, each of them for a different use case inside the application.
