GE in AWS Lambda

My intention with this discussion is to raise the possibility of a slim version of GE that can run inside a Lambda function. This is different from running it on an EC2 instance: I’m looking for a serverless, event-driven function that runs QA/QC on files arriving in S3 buckets as part of a fully serverless flow.
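To make the event-driven setup concrete, here is a minimal sketch of a Lambda handler triggered by S3 "object created" notifications. It only relies on the standard S3 event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with the key URL-encoded). `run_quality_checks` is a hypothetical placeholder for whatever validation step (for example a slimmed-down GE suite) would actually run; it is not a GE API.

```python
import urllib.parse
from typing import Any, Dict, List, Tuple


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces become '+'), so decode them.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def run_quality_checks(bucket: str, key: str) -> bool:
    """Hypothetical placeholder for the QA/QC step; always passes in this sketch."""
    return True


def lambda_handler(event, context):
    # Validate every object referenced by the event and report per-file results.
    results = {}
    for bucket, key in extract_s3_objects(event):
        results[f"s3://{bucket}/{key}"] = run_quality_checks(bucket, key)
    return {"all_passed": all(results.values()), "results": results}
```

For example, an event whose key is `raw/my+file.csv` yields the object `s3://incoming-data/raw/my file.csv` (the bucket name here is made up).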

I’ve tested other libraries such as pandera, which worked perfectly within a Lambda function. The problem is that the full GE distribution is too big to fit into a deployment package: AWS has a hard limit of 250 MB (unzipped) per package.
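Before zipping, it helps to check how close a build directory is to that limit. The sketch below sums file sizes under a directory and compares the total to an assumed limit of 262,144,000 bytes, the figure AWS reports in its "Unzipped size must be smaller than ..." error; treat that constant as an assumption and check the current Lambda quotas page. Note that the limit applies to the function code plus all of its layers combined.

```python
import os

# Assumed unzipped size limit for Lambda code + layers (check current AWS quotas).
LAMBDA_UNZIPPED_LIMIT_BYTES = 262_144_000


def unzipped_size_bytes(root: str) -> int:
    """Total size of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_in_lambda(root: str, limit: int = LAMBDA_UNZIPPED_LIMIT_BYTES) -> bool:
    """True if the directory's unzipped contents stay within the limit."""
    return unzipped_size_bytes(root) <= limit
```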

After some research, I removed the following libraries, which made the package small enough to (barely) fit the limit:

  • Notebook
  • IPython widgets
  • Jupyter client
  • Jupyter core
  • widgetsnbextension
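One way to make that pruning repeatable is a small script run against the build directory (for example the folder produced by `pip install -t build/`). This is a minimal sketch, assuming the listed libraries install under their pip names (`notebook`, `ipywidgets`, `jupyter_client`, `jupyter_core`, `widgetsnbextension`) with matching `*.dist-info` metadata folders. It mirrors the manual removal described above and is not an official GE procedure; whether GE still imports and validates correctly afterwards depends on the GE version, so re-test the function after pruning.

```python
import glob
import os
import shutil

# Assumed install names of the libraries listed above.
PACKAGES_TO_REMOVE = [
    "notebook",
    "ipywidgets",
    "jupyter_client",
    "jupyter_core",
    "widgetsnbextension",
]


def prune_packages(site_packages: str, packages=PACKAGES_TO_REMOVE) -> list:
    """Delete each package's import directory and dist-info; return removed names."""
    removed = []
    for pkg in packages:
        patterns = [
            os.path.join(site_packages, pkg),                   # importable package dir
            os.path.join(site_packages, f"{pkg}-*.dist-info"),  # pip metadata dir
        ]
        for pattern in patterns:
            for path in glob.glob(pattern):
                if os.path.isdir(path):
                    shutil.rmtree(path)
                else:
                    os.remove(path)
                removed.append(os.path.basename(path))
    return sorted(removed)
```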

Could we have an official version of GE for AWS Lambda? An official one would be better than mine, which just removes the things I guessed wouldn’t be used.


We don’t have an official distribution of Great Expectations for AWS Lambda yet, although we have used it on Lambda internally. This is a great feature request!

Do you have some reproducible steps for removing the libraries?