With this discussion I'd like to raise the possibility of a slim version of GE that can be used within a Lambda function. This is different from running it on an EC2 instance, because I'm looking for a serverless, event-driven function that runs QA/QC on files arriving in S3 buckets as part of a fully serverless flow.
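For illustration, here is a minimal sketch of the shape such a function might take, assuming the standard S3 event notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`). `run_checks` is a hypothetical placeholder for the actual QA/QC step, not a GE or pandera call.

```python
import urllib.parse


def iter_s3_objects(event):
    """Yield (bucket, key) for each record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        yield bucket, key


def run_checks(bucket, key):
    """Hypothetical stand-in for the real validation (GE, pandera, ...).

    Here it only checks the file extension so the sketch stays runnable.
    """
    return {"bucket": bucket, "key": key, "passed": key.endswith(".csv")}


def handler(event, context):
    """Lambda entry point: run the checks on every object in the event."""
    return [run_checks(bucket, key) for bucket, key in iter_s3_objects(event)]
```

In a real deployment `run_checks` would download the object and run the validation suite against it, which is where the size of the validation library starts to matter.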
I've tested some other libraries such as pandera, which worked perfectly within a Lambda function. The problem is that the full GE distribution is too big to fit into a deployment package, and AWS imposes a hard limit of 250 MB (unzipped) on each package.
I did some research and removed the following libraries, which made the package small enough to fit the limit (barely):
- Notebook
- IPython widgets
- Jupyter client
- Jupyter core
- widgetsnbextension
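A rough sketch of how this pruning could be scripted against a build directory (for example the target of `pip install -t`). The directory names in `PRUNE_NAMES` are my assumption about how these packages appear in site-packages, and whether GE still works without them in every code path is untested here.

```python
import shutil
from pathlib import Path

# Lambda's limit on the unzipped deployment package.
LAMBDA_UNZIPPED_LIMIT = 250 * 1024 * 1024

# Assumed top-level directory names for the packages listed above.
PRUNE_NAMES = {
    "notebook",
    "ipywidgets",
    "jupyter_client",
    "jupyter_core",
    "widgetsnbextension",
}


def dir_size(root):
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def prune(build_dir, names=PRUNE_NAMES):
    """Delete package dirs and their *.dist-info dirs; return removed names."""
    removed = []
    for entry in sorted(Path(build_dir).iterdir()):
        # "jupyter_core-4.9.dist-info" -> "jupyter_core"
        base = entry.name.split("-", 1)[0].lower()
        if entry.is_dir() and not entry.is_symlink() and base in names:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def fits_lambda(build_dir):
    """True if the unzipped build directory is within the Lambda limit."""
    return dir_size(build_dir) <= LAMBDA_UNZIPPED_LIMIT
```

Calling `prune("build/")` followed by `fits_lambda("build/")` gives a quick check of whether the trimmed package is under the limit before zipping it.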
Could we have an official slim version of GE for AWS Lambda? It would be better than mine, which just removes the things I guess I won't use.