Extract Excel Data with AWS Lambda
A requirement existed where I needed to extract data from Excel from within an AWS Lambda Function. Having read several articles on how to do this, I realised that a set of python libraries needed to be zipped and uploaded to AWS as a lambda layer; the lambda layer could then be used within the function by the python code.
However, none of the articles that I found worked successfully for me and I kept getting "Unable to import module" error message, even though the module in question was included in the zipped package.
The following steps describe how I solved my problem using AWS cloud9:
First, I created 2 top level folders named Virtual_Env and Working_Packages. From the first folder I opened a new terminal. From here I typed:
python3 -m venv venv_projectname/
where projectname is the name of my project
This creates a sub folder containing folders bin, include, lib and lib64. The bin folder contains a script to activate this folder as a virtual environment. To do this I used the following line:
source venv_projectname/bin/activate
To confirm that I was working in the new virtual environment I typed:
pip list
which confirms that no packages currently exist
Using pip once more I installed all the python libraries that I needed:
pip install panda
pip install xlrd
pip install openpyxls
and then confirmed their existance by using pip list once more. I then deactivated the virtual environment by:
deactivate
The result is that all python libraries that I required now existed within folder lib64/python3.x/site-packages. It was at this point that I came unstuck in my previous attempts.
Within my second top level folder I created a sub folder named projectname where I copied the site-packages folder. Lambda expects all packaged libraries to be inside a folder named python so I renamed site-packages to python to get:
Working_Packages > projectname > python > packaged libraries listed here
I then opened another terminal, this time at the projectname level of the Working_Packages folder and removed all unwanted folders by:
rm -r dist-info*
To zip everything up I used the following:
ZIP="projectname.zip
zip -r ${ZIP} *"
and then downloaded the file to my local machine.
The final step was to upload the zip file as a lambda layer and add the layer to the lambda function. Importing the library xlrd (import xlrd) a dependancy for the pandas pd.read_excel() function within my lambda function worked.
Happy days