Deploying an NLP Application on Heroku and Tackling the Problems
I have finally deployed my NLP application on Heroku for the first time, and it feels great to say that it is working!
Training a model on a dataset and then predicting is an easy task, normally done for college assignments, but it is interesting to learn how to build an end-to-end project so that your application is usable by people who are not accustomed to the world of data science. For deployment, your application needs to be both optimized and adaptable to various devices.
The problem statement I worked on was predicting a movie's rating from its review text.
In this article, I will list the problems I faced and how I managed to tackle each of them to get my application working.
- Removing the dependency on pre-trained word vectors: Since this was an NLP problem, I had used the readily available GloVe embeddings, which provide a pre-trained vector for each word. After training a deep learning model on the IMDB dataset, I thought my model was ready. But as I was planning to connect my GitHub account to Heroku, the word vector files would also have to be uploaded, which is not feasible: GitHub only allows uploading files of up to 25 MB.
- Using word indices and a BiLSTM model: This led me to represent each review as a sequence of word indices, on which I trained a BiLSTM model that gave me better training and testing accuracy (around 90%).
- The NLTK library conundrum: For text pre-processing, I had used the NLTK library for three purposes: tokenizing words, removing stopwords, and lemmatizing. For deployment, you need to list NLTK in the requirements file so that the library is installed on the server. After a successful build, the website opened nicely, but when I tried executing the app (running the predict function) on a review text, the server kept crashing. Viewing the logs, I understood that the maximum memory limit of 500 MB was being exceeded. I was sure this was happening because the whole NLTK library was being installed.
- Removing the NLTK dependency: For removing stopwords, I simply took all the available English stopwords as a plain Python list and skipped them while iterating through each sentence. Tokenization was done by performing Python's split operation. I had to skip lemmatization so that NLTK was not required by my application at all. After re-deployment, my app executed successfully!
- Compatibility: Last but not least, I had to modify my CSS file so that the app can run on all kinds of devices.
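To make the GloVe problem above concrete, here is a stdlib-only sketch of how pre-trained vectors are typically parsed. The file format (one word followed by its vector per line) is standard for GloVe, but the tiny inline sample and the function name are my own illustrations, not the project's actual code; real GloVe files hold hundreds of thousands of words and run to hundreds of megabytes, which is what made shipping them through GitHub infeasible.

```python
# Tiny stand-in for a GloVe file: each line is "word v1 v2 ... vN".
# Real files (e.g. 100-dimensional vectors for 400k words) are far too
# large to upload to GitHub, hence the change of approach in the article.
SAMPLE = "good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n"

def parse_glove(lines):
    """Map each word to its embedding vector (a list of floats)."""
    vectors = {}
    for line in lines:
        word, *values = line.split()
        vectors[word] = [float(v) for v in values]
    return vectors

vectors = parse_glove(SAMPLE.splitlines())
print(vectors["good"])  # -> [0.1, 0.2, 0.3]
```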
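The word-index representation that replaced the embeddings can be sketched in plain Python. Each word gets an integer id, and every review becomes a fixed-length sequence of ids that the BiLSTM's embedding layer learns from during training; the helper names and the reserved indices 0 (padding) and 1 (out-of-vocabulary) are illustrative assumptions, not the author's code.

```python
from collections import Counter

def build_vocab(texts, max_size=10000):
    """Assign an integer index to each word by frequency; 0 is reserved
    for padding and 1 for out-of-vocabulary words."""
    counts = Counter(w for t in texts for w in t.lower().split())
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, _ in counts.most_common(max_size - len(vocab)):
        vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, max_len=10):
    """Convert text into a fixed-length sequence of indices, padded with 0."""
    idx = [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]
    return (idx + [0] * max_len)[:max_len]

texts = ["The movie was great", "A dull and boring movie"]
vocab = build_vocab(texts)
print(encode("The movie was boring", vocab, max_len=6))
```

Because only this small vocabulary dictionary (plus the trained model weights) has to ship with the app, the multi-hundred-megabyte embedding file is no longer needed.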
Following are the links to the application and the GitHub repository:
I hope this helps all those who are deploying apps for the first time.
Subham Nagar