From the course: Deep Learning with Python and Keras: Build a Model for Sentiment Analysis
Preprocessing text for sentiment analysis
- [Instructor] When you're working with text data and using it to train machine learning models, there is a bunch of pre-processing and cleaning that you have to do before you can actually use that text data for model building and training. And here in this movie, we'll briefly discuss some of those pre-processing techniques. Text pre-processing usually includes tokenization, lemmatization, and stop word removal. Let's talk about tokenization first. Machine learning models don't work with the entire chunk of text that you feed in. Tokenization is the process of breaking down text into smaller units called tokens. Tokens can be words, numbers, or punctuation marks. Tokenization helps in structuring the text for further processing and analysis, and is a fundamental step to understand and interpret the text by analyzing its individual components. Lemmatization involves reducing words to their base or root form.…
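As a rough illustration of the tokenization and stop word removal steps described above, here is a minimal sketch using only Python's standard library. It is not the code used in the course: the function names `tokenize` and `remove_stop_words` and the tiny `STOP_WORDS` set are hypothetical, chosen only for this example. Lemmatization is left out because it usually relies on a dictionary-backed NLP library.

```python
import re

# Hypothetical, deliberately tiny stop word list, for illustration only.
# Real projects usually take a much longer list from an NLP library.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it"}


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores (words and numbers);
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS, ignoring case."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


tokens = tokenize("The movie was great, 10 out of 10!")
print(tokens)
print(remove_stop_words(tokens))
```

The regular expression keeps punctuation marks as separate tokens, which matches the description of tokens as words, numbers, or punctuation marks. Whether punctuation should be kept or discarded depends on the model being trained.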