How NLP Develops Context Using Pre-Trained Models
NLP problems have always been unique and challenging, reflecting how complex and at the same time beautiful human language can be. For example: "I read a Stephen Hawking novel yesterday." "Can you read this message for me?" The verb "read" is in the past tense in the first sentence, whereas the same verb is in the present tense in the second. This is a case of polysemy, where a word can have multiple meanings or contexts. Traditional word embeddings like GloVe and word2vec fail to distinguish between polysemous words because they cannot grasp the context in which a word is used.
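To see why, consider a minimal sketch below, assuming the gensim library and its downloadable "glove-wiki-gigaword-50" GloVe vectors: a static embedding stores exactly one vector per word type, so both occurrences of "read" are forced onto the same vector no matter what the surrounding sentence says.

```python
# Minimal sketch: a static embedding assigns one vector per word type,
# so "read" (past tense) and "read" (present tense) map to the same vector.
# Assumes gensim and its downloadable "glove-wiki-gigaword-50" vectors.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # pre-trained GloVe vectors

sentence_1 = "i read a stephen hawking novel yesterday"
sentence_2 = "can you read this message for me"

vec_1 = glove["read"]  # vector looked up for "read" in sentence 1
vec_2 = glove["read"]  # identical vector looked up for "read" in sentence 2

# True: the lookup ignores the sentences entirely, so the two senses collapse
print((vec_1 == vec_2).all())
```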
Recent NLP frameworks such as Google's BERT (Bidirectional Encoder Representations from Transformers), Zalando's Flair, AllenNLP's ELMo, ULMFiT, and OpenAI's GPT are able to parse sentences and grasp the context in which words are used.
One of the biggest thorns in NLP is the inability of machines to understand the actual context or meaning of a sentence. One reason for this is the shortage of training data. NLP is a diversified field with many distinct tasks, and most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. Modern deep learning-based NLP models, on the other hand, benefit from humongous amounts of data, improving when trained on millions, or even billions, of annotated training examples.
To fill this gap in data, researchers have developed a variety of techniques for training general-purpose language representation models on the enormous amount of unannotated text on the web, a step known as pre-training. The pre-trained model can then be fine-tuned on small-data NLP tasks, resulting in substantial accuracy improvements compared to training on the same datasets from scratch. For instance, Google's BERT makes use of a Transformer that learns contextual relations between words (or sub-words) in a text. While directional models read the text input sequentially, either left-to-right or right-to-left, BERT's Transformer encoder reads the entire sequence of words at once; hence it is bidirectional, or rather non-directional. This characteristic allows the model to learn the context of a word based on all of its surroundings, i.e. the words to its left and to its right. AllenNLP's ELMo, in contrast, uses feature-based training: a pre-trained neural network produces word embeddings which are then used as features in downstream NLP models.
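The contrast with static embeddings can be made concrete. The sketch below, assuming the Hugging Face transformers and torch packages (not mentioned in the article, so purely illustrative), pulls the hidden state for "read" out of a pre-trained BERT for each of the two example sentences and compares them: because the encoder sees the whole sequence at once, the two vectors differ.

```python
# Minimal sketch, assuming the Hugging Face transformers and torch packages:
# BERT produces a different vector for "read" depending on the surrounding words.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, hidden_size)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0, 0]
    return hidden[position]

v1 = embed_word("I read a Stephen Hawking novel yesterday.", "read")
v2 = embed_word("Can you read this message for me?", "read")

similarity = torch.cosine_similarity(v1, v2, dim=0)
print(similarity.item())  # below 1.0: the two uses of "read" get different vectors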
These NLP models and frameworks can be used for a wide variety of tasks, including sentiment analysis (a classification task), question answering, and named entity recognition (NER). Such pre-trained models require a bit of fine-tuning, but they can save a ton of time and computational resources. This kind of transfer learning essentially lets you train a model on one dataset and then adapt it to perform different NLP functions on a different dataset.
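As a rough illustration of that fine-tuning step, here is a minimal sketch assuming the Hugging Face transformers and datasets libraries; the two-example "sentiment" dataset is only a stand-in for real labeled data. The pre-trained encoder is reused as-is, a small classification head is added on top, and the whole model is trained briefly on the target task.

```python
# Minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets
# libraries; the tiny toy dataset below only illustrates the shape of the data.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # pre-trained encoder + new classifier head

# Toy sentiment data standing in for a real labeled dataset.
raw = Dataset.from_dict({
    "text": ["I loved this movie", "This was a terrible film"],
    "label": [1, 0],
})
encoded = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=32),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-bert", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=encoded,
)
trainer.train()   # fine-tunes the whole model on the small task-specific dataset
```

The same pre-trained checkpoint could be reloaded with a different head (for question answering or NER) and fine-tuned on a different dataset, which is the transfer-learning pattern described above.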