How NLP Develops Context Using Pre-Trained Models
NLP problems have always been unique and challenging, reflecting how complex and at the same time beautiful human language can be. For example: "I read a Stephen Hawking novel yesterday." "Can you read this message for me?" The verb "read" is in the past tense in the first sentence, whereas the same verb is in the present tense in the second. This is a case of polysemy, where a word can have multiple meanings or contexts. Traditional word embeddings like GloVe and word2vec fail to distinguish between polysemous words because they cannot grasp the context in which a word is used.
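To see why, consider a minimal sketch below, assuming the gensim library and its downloadable "glove-wiki-gigaword-50" GloVe vectors: a static embedding stores exactly one vector per word type, so both occurrences of "read" are forced onto the same vector no matter what the surrounding sentence says.

```python
# Minimal sketch: a static embedding assigns one vector per word type,
# so "read" (past tense) and "read" (present tense) map to the same vector.
# Assumes gensim and its downloadable "glove-wiki-gigaword-50" vectors.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # pre-trained GloVe vectors

sentence_1 = "i read a stephen hawking novel yesterday"
sentence_2 = "can you read this message for me"

vec_1 = glove["read"]  # vector looked up for "read" in sentence 1
vec_2 = glove["read"]  # identical vector looked up for "read" in sentence 2

# True: the lookup ignores the sentences entirely, so the two senses collapse
print((vec_1 == vec_2).all())
```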
Recent NLP frameworks such as Google's BERT (Bidirectional Encoder Representations from Transformers), Zalando's Flair, AllenNLP's ELMo, ULMFiT, and OpenAI's GPT are able to parse sentences and grasp the context in which words are used.
One of the biggest thorns in NLP is the inability of machines to understand the actual context or meaning of a sentence. One reason for this is the shortage of training data. NLP is a diversified field with many distinct tasks, and most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. Modern deep learning-based NLP models, on the other hand, benefit from humongous amounts of data, improving when trained on millions, or even billions, of annotated training examples.
To fill this gap in data, researchers have developed a variety of techniques for training general-purpose language representation models on the enormous amount of unannotated text on the web, a step known as pre-training. The pre-trained model can then be fine-tuned on small-data NLP tasks, resulting in substantial accuracy improvements compared to training on the same datasets from scratch. For instance, Google's BERT makes use of a Transformer that learns contextual relations between words (or sub-words) in a text. While directional models read the text input sequentially, either left-to-right or right-to-left, BERT's Transformer encoder reads the entire sequence of words at once; hence it is bidirectional, or rather non-directional. This characteristic allows the model to learn the context of a word based on all of its surroundings, i.e. the words to its left and to its right. AllenNLP's ELMo, in contrast, uses feature-based training: a pre-trained neural network produces word embeddings which are then used as features in downstream NLP models.
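The contrast with static embeddings can be made concrete. The sketch below, assuming the Hugging Face transformers and torch packages (not mentioned in the article, so purely illustrative), pulls the hidden state for "read" out of a pre-trained BERT for each of the two example sentences and compares them: because the encoder sees the whole sequence at once, the two vectors differ.

```python
# Minimal sketch, assuming the Hugging Face transformers and torch packages:
# BERT produces a different vector for "read" depending on the surrounding words.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the contextual embedding of `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, hidden_size)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0, 0]
    return hidden[position]

v1 = embed_word("I read a Stephen Hawking novel yesterday.", "read")
v2 = embed_word("Can you read this message for me?", "read")

similarity = torch.cosine_similarity(v1, v2, dim=0)
print(similarity.item())  # below 1.0: the two uses of "read" get different vectors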
These NLP models and frameworks can be used for a wide variety of tasks, including sentiment analysis (a classification task), question answering, and named entity recognition (NER). Such pre-trained models require a bit of fine-tuning, but they can save a ton of time and computational resources. This kind of transfer learning essentially lets you train a model on one dataset and then adapt it to perform different NLP functions on a different dataset.
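As a rough illustration of that fine-tuning step, here is a minimal sketch assuming the Hugging Face transformers and datasets libraries; the two-example "sentiment" dataset is only a stand-in for real labeled data. The pre-trained encoder is reused as-is, a small classification head is added on top, and the whole model is trained briefly on the target task.

```python
# Minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets
# libraries; the tiny toy dataset below only illustrates the shape of the data.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # pre-trained encoder + new classifier head

# Toy sentiment data standing in for a real labeled dataset.
raw = Dataset.from_dict({
    "text": ["I loved this movie", "This was a terrible film"],
    "label": [1, 0],
})
encoded = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=32),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sentiment-bert", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=encoded,
)
trainer.train()   # fine-tunes the whole model on the small task-specific dataset
```

The same pre-trained checkpoint could be reloaded with a different head (for question answering or NER) and fine-tuned on a different dataset, which is the transfer-learning pattern described above.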