Language Models in NLP
Language models have transformed NLP over the last decade. They emerged as one of the early successes of deep learning for NLP, where pre-trained language models were used to produce word embeddings. Their applications range widely, from machine translation, summarization, and question answering to conversational AI. The last 10 years span four generations of language modeling: (1) the first generation with word2vec and GloVe; (2) the second with contextual embeddings from sequence models, e.g., ELMo; (3) the third with Transformer-based models, e.g., BERT and USE; (4) the fourth with large language models such as GPT-3 and OPT. This is the first post in the series, starting with an overview of the first generation of language models like word2vec.
Word2vec, introduced in 2013, addressed the curse of dimensionality in NLP (where even similar words were hard to match) and opened the door for deep learning in NLP. It embeds words in a continuous vector space where semantically similar words map to nearby points. This enables powerful vector arithmetic: for example, taking the vector for "Rome", subtracting "Italy", and adding "France" yields a vector closest to the vector for "Paris".
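To make this concrete, here is a minimal sketch of the analogy above using gensim's pretrained Google News word2vec vectors (the model name is gensim's downloader id, not something from this post; the first call downloads roughly 1.6 GB):

```python
import gensim.downloader as api

# Load 300-dimensional word2vec vectors trained on Google News.
model = api.load("word2vec-google-news-300")

# Rome - Italy + France ~= Paris: positive terms are added and
# negative terms subtracted before the nearest-neighbor search.
result = model.most_similar(positive=["Rome", "France"], negative=["Italy"], topn=1)
print(result)  # [('Paris', ...)]
```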
Before word embeddings, a commonly used technique for word representations in NLP was the bag-of-words model, in which a text is represented as a dictionary of the words it contains and their counts. It is an easy feature-generation technique for text but suffers from several limitations. For example, it cannot capture the order of words. It also suffers from the curse of dimensionality: the bag-of-words model has a limited vocabulary, and even similar words are hard to match.
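A short illustration of the bag-of-words idea with scikit-learn's CountVectorizer (the example sentences are made up here, not from the post):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Each document becomes a vector of word counts; word order is lost,
# and the vector length grows with the vocabulary.
print(vectorizer.get_feature_names_out())  # ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']
print(counts.toarray())
```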
In word2vec, words are mapped to a dense vector space, typically of 300 dimensions. For example, the word "Paris" is mapped to a single 300-dimensional vector. Analogies emerge in this space: pairs like country-capital have similar vector differences (Mikolov et al., 2013).
How do you create these embeddings? Word2vec proposed two training approaches: continuous bag of words (CBOW), which predicts a word from its surrounding context, and skip-gram, which predicts the surrounding context from a given word.
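A minimal training sketch of both approaches with gensim follows; the toy corpus is invented for illustration, and real training needs a corpus of millions of sentences:

```python
from gensim.models import Word2Vec

sentences = [
    ["paris", "is", "the", "capital", "of", "france"],
    ["rome", "is", "the", "capital", "of", "italy"],
]

# sg=0 selects CBOW (predict the center word from its context);
# sg=1 selects skip-gram (predict context words from the center word).
cbow = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1)

print(cbow.wv["paris"].shape)  # (300,)
```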
Subsequent models built on word2vec, such as GloVe and fastText. Word2vec changed the NLP world a decade ago and led to state-of-the-art deep learning applications like machine translation, text classification, summarization, entity extraction, question answering, and conversational AI.
In the next few posts I will cover the subsequent generations of language models: contextual (e.g., ELMo), Transformer-based (e.g., BERT and USE), and large language models (e.g., GPT-3). Stay tuned!