Large Language Models

Large Language Models (LLMs) are a subset of deep learning. LLMs and Generative AI intersect, and both are part of deep learning. Generative AI is a type of AI that can produce new content, including text, images, video, audio, and synthetic data.

What is a Large Language Model?

Large, general-purpose language models can be pre-trained and then fine-tuned for specific purposes. Consider training a dog to sit, stand, stay, and come. This is the basic training anyone can give a dog, but a police dog, service dog, or hunting dog needs additional special training. The same idea applies to LLMs: they are first trained to solve common language problems such as text classification, question answering, document summarization, and text generation. They can then be tailored to solve specific problems in fields such as retail, finance, and entertainment using relatively small field-specific datasets.

Features of Large Language Models

  1. Large: "Large" refers to the enormous size of the training dataset and to the large number of parameters. Parameters are the values the model learns during training; they determine its skill at a task such as predicting text.
  2. General purpose: "General purpose" means the model is sufficient to solve common problems. Two reasons drive this: human languages share a great deal in common, and resources are restricted, since only some organizations can afford to train models with a tremendous number of parameters on such huge datasets.
  3. Pre-trained and fine-tuned: A large language model is first pre-trained on a large dataset for general purposes and then fine-tuned for a specific aim with a much smaller dataset.

Benefits of LLMs:

  • A single model can be used for different tasks (such as text completion and prediction, question answering, text classification, and more).
  • LLMs require only a small dataset when tailored to solve a specific problem. They can be used in few-shot scenarios (the model has seen only a few examples of the task) or zero-shot scenarios (the model has seen no examples of the task).
  • LLM performance generally keeps improving as more data and parameters are added.
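The difference between zero-shot and few-shot use can be sketched in how the prompt is assembled. This is only an illustrative sketch: the `build_prompt` function, the "Text:/Label:" layout, and the sentiment task are assumptions made for this example, not part of any particular LLM API.

```python
def build_prompt(task: str, text: str, examples=None) -> str:
    """Assemble a prompt: zero-shot with no examples, few-shot with some."""
    parts = [task]
    # Each worked example shows the model the expected input/output pairing.
    for example_text, label in examples or []:
        parts.append(f"Text: {example_text}\nLabel: {label}")
    # The final block is left open for the model to complete.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    task = "Classify the sentiment as positive or negative."
    # Zero-shot: only the task description and the new input.
    print(build_prompt(task, "The delivery was late."))
    print("-----")
    # Few-shot: the same task, preceded by a couple of labelled examples.
    shots = [("I loved this movie.", "positive"), ("The food was cold.", "negative")]
    print(build_prompt(task, "The delivery was late.", shots))
```

Either string would then be sent to whichever model is in use; no retraining is involved in either case.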

Pathways Language Model (PaLM)

PaLM was released by Google in April 2022. It is a 540-billion-parameter model that achieves state-of-the-art performance across multiple language tasks.

PaLM is a dense, decoder-only Transformer model. It leverages the new Pathways system, an AI architecture that can handle many tasks at once and learn new tasks quickly.

The Pathways system orchestrates distributed computation across accelerators.

What is a Transformer model?

At a high level, a Transformer model consists of an encoder and a decoder. The encoder encodes the input sequence and passes it to the decoder, which decodes that representation into the required output. Example: translating Spanish to English.

We have come a long way from traditional programming, where we used to hard-code the attributes ourselves. For instance,

to define a cat:

type: animal, legs: 4, ears: 2, fur: yes, etc.
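In code, the hard-coded approach might look like the minimal sketch below. The attribute values come from the example above; the names `CAT_RULES` and `is_cat` are hypothetical and chosen only for illustration.

```python
# Hand-written attributes describing a cat, taken from the example above.
CAT_RULES = {"type": "animal", "legs": 4, "ears": 2, "fur": True}


def is_cat(thing: dict) -> bool:
    """Return True only if every hard-coded attribute matches exactly."""
    return all(thing.get(key) == value for key, value in CAT_RULES.items())


if __name__ == "__main__":
    print(is_cat({"type": "animal", "legs": 4, "ears": 2, "fur": True}))  # True
    print(is_cat({"type": "animal", "legs": 3, "ears": 2, "fur": True}))  # False
```

Rules like these are brittle: a dog with the same attributes also passes the check, and a three-legged cat fails it.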

Then came neural networks, where we could give the network pictures of cats and dogs and have it classify whether a picture shows a cat or not.

Now, in the generative wave, we as users can create our own content, whether text, audio, video, or images, using LLMs such as PaLM and LaMDA. These models ingest very large amounts of data to build a foundation language model, which we can then query simply by typing a question into a prompt.

When building on a pre-trained LLM, one does not need ML expertise or training examples, and does not even need to train a model. All it takes is designing a prompt that fits the requirement.

Use Case: Question Answering

Question Answering (QA) is a subfield of Natural Language Processing that deals with automatically answering questions posed in natural language. QA models can retrieve the answer to a question from a given text, which is useful when searching for an answer in a document. Depending on the model used, the answer can be extracted directly from the text or generated from scratch.
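To make the "extracted directly from the text" case concrete, here is a deliberately simple stand-in. It is not how a real QA model works: it just returns the sentence that shares the most words with the question. The function names `tokenize` and `extract_answer` are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_answer(question: str, document: str) -> str:
    """Return the sentence sharing the most words with the question."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if not sentences:
        return ""
    question_words = tokenize(question)
    # Ties are resolved in favour of the earliest sentence.
    return max(sentences, key=lambda s: len(question_words & tokenize(s)))


if __name__ == "__main__":
    doc = "The encoder reads the input sequence. The decoder produces the output sequence."
    print(extract_answer("What does the decoder produce?", doc))
```

A generative QA model would instead compose a new answer rather than copy a sentence from the document.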

What are Prompt Design and Prompt Engineering?

Prompt design is the process of creating a prompt tailored to a specific task. For example, if the system is asked to translate text from English to French, the prompt should be written in English and should specify that the translation be in French.

Prompt engineering is the process of designing a prompt to improve performance. It may draw on domain knowledge, examples of the desired output, or keywords known to be effective for a specific system.

Hence prompt design is the more general concept, whereas prompt engineering is the more specialized one. Prompt design is always essential, while prompt engineering is needed only when a high degree of accuracy or performance is required.
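The distinction can be sketched with the English-to-French example. The function names, the prompt wording, and the glossary entry are all assumptions made for illustration; real systems may respond better to different phrasing.

```python
def design_prompt(text: str) -> str:
    """Prompt design: state the task clearly for the input at hand."""
    return f"Translate the following English text to French:\n{text}"


def engineer_prompt(text: str, domain: str, glossary: dict) -> str:
    """Prompt engineering: add domain knowledge and preferred terms."""
    terms = "\n".join(f"- {en} -> {fr}" for en, fr in glossary.items())
    return (
        f"You are translating {domain} documents.\n"
        f"Use these preferred translations:\n{terms}\n\n"
        + design_prompt(text)
    )


if __name__ == "__main__":
    sentence = "The invoice is overdue."
    print(design_prompt(sentence))
    print("-----")
    print(engineer_prompt(sentence, "finance", {"invoice": "facture"}))
```

The engineered prompt still contains the designed prompt; it only wraps it in extra context aimed at better results.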

Types of LLMs

  1. Generic (or raw) language models: These predict the next word (technically, the next token) based on the language in the training data. Example: completing an unfinished sentence.
  2. Instruction tuned: Trained to predict a response to the instructions given in the input. Example: classifying the sentiment of a statement.
  3. Dialog tuned: Trained to hold a dialog by predicting the next response. Dialog-tuned models are a special case of instruction-tuned models in which requests are typically framed as questions to a chatbot. Dialog tuning is expected to operate in the context of a longer back-and-forth conversation, and typically works better with natural, question-like phrasing.
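The "predict the next word from the training data" idea behind generic language models can be illustrated with a toy word-pair counter. This is only an analogy: real LLMs use neural networks over tokens, not a lookup table, and the names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat the cat ate")
    print(predict_next(model, "the"))  # cat
    print(predict_next(model, "dog"))  # None
```

Instruction-tuned and dialog-tuned models start from the same next-token objective and are then trained further on instructions or conversations.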

Reference: Google Learn
