From the course: Complete Guide to Evaluating Large Language Models (LLMs)


Evaluating classification tasks


- We're at the end of our LLM evaluation tree. We talked about multiple choice, free text response, and embeddings, and trust me, we have case studies galore coming up, but let's now focus our attention on our final LLM task: classification, arguably one of the hardest things to evaluate and, at the same time, one of the easiest. You'll see why. Classification generally comes down to two types. You either have a fine-tuned classification model, meaning you take a pre-trained model off the shelf, in this example a BERT model, and map its pre-trained output to a probability distribution over a predefined set of options. Then, through fine-tuning, you use historical data to update the parameters of that model to better fit that distribution. So, for example, if I wanted a BERT model to only say positive or negative given a piece of text, I might pass that raw text into the BERT model, take one of the outputs of one of its…
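To make the "map the output to a probability distribution over a predefined set of options" step concrete, here is a minimal sketch of that classification head logic. It assumes the model has already produced raw logits (one per label); the label names and example logit values are illustrative, not taken from the course, and a real setup would get these logits from a fine-tuned BERT model rather than hard-coded numbers.

```python
import math

# Hypothetical predefined label set for binary sentiment classification
LABELS = ["negative", "positive"]

def softmax(logits):
    """Turn raw logits into a probability distribution that sums to 1."""
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Map raw model logits to the most likely label and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

# Example logits a fine-tuned head might emit for one piece of text
label, confidence = classify([-1.2, 2.3])
```

During fine-tuning, it is this distribution (compared against the historical labels, typically via cross-entropy loss) that drives the parameter updates the transcript describes.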
