Real-world LLM user testing
From the course: LLM Evaluations and Grounding Techniques
- [Instructor] So far, we've been improving our LLM's performance within a test environment, but the real world is different from test environments. Users do all kinds of weird things, so it's important to focus on user testing before releasing your application to production. Now, we're going to see this in action by testing out a trivia host agent. Let's open up Voiceflow and import 04_08; I have this file on the GitHub branch. As you can see here, we've built out a little agent that plays trivia with us. To ask us a question, it's going to generate a random trivia topic and then generate a question based on the Knowledge Base. Afterwards, it's going to capture our response, judge the response, and then determine what score it should give us. Pretty straightforward. Now, before we get started, we have an empty Knowledge Base. Let's go ahead and populate it. I'm going to hit Back and go to Integrations. I'm going to copy the API key. Next, I'm going to head over to GitHub Codespaces. And right…
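For readers following along outside the video, here is a minimal sketch of the kind of script you might run from GitHub Codespaces to populate the Knowledge Base with the copied API key. The endpoint URL, payload shape, and filename are assumptions, not taken from the course; verify them against Voiceflow's current Knowledge Base API documentation before use.

```python
# Minimal sketch: upload one document to a Voiceflow Knowledge Base.
# The endpoint URL and payload shape below are assumptions based on
# Voiceflow's public Knowledge Base API; check the current docs.
import os

import requests

API_KEY = os.environ["VOICEFLOW_API_KEY"]  # the key copied from Integrations

# Assumed endpoint; Voiceflow has versioned this path over time.
UPLOAD_URL = "https://api.voiceflow.com/v1/knowledge-base/docs/upload"


def upload_document(path: str) -> dict:
    """Upload a single file so the trivia agent can draw questions from it."""
    with open(path, "rb") as f:
        response = requests.post(
            UPLOAD_URL,
            headers={"Authorization": API_KEY},
            files={"file": (os.path.basename(path), f, "text/plain")},
        )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # "trivia_facts.txt" is a hypothetical filename used for illustration.
    print(upload_document("trivia_facts.txt"))
```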