From the course: Exploring Deterministic LLM Programming

Princeton's Holistic Agent Leaderboard (HAL) framework

- [Narrator] HAL, the Holistic Agent Leaderboard, is Princeton's cost-aware, third-party agent evaluation framework. One of the things it highlights is that agents can cost 100 times more but be only 1% better, and the benchmarks it covers reveal critical inefficiencies. Because Princeton runs a standardized, third-party evaluation, it changes a lot about how you look at agent performance. One example is the cost-performance crisis: the gap between the best and worst performers can reach 70%, but a simple retry strategy can match a complex architecture at a fraction of the cost. We call this over-engineering. You might have a thinking model that burns a lot of GPU, but it turns out that if you just ask the chatbot the same question again, you can get performance that's just as good for a fraction of the cost. If we look…
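
To make the retry idea concrete, here is a minimal Python sketch of a "just ask again" loop that also tracks what each attempt costs. It is not HAL's code or API. The names `retry_until_valid`, `ask`, `is_valid`, and `cost_per_call` are hypothetical, and the cost is in arbitrary units chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    answer: Optional[str]  # first answer that passed the check, or None
    attempts: int          # number of calls actually made
    cost: float            # total cost in arbitrary units (assumption)


def retry_until_valid(
    ask: Callable[[], str],           # hypothetical: one call to a model
    is_valid: Callable[[str], bool],  # hypothetical: cheap answer check
    cost_per_call: float,
    max_attempts: int = 3,
) -> RunResult:
    """Ask the same question again until an answer passes the check."""
    cost = 0.0
    for attempt in range(1, max_attempts + 1):
        answer = ask()
        cost += cost_per_call  # every call is billed, pass or fail
        if is_valid(answer):
            return RunResult(answer, attempt, cost)
    # No attempt passed; report the full cost that was spent anyway.
    return RunResult(None, max_attempts, cost)


if __name__ == "__main__":
    # Canned answers stand in for a real model so the demo is repeatable.
    canned = iter(["wrong", "wrong", "42"])
    result = retry_until_valid(
        ask=lambda: next(canned),
        is_valid=lambda a: a == "42",
        cost_per_call=1.0,
    )
    print(result)
```

Tracking cost next to the answer is the point of the sketch: it lets a cheap retry loop be compared against a more elaborate agent on both accuracy and spend, which is the kind of comparison the narrator describes.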
