Using Claude Code with Eval Tools

Hamel Husain’s Post

1mo

New video with Mikyo King on automating (the painful bits of) evals with Claude Code. We give Claude Code the full AI engineering loop: pulling traces, error analysis, hypothesis generation, and experiment design. Some of it works surprisingly well. Some of it doesn't. Either way, it's worth paying attention to. https://lnkd.in/gMynG3uG


3 Comments

Barada Sahu 1mo

from building multi-model systems - traces and experiment design automated cleanly. error analysis and hypothesis generation hit a wall. model analyzing its own failures only finds the failures it already understands. needed different models to hypothesize - they catch patterns the first model never questions

Mikyo King 1mo

This was super fun and eye-opening for me too. Thanks for having me! Listen to Hamel Husain, folks: to use Claude Code effectively, you yourself need to know what "good" looks like. Look at the data, feel the user's pains. Don't vibe code the important things. Invest in your own knowledge, then figure out the autonomy.


Igor Kasianenko 1mo

Anna Liashenko I wonder how your academic group would find this
