Hamel Husain’s Post

New video with Mikyo King on automating (the painful bits) of evals with Claude Code. We give Claude Code the full AI engineering loop: pulling traces, error analysis, hypothesis generation, and experiment design. Some of it works surprisingly well. Some of it doesn't. Either way it's worth paying attention to. https://lnkd.in/gMynG3uG

Using Claude Code with Eval Tools

https://www.youtube.com/

From building multi-model systems: traces and experiment design automated cleanly. Error analysis and hypothesis generation hit a wall. A model analyzing its own failures only finds the failures it already understands. Needed different models to hypothesize - they catch patterns the first model never questions.


This was super fun and eye-opening for me too. Thanks for having me! Listen to Hamel Husain, folks - to use Claude Code effectively, you yourself need to know what "good" looks like. Look at the data, feel the user's pains. Don't vibe code the important things. Invest in your own knowledge, then figure out the autonomy.

Anna Liashenko I wonder how your academic group would find this
