Streamlining Cortex Agent Evaluations with Cortex Code

Today I used Cortex Code to run evaluations on my Cortex Agent, and the workflow is surprisingly smooth. Cortex Code can create eval datasets, configure metrics, kick off evaluation runs, and analyze results, all from the CLI, with no context-switching between Snowsight tabs. You describe what you want to test, it writes the SQL and Python, and you iterate from there.

I was able to compare accuracy and groundedness across different prompt configs in about 15 minutes. The traces show exactly where the agent went off track, which makes debugging way faster than staring at raw logs.

If you're building Cortex Agents, pair them with Cortex Code for evals. It works just as well for production monitoring as it does for continuous testing while you tweak prompts and configs.

What's your workflow for evaluating agents as they evolve?

#Snowflake #CortexCode #CortexAI #AIAgents #AIObservability
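
To give a sense of what the generated eval SQL can look like, here is a minimal LLM-as-judge sketch using SNOWFLAKE.CORTEX.COMPLETE. The table name, columns, model, and judge prompt are illustrative placeholders, not the exact code Cortex Code produced for me:

  -- Hypothetical eval table with (question, context, agent_response) rows
  SELECT
    question,
    agent_response,
    SNOWFLAKE.CORTEX.COMPLETE(
      'mistral-large2',
      'Grade how grounded the response is in the context, from 1 (ungrounded) to 5 (fully grounded). Reply with the number only.'
      || ' Context: ' || context
      || ' Response: ' || agent_response
    ) AS groundedness_score
  FROM agent_eval_dataset;

Aggregating a score like this per prompt config is one simple way to make the accuracy and groundedness comparison concrete.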


A slightly different approach, but aimed at the same goal: I used coco to parse the logs from the co-agent, and then used AI complete to score response accuracy: https://www.youtube.com/watch?v=iOC2gmlhuXc&t=7s
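
A rough sketch of that pattern, assuming a hypothetical parsed_agent_logs table with question, expected_answer, and agent_response columns, and SNOWFLAKE.CORTEX.COMPLETE as the judge (the actual schema and prompt will differ):

  SELECT
    question,
    SNOWFLAKE.CORTEX.COMPLETE(
      'mistral-large2',
      'Rate the accuracy of the response to the question from 0 to 10. Reply with the number only.'
      || ' Question: ' || question
      || ' Expected answer: ' || expected_answer
      || ' Response: ' || agent_response
    ) AS accuracy_score
  FROM parsed_agent_logs;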

