Evaluating GenAI Platforms with Scoring Framework

I’m checking this out. As we need to select the “best” platform for our genAI applications, the permutations become truly daunting. Adrian provides a framework and code to do so - scoring and promoting the highest rated candidates. Fascinating!

I created a new repo/tool today to evaluate and collect the rapidly changing tooling configurations that everyone is trying to figure out (using statistical experimental design) I used Claude/Gastown to both make it and operate it and have some initial comparison data on opus/sonnet and Python/TS/Go etc. for a small test. I’d be happy for some github stars if people think it could be useful. https://lnkd.in/gHYmbUXj - Edit: a few more hours on Monday and it’s coming along well. Interactive html dashboards, six languages and a small and large application. (Spoiler, Go wins the over all comparison…)

To view or add a comment, sign in

Explore content categories