Let’s quantify until LLMs fully explain themselves to us!
Dario Amodei, CEO of Anthropic (source: Wikipedia)

"We DO NOT UNDERSTAND HOW our own AI creations WORK" candidly says Dario Amodei , co-founder and CEO of Anthropic, producer of GenAI models among the very best. "LLMs are not an invention, they are a discovery" says Jeff Bezos when asked to define those same models. Yes, it may be an alarming statement from such a leader. But Dario’s statement must be really appreciated for its transparency when made by somebody having so much "skin in the game" (yes, a few billions of dollars!).

He further asserts that all current and past attempts to create the analogue of a highly precise and accurate MRI (Magnetic Resonance Imaging) that would fully reveal the inner workings of an AI model's brain have not gone very far so far, though some recent breakthroughs are starting to lift the curtain. The consequence, to further quote Dario: "When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does—why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate."

The intrinsically stochastic behavior of LLMs makes the situation much more complex for software companies: LLMs deliver massive value that is out of reach of classical (i.e., deterministic) programming, and companies can't pass up this added value for their customers. But at the same time, incorrect answers ("hallucinations", as they are often called) or incorrect actions (when GenAI is embedded in robots) can cause dramatic damage to assets, human beings, etc. if insufficient guardrails are in place.

So, yes, many risks currently associated with LLMs are in fact caused by the absence of interpretability of their results, and also by the absence of proper guardrails around those results until we can fully understand how they were generated. It is key to constantly remember, in our daily GenAI-based activities, that LLMs always have an answer, due to their stochastic nature.

So, the hard part is on us, those who add GenAI to their systems, until the interpretability issue is solved (it will be!). We must find ways to quantify and validate LLM results until the LLM can explain to us unequivocally how it got there. This movement has started with the latest reasoning LLMs: they provide details on how they progress toward their response.
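
To make this concrete, here is a minimal sketch (my own illustration, not a technique from Dario's essay) of one simple way to quantify a stochastic result: self-consistency sampling, where we ask the same question several times and use the agreement between answers as a rough confidence score. The `ask_llm` function is a hypothetical stand-in for whatever LLM client you actually use.

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for your real LLM client call."""
    raise NotImplementedError

def self_consistency(prompt: str, n: int = 5) -> tuple[str, float]:
    """Ask the same question n times; the agreement between answers
    is a rough confidence proxy (1.0 means all samples agreed)."""
    answers = [ask_llm(prompt).strip() for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```

A downstream step can then gate on that score: accept the answer when agreement is high, route it to a human otherwise.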

Personally, I am somewhat on the "lucky" side: I use LLMs to generate source code and associated tests, so the downstream steps in my workflows allow me to compile and run those generated tests. Those steps provide solid validation of the LLM's results. But I also contribute to AWS-internal projects where results are in pure natural language. In that case, it is hard to develop totally solid mechanisms to validate those results. Even recent techniques like "LLM as judge" have their limits and cannot deliver a 100% guarantee.
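
For that "lucky" code-generation case, the validation step can be as simple as executing the generated tests and trusting the exit code. A minimal sketch, assuming pytest is installed and that the generated tests import the generated module by its file name (the names `generated.py` and `validate_generated_code` are illustrative, not my actual AWS tooling):

```python
import subprocess
import tempfile
from pathlib import Path

def validate_generated_code(code: str, tests: str) -> bool:
    """Write the LLM-generated module and its generated tests to a
    temporary directory, then run pytest on them. A zero exit code
    is the validation signal; anything else is a reject."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "generated.py").write_text(code)
        Path(tmp, "test_generated.py").write_text(tests)
        result = subprocess.run(
            ["python", "-m", "pytest", tmp, "-q"],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0
```

The nice property of this gate is that it is fully deterministic: whatever the stochastic model produced, the tests either pass or they don't.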

The best we can all, as GenAI practitioners, collectively do for now is to return the quality of our results (through metrics) when we deliver them. Then, downstream steps in our workflows (or those of our customers) know how confident we are and how far they can trust those results. It is fully responsible behavior to return those metrics as computed, even if, in some cases, they are lower than desired. It will build trust with users and customers. Transparency never hurts in the long term!
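
As an illustration of that behavior, here is a minimal sketch (the names and the "agreement" metric are mine, not a standard API) of delivering an answer together with its quality metrics, so a downstream step can decide how much to trust it:

```python
from dataclasses import dataclass, field

@dataclass
class LlmResult:
    """An LLM answer delivered together with its quality metrics,
    so downstream consumers know how far they can trust it."""
    answer: str
    metrics: dict[str, float] = field(default_factory=dict)

def consume(result: LlmResult, threshold: float = 0.8) -> str:
    """Downstream gate: act on the answer or escalate, based on the
    confidence reported upstream - even when it is lower than desired."""
    if result.metrics.get("agreement", 0.0) >= threshold:
        return result.answer
    return "[low confidence - route to human review] " + result.answer
```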

This uncertain situation is probably transient: Dario explains the various techniques developed by Anthropic and the research community to identify 30+ million "features" (think brain zones with their skills) in a model like Claude Sonnet (believed by Anthropic teams to contain 1+ billion of them). Read his full essay to get all the details and learn more about the "Jennifer Aniston neuron" and "Golden Gate Claude"!

The final section, "What We Can Do", is a holistic call to all stakeholders: scientists, technologists, legal entities, policy makers, etc. It suggests many ways to cope with the current situation.

We don't want to pause our use of GenAI until it becomes fully interpretable, hence safer. We want to collectively act, each of us at our own level, to use GenAI responsibly and avoid the catastrophes that would jeopardize its future.

I didn't say it before, but this essay, "The Urgency of Interpretability", is definitely a must-read if you're interested in the state of the art for LLMs.

So, let’s quantify until LLMs fully explain themselves to us!

All comments and opinions welcome!
