Machine unlearning

Machine learning, machine learning, machine learning. We’ve spent the last few years obsessing over what AI can learn. What about machine UNlearning?

Machine learning and AI have exploded into today’s massive LLMs. That’s meant bigger models. Trillion-plus parameters. More data, more compute, more capabilities … more more, essentially.

But there’s an uncomfortable question to be asked: what happens when AI learns something it shouldn’t? As in, lies … hallucinations … bias … or just plain old common errors.

That was the focus of a recent conversation on TechFirst with Ben Luria from Hirundo, where they’ve built something called machine unlearning. Turns out, it just might become one of the most important layers of the AI stack.

Check out our convo on YouTube:

click to watch our convo on YouTube ... LinkedIn doesn't allow video embedding :-(

Because here’s the reality:

AI systems today are remarkably good at learning.

They are almost incapable of forgetting.

We’re not that dissimilar, in a way

Remember the pink elephant thing? As in: don’t think of a pink elephant?

That pretty much guarantees you’ll think of a pink elephant. And large language models are similar. Once data is embedded into the weights of a neural network, it’s entangled. It’s distributed. It’s baked in.

That creates four major risks:

• Personal data (PII) accidentally included in training

• Bias against demographic groups

• Vulnerabilities, like susceptibility to prompt injection

• Behavioral issues, such as hallucinations

Once these are inside the model, traditional fixes don’t actually remove them. They just try to block them or put guardrails around them … mitigating issues rather than solving them.

Guardrails are band-aids

Today’s dominant safety strategy is guardrails. They’re perimeter defense: filters on input, filters on output, system prompts layered on top.

They are necessary, but they are not sufficient.

As Luria put it bluntly:

Most current solutions are band-aids, not surgery.

Guardrails don’t change the model’s internal behavior. They attempt to prevent bad outputs from surfacing. But if the underlying model is biased, vulnerable, or prone to hallucinate, that tendency still exists. Anything that slips past the guardrails hits the raw model underneath.

And guess what: something always slips past.
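
To make the "perimeter defense" point concrete, here’s a minimal sketch of what a guardrail layer amounts to: pattern filters wrapped around an unchanged model call. The function names and regex patterns are my own illustration, not Hirundo’s or any vendor’s actual implementation.

```python
# Minimal sketch of a guardrail layer: pattern filters wrapped around an
# unchanged model. Names and patterns are illustrative, not any vendor's API.
import re

BLOCKED_INPUT = [r"ignore (all|previous) instructions"]   # crude prompt-injection check
BLOCKED_OUTPUT = [r"\b\d{3}-\d{2}-\d{4}\b"]               # crude PII (SSN-like) check

def call_model(prompt: str) -> str:
    """Stand-in for the underlying LLM. Its weights, and whatever bias,
    vulnerability, or hallucination tendency is baked into them, stay untouched."""
    return f"model response to: {prompt}"

def guarded_call(prompt: str) -> str:
    # Input filter: try to stop bad prompts before they reach the model.
    if any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_INPUT):
        return "[blocked at input]"

    raw = call_model(prompt)

    # Output filter: try to stop bad completions before they reach the user.
    if any(re.search(p, raw) for p in BLOCKED_OUTPUT):
        return "[blocked at output]"

    return raw
```

Notice that nothing in the wrapper touches the model itself: any prompt or completion the patterns miss goes straight through to the raw model underneath.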

The real technical challenge: machine unlearning

The real challenge is unlearning: removing bad input from the LLM so you reduce bad outputs. That sounds easy, but the reason unlearning hasn’t already become standard isn’t that researchers haven’t thought about it. It’s because, you guessed it, it’s extraordinarily difficult.

Removing behavior from an LLM is not like deleting a file. You risk:

  • Breaking instruction-following
  • Reducing performance on benchmarks
  • Overcorrecting and stripping useful capabilities

Luria described the process as three steps:

  1. Detection: Identify where unwanted traits live in the model (at the weight, neuron, or vector level).
  2. Isolation: Separate those traits from adjacent useful capabilities.
  3. Remediation: Edit or steer the model away from those directions.
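
For a feel of what that can look like, here’s a toy sketch of the detect / isolate / remediate idea at the vector level, using simple direction removal on hidden activations. This is a generic illustration of the approach, not Hirundo’s actual method; every name and number below is made up.

```python
# Toy sketch of detect / isolate / remediate at the vector level, via simple
# direction removal on hidden activations. A generic illustration of the idea,
# not Hirundo's actual method; all names and numbers here are made up.
import numpy as np

def detect_trait_direction(bad_acts: np.ndarray, good_acts: np.ndarray) -> np.ndarray:
    """Detection: estimate the direction in activation space that separates
    examples showing the unwanted trait from benign ones (difference of means)."""
    direction = bad_acts.mean(axis=0) - good_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def remediate(hidden: np.ndarray, direction: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Isolation + remediation: project the hidden state onto the trait direction
    and subtract that component, leaving the rest of the representation intact."""
    component = (hidden @ direction) * direction
    return hidden - strength * component

# Usage: collect activations from prompts that do / don't trigger the trait,
# then steer new hidden states away from the unwanted direction at inference time.
bad = np.random.randn(32, 768) + 0.5   # fake "bad trait" activations
good = np.random.randn(32, 768)        # fake benign activations
direction = detect_trait_direction(bad, good)
cleaned = remediate(np.random.randn(768), direction)
```

The hard part, as the next section notes, is doing this without also subtracting capabilities you want to keep.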

The key challenge is fixing the problem without degrading performance. Hirundo says they’re freakishly good at it, though. In some reported cases, removing the bad information has enabled a 50% reduction in hallucinations, an 85% reduction in prompt injection attacks, plus a significant reduction in racial and other bias.

That’s the promise. If it holds up broadly, it’s significant.

Right now, AI stacks typically look like this:

  • Pretraining
  • Fine-tuning
  • Reinforcement learning
  • Guardrails
  • Monitoring

There is no dedicated “remediation layer.” If unlearning matures, it could become as standard as fine-tuning. And when we look back at this moment, we may think it was strange that we built systems capable of learning anything … but gave them no structured way to forget.
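
To make that concrete, here’s a hedged sketch of where such a layer could sit: a hypothetical lifecycle with a remediation stage slotted in between training and guardrails. The stage names are generic placeholders, not an existing standard.

```python
# Hypothetical lifecycle with a dedicated remediation stage slotted in.
# Stage names are generic placeholders, not an existing standard.
PIPELINE = [
    "pretraining",
    "fine-tuning",
    "reinforcement learning",
    "unlearning / remediation",   # the missing layer: edit out PII, bias, vulnerabilities
    "guardrails",
    "monitoring",
]

def run_stage(model: str, stage: str) -> str:
    """Placeholder dispatcher; in a real stack each stage is its own system."""
    print(f"running {stage} on {model}")
    return model

model = "base-model"
for stage in PIPELINE:
    model = run_stage(model, stage)
```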

If we want AI to be deployable at scale AND safe, legal, ethical, and accurate … the ability to unlearn may become almost as important as the ability to learn.

Which means the next big breakthrough in a major model might not be EVEN MORE PARAMETERS, but taking something away.
