Debugging AI, One Edge Case at a Time

Over the past few days, I’ve pushed and merged a few PRs across different repos. Nothing flashy, but the kind of work that actually matters when you run AI systems in production.

A few examples:

  • NemoClaw → added real-world deployment notes (hardware constraints, known issues, NIM warnings). Basically the stuff you wish you knew before things break
  • NemoClaw → fixed installer/uninstaller issues that were causing inconsistent setups
  • OpenShell → updated routing logic for GPT-5+ compatibility (max_completion_tokens)
  • Megatron-LM → fixed a Python shutdown crash in async calls (the kind of bug that only shows up at the worst time)
  • kvpress → improved decoding state handling for more predictable outputs
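
The GPT-5+ routing change is the kind of compatibility shim that's easy to sketch: newer OpenAI-style models reject the legacy `max_tokens` parameter and expect `max_completion_tokens` instead. A minimal illustration in Python; the function name and the model-prefix check are my own assumptions, not the actual OpenShell code:

```python
def build_request_params(model: str, limit: int) -> dict:
    """Route the token-limit parameter based on the model family.

    Assumption (illustrative): models from "gpt-5" onward, like the
    o-series reasoning models, only accept `max_completion_tokens`,
    while older chat models still take `max_tokens`.
    """
    uses_completion_tokens = model.startswith(("gpt-5", "o1", "o3"))
    key = "max_completion_tokens" if uses_completion_tokens else "max_tokens"
    return {"model": model, key: limit}
```

The point of routing on the model name rather than catching the API error is that the request never leaves your infrastructure malformed in the first place.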

What I enjoy about this kind of work is that it sits right in the messy middle:

between GPUs, drivers, APIs, and actual usage in production.

And honestly, that’s where most problems are.

Not the model.

Not the theory.

But everything around it.

That’s also where I tend to focus:

making AI systems stable, reproducible, and usable outside of demos.

  • infra that doesn’t randomly break
  • deployments you can actually trust
  • systems that behave the same way twice
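
"Behaves the same way twice" usually starts with pinning down randomness. A minimal sketch of the idea, using only the standard library; the function name and seed are illustrative, not taken from any of the repos above:

```python
import random

def deterministic_sample(items: list, k: int, seed: int = 42) -> list:
    """Sample k items reproducibly: same inputs + same seed -> same output."""
    # A local RNG instance avoids depending on (or mutating) global state,
    # which is what makes runs diverge between environments.
    rng = random.Random(seed)
    return rng.sample(items, k)
```

The same discipline extends to dependency pins, driver versions, and container digests: remove every source of implicit state and the system stops surprising you.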

I’m based in Switzerland and currently open to senior roles around:

AI infrastructure, platform engineering, or anything where things need to actually work at scale.

If that’s what you’re building, happy to chat.

#AI #MLOps #Infrastructure #NVIDIA #LLM

More articles by Maxime Grenu