Coding agents introduced an existential problem for dev tool products. Here's how we're addressing it.
A few weeks ago, Scalekit showed up in devtool arena's top-10 changelog. Same tier as Stripe, Firecrawl, Tavily, Datadog, PayPal, You.com.
We didn't pitch for it. Developers were reaching for us inside real agent workflows. That was the signal.
But here's the more interesting question: what problem were we solving that got us there? Because the work that preceded that mention wasn't a marketing campaign or a launch. It was us grappling with something we think is the defining product challenge for developer tools right now.
Coding agents have introduced an existential problem for devtool products. And this is our attempt at addressing it.
What changed and why it matters more than it looks
Scalekit handles auth infrastructure for agents & SaaS apps — OAuth flows for agents, MCP server auth, the token handling and refresh logic that agents need to act on behalf of users.
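To make "act on behalf of users" concrete: when an agent holds a user's OAuth grant, it has to keep refreshing the access token as it expires. Here's a minimal sketch of a standard OAuth 2.0 refresh_token grant; the endpoint and credentials are placeholders, not Scalekit's API.

// Hypothetical sketch: standard OAuth 2.0 refresh_token grant.
// tokenUrl, clientId, clientSecret are placeholders, not Scalekit specifics.
async function refreshUserToken(tokenUrl: string, clientId: string, clientSecret: string, refreshToken: string) {
  const res = await fetch(tokenUrl, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "refresh_token",
      refresh_token: refreshToken,
      client_id: clientId,
      client_secret: clientSecret,
    }),
  });
  if (!res.ok) throw new Error(`Token refresh failed: ${res.status}`);
  // Typical response: { access_token, expires_in, refresh_token? }
  return (await res.json()) as { access_token: string; expires_in: number; refresh_token?: string };
}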
For most of our existence, developer experience meant three things: clean SDK, great docs, fast onboarding. We obsessed over these. They worked — for developers who came to us deliberately, evaluated the options, and made considered choices.
Then the way developers build started shifting. They'd open Claude Code or Cursor, type something like "add auth to my agent so it can call external APIs on behalf of the user", and let the agent drive. No SDK evaluation. No doc-reading. Prompt, run, ship.
The agent then makes decisions the developer used to make. What library to use. How to structure the OAuth flow. How to handle token persistence. And it makes those decisions based on what it already knows — its training data, the repos it has absorbed, the patterns it has seen most.
For a lot of those decisions, Scalekit wasn't in the picture. Not because our product was difficult to use. Because we hadn't done the work to get there.
The problem has two parts and most teams conflate them
Once we named what we were seeing, we realised we'd been treating it as one problem when it's actually two. Conflating them leads to solving for one while the other quietly breaks you.
Problem 1: Agent discovery. When a developer prompts Claude Code to add OAuth auth to an agent, does Scalekit surface as a candidate? Not in a Google result they'll look up later — in the moment the agent is deciding what to reach for.
Problem 2: Agent implementation. When the agent does reach for Scalekit — or the developer explicitly asks for it — can the agent complete the integration correctly, without spiralling into correction loops?
Different root causes. Different solutions. Here's where we are on both.
1. Discovery: how agents decide, and where we're at
LLMs build trust and make recommendations much the way you do.
Their intuition is their training data. They give weight to authoritative sources, the way you pay attention to someone you look up to. And they trust a message that arrives from several independent signals, the way you take viral news seriously once multiple friends mention it.
The clearest illustration: ask Claude Code to set up a database for a new project. It reaches for Supabase almost reflexively. That's years of GitHub presence, community signal, and documentation showing up everywhere, absorbed by the model as a default. Their MCP server pushed this further: now the agent doesn't just recommend Supabase, it executes it. Different kind of distribution entirely.
For agent auth, we weren't that default. So we started building toward it.
What that looks like in practice: targeted contributions to repos and documentation hubs where developers hit agent auth problems — adjacent tooling directories, open source libraries in the OAuth and agent space. The discipline was relevance over volume. A PR to a repo that has nothing to do with agent auth is noise. A PR that ends up in the reading path of someone with exactly our problem compounds differently. Our acceptance rate reflected this.
We also treat discoverability as an ongoing audit, not a project with a completion date. We regularly prompt Claude, Perplexity, and ChatGPT with the exact questions our customers ask — "how do I let my AI agent call APIs on behalf of a user", "how do I audit what my agent did", "how do I secure MCP server auth" — and track what surfaces. Every gap we find is a distribution bug.
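One pass of that audit can be as simple as the sketch below. The prompts are the customer questions above; the call goes to OpenAI's chat completions endpoint as one example, and the scoring is a naive substring check, so treat this as a starting point rather than our actual harness.

// Rough sketch of a discoverability audit: ask a model our customers' questions,
// then check whether Scalekit surfaces in the answer. Scoring is a naive substring match.
const prompts = [
  "How do I let my AI agent call APIs on behalf of a user?",
  "How do I audit what my agent did?",
  "How do I secure MCP server auth?",
];

async function askModel(prompt: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-4o", messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content as string;
}

for (const prompt of prompts) {
  const answer = await askModel(prompt);
  const mentioned = answer.toLowerCase().includes("scalekit");
  console.log(`${mentioned ? "ok" : "GAP"} - ${prompt}`);
}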
Honest assessment: we're early. This is a long-horizon bet. The tools building ecosystem presence now will be the defaults in 18 months, and we can't declare that work done yet. It compounds slowly, then all at once.
2. Implementation: what we actually built, and why generic wasn't enough
Every developer now brings a coding agent they brainstorm with, consult, and debate before approving any code. They steer the direction; the coding agent assists with everything else.
It's no longer enough to ship for the developer. The onus is on us to ship for the coding agent that developers are partnering with. These agents are comfortable executing CLIs and tools. That's why every serious devtool has a CLI or a plugin now: Stripe, Supabase, Resend. Table stakes. The question isn't whether to build one. It's whether you've built it for the right user, even if that user is an agent.
We hadn't.
Our SDK was designed for a developer who reads documentation and fills in gaps with context and judgment. An LLM filling in those same gaps lacks that judgment; more often than not it drifts. Every correction loop compounds the drift until the developer tries your competitor.
We saw this directly. In internal sessions where developers tried to integrate Scalekit using Claude Code without guidance, the agent wasn't getting things wrong exactly; it was spending too long constructing the right hypothesis. It would attempt the auth flow for a user but misconfigure the token exchange. It would handle the initial auth correctly, then stall on the refresh logic because it was inferring the expected response shape rather than knowing it. The loops weren't caused by a bad model. They were caused by a surface area built for human readers.
The fix wasn't better docs. It was a different kind of product surface.
We built the Scalekit AuthStack for Claude Code → github.com/scalekit-inc/claude-code-authstack
What makes it different from a standard devtool plugin:
Use-case-scoped plugins, not a generic API wrapper. The agent doesn't have to reason about what Scalekit does in the abstract. It invokes a named plugin scoped to the relevant use case. Want to try?
# Step 1 - In your Claude Code REPL
/plugin marketplace add scalekit-inc/claude-code-authstack
# Step 2 - Install a plugin. Options: full-stack-auth, agent-auth, mcp-auth, modular-sso, modular-scim
/plugin install agent-auth@scalekit-auth-stack
Each plugin carries skills, subagents, MCP servers, references, and hooks, all working together to not only guide you but also let the coding agent act on your behalf.
For example, the plugin includes a dry-run capability that validates flows before touching production. Agents don't have a developer's instinct to test carefully before going live, so we built that guardrail into the tool: validate your agent auth configuration against a real identity provider before deployment, with no side effects.
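To show the shape of that guardrail, here's a hypothetical dry-run check, not the plugin's actual code: it only reads the identity provider's published OAuth discovery metadata, so nothing gets created or mutated, and the failure hints are written for an agent to act on.

// Hypothetical dry-run check, not the plugin's real implementation.
// Read-only request to the IdP's discovery document; verify the grant we rely on is advertised.
async function dryRunAuthConfig(issuer: string) {
  const res = await fetch(`${issuer}/.well-known/openid-configuration`); // read-only, no side effects
  if (!res.ok) {
    return { ok: false, hint: `Issuer metadata not reachable (${res.status}). Check the issuer URL.` };
  }
  const meta = await res.json();
  if (!meta.grant_types_supported?.includes("refresh_token")) {
    return { ok: false, hint: "Provider does not advertise the refresh_token grant; long-lived agent access will break." };
  }
  return { ok: true, tokenEndpoint: meta.token_endpoint as string };
}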
The design principle: tight surface area reduces drift. Every degree of freedom you give an agent is a degree of freedom it can get wrong. Named tools with bounded scope, predictable response shapes, and errors that tell the agent exactly what to do next — that's what cuts the correction loop count.
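As an illustration of that principle (the names below are invented, not our plugin's real interface): one narrow task, one predictable result shape, and errors that carry the agent's next action instead of a stack trace.

// Illustrative only; names are made up. Bounded scope plus a discriminated result type
// means the agent never has to guess what came back or what to do when it fails.
type ConfigureAgentAuthResult =
  | { status: "ok"; connectionId: string }
  | { status: "error"; code: "MISSING_REDIRECT_URI" | "INVALID_SCOPES"; nextAction: string };

function configureAgentAuth(input: { redirectUri?: string; scopes: string[] }): ConfigureAgentAuthResult {
  if (!input.redirectUri) {
    return {
      status: "error",
      code: "MISSING_REDIRECT_URI",
      nextAction: "Ask the developer for the app's callback URL, then call configureAgentAuth again with redirectUri set.",
    };
  }
  if (input.scopes.length === 0) {
    return { status: "error", code: "INVALID_SCOPES", nextAction: "Pass at least one scope, e.g. [\"openid\"]." };
  }
  return { status: "ok", connectionId: "conn_example" };
}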
We've shipped this for Claude Code, Codex, Cursor, and GitHub Copilot. For the coding agents developers love that don't yet have a plugin ecosystem, we packaged the same capabilities into skills.
Just install them globally,
npx skills add scalekit-inc/skills --skill setup-scalekit
and prompt your coding agent, "Help me setup scalekit".
What we think this points to
Three things we'd push back on based on what we've learned:
The devtool arena mention happened because we'd made real progress on both. There's a lot of work left but the direction is clear. We'd rather be solving this now than figuring out it was the problem two years from now.
If you're working through either of these, or you've found sharper approaches, we'd genuinely like to compare notes.