98M Tokens a Week at 60-80% Cost Reduction with Advisor-Driven Development

I used 98 million tokens last week. Not a typo. And no, I'm not burning money. I engineered the cost floor down low enough to actually operate at that scale. Here's the pattern: Most Claude Code sessions are 80% mechanical work. Reading files. Writing boilerplate. Simple edits. Reformatting. You're paying Opus rates for tasks Haiku could handle without breaking a sweat. So I built a dispatch system that stops doing that. Opus stays in the orchestrator seat. It plans, makes architectural decisions, and reviews output. Everything mechanical gets routed to the cheapest capable model for the job: Haiku, Sonnet, Gemini, Kimi, or local Ollama models running free on-device like Gemma 4. The result: 60-80% cost reduction on multi-step tasks. Which means I can run at 98M tokens a week and still have it make sense as a business decision. The pattern is called Advisor-Driven Development. Two dispatch paths: 1. Agent tool Claude subagents with full file access on cheaper models 2. ask.py A CLI that routes to any provider for text generation, code, even video analysis Opus tokens go to thinking. Everything else goes where it's cheapest to send it. I've open sourced the whole thing under Pushing Squares. Repo below includes: ask.py, a ready-to-paste CLAUDE.md snippet, and a slash command for structured multi-task execution. Pay for intelligence. Not for mechanical work. // —ARI— https://lnkd.in/e_nzMQ2i

To view or add a comment, sign in

Explore content categories