Couple of "Cost of AI" thoughts I'm working through:

1) I think we're going to see more and more of the massively subsidized token plans come to an end as some of the biggest AI labs move toward IPOs and profitability. You can already see this happening: session limits are being curbed, and enterprise pricing for the likes of Claude is API-based, _not_ subscription-based. No more cheap Uber rides for us!

2) I'm also seeing step-change models like Capybara/Mythos circulate, and it sounds like the cost to run them is going to keep going up for providers (good if you're Nvidia). TurboQuant might help mitigate the capacity crunch to some extent.

Given all that, I think a few things become important:

1) Token efficiency matters more and more. I'm using things like RTK, Serena, and Claude Mem to offset this today. The tooling that gets this right will matter even more going forward.

2) The Chinese-variant open(ish) models become more and more appealing, even with the tradeoffs. Companies will have to ask themselves: "Do I want to drop $100k on Mythos 12, or 1/12 of that on Opus-knockoff-8?"

3) If there _is_ continued downward token-cost pressure, efforts to lock you in (Claude Code Review, subscription-plan terms, other features) will increase in frequency and impact.

https://lnkd.in/ez3b9aXr https://lnkd.in/eMvc4cwy https://lnkd.in/e-xiZgQe
AI Pricing Shifts: Token Efficiency and Cost Implications
More Relevant Posts
We just added Codex support to Pushguard. Pushguard is a pre-push AI code review tool that analyzes your changes before they hit your repo, catching bugs, performance issues, and cross-file problems using full codebase context. You can now run it with either Claude Code or Codex. Same flow, different engines. https://lnkd.in/gCVxZu8G
HOT TAKE: Claude Code has too much configuration for how most people actually work. I'm not saying it's bad; I love CC and use it every day. But more configuration doesn't mean a clearer workflow. Every new capability is a choice: adopt, ignore, or half-adopt and get noisy sessions. I'd rather run a small stack I understand than the maximal install. The Pi harness has been a bright spot lately: less menu tax, more time in the loop. The flex isn't "I enabled everything." It's repeatable outcomes with less cognitive tax. (Pi.dev) Same energy behind this Friday rabbit hole (caveman): https://lnkd.in/dspNRz9a, a one-line install to make your AI talk like a caveman and drop output tokens by ~75%. #AITools #BuildingWithAI #ProductManagement
Deep Agents: Why Planning, Files, Todos, Sub-Agents & Prompts Matter

Building truly capable AI agents isn't about a single clever prompt; it's about architecture. Projects like Deep Agents from LangChain highlight five core building blocks that take agents from demos to production-ready systems:

🧠 Planning: Agents need explicit planning to break down complex goals, reason step by step, and adapt when things change, just like humans do.

📁 Files: Persistent file access lets agents store context, artifacts, logs, and intermediate outputs, which is critical for long-running or multi-step workflows.

✅ Todos: Task tracking gives agents memory of what's done and what's next, improving reliability, resumability, and transparency.

🤖 Sub-Agents: Delegation is power. Specialized sub-agents allow parallelism, separation of concerns, and cleaner reasoning; each agent focuses on what it does best.

📝 Prompts (as first-class citizens): Well-designed, reusable prompts define agent roles, boundaries, and decision-making patterns, turning instructions into systems.

Together, these components enable deep reasoning, autonomy, and scalability, exactly what's needed to move from "chatbots" to real AI teammates.

🔗 Explore the project: https://lnkd.in/e_xFiyD6

If you're building agentic systems, this repo is a must-study. #AI #AgenticAI #LLM #LangChain #DeepAgents #SoftwareArchitecture #GenerativeAI
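The five building blocks above can be sketched in a few lines. This is a toy, not LangChain's Deep Agents implementation: `plan` is a stub standing in for model-generated planning, and all names (`Todo`, `Agent`, the in-memory `files` dict) are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Todo:
    task: str
    done: bool = False  # done flags give resumability and transparency

@dataclass
class Agent:
    name: str
    prompt: str                                          # role prompt as a first-class field
    todos: list[Todo] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)  # in-memory "filesystem"
    subagents: dict[str, "Agent"] = field(default_factory=dict)

    def plan(self, goal: str) -> None:
        # Stand-in for model planning: split the goal into explicit steps.
        self.todos = [Todo(step.strip()) for step in goal.split(";")]

    def run(self) -> list[str]:
        results = []
        for todo in self.todos:
            # Delegate to a specialized sub-agent when one matches the task.
            worker = next((a for key, a in self.subagents.items()
                           if key in todo.task), self)
            results.append(f"{worker.name}: {todo.task}")
            worker.files[todo.task] = "artifact"   # persist an intermediate output
            todo.done = True
        return results

researcher = Agent("researcher", prompt="You gather sources.")
main = Agent("main", prompt="You coordinate.", subagents={"research": researcher})
main.plan("research topic; write summary")
print(main.run())
```

Even at this scale, you can see why the pieces compose: the todo list survives between steps, files hold artifacts, and delegation keeps each agent's context small.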
I didn't like the trend of "frontier models" encrypting their Chain of Thought, so I started building an open-source framework for Open Reasoning/Chain of Thought and supporting schemas. https://lnkd.in/g-KeAYq2 I think it's important that, in the future, humans working with AI understand the reasoning behind its outputs. The last thing we need is for probabilistic black boxes to become more opaque. The next thing I'm about to release is the inverse of open-cot: a harness that can actually implement, navigate, and thrive in the reasoning workflow, with examples for agentic, LangChain/LangGraph, coding, and other reasoning and plan-do-act modes. This will let harnesses speak through reasoning, audit reasoning, track reasoning, and set budgets for reasoning. Ping me if this sounds interesting to you :)
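To make "auditable reasoning with budgets" concrete, here is a hypothetical trace record. This is not the actual open-cot schema from the linked repo; the field names (`kind`, `tokens_used`, `token_budget`) are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ReasoningStep:
    index: int
    kind: str          # e.g. "plan", "act", "reflect"
    content: str
    tokens_used: int

@dataclass
class ReasoningTrace:
    model: str
    token_budget: int
    steps: list[ReasoningStep] = field(default_factory=list)

    def within_budget(self) -> bool:
        # A harness can enforce a reasoning budget before the next step.
        return sum(s.tokens_used for s in self.steps) <= self.token_budget

    def to_json(self) -> str:
        # Serialized traces are what make reasoning auditable and trackable.
        return json.dumps(asdict(self), indent=2)

trace = ReasoningTrace(
    model="example-model",
    token_budget=1000,
    steps=[ReasoningStep(0, "plan", "outline approach", 120),
           ReasoningStep(1, "act", "draft answer", 300)],
)
print(trace.within_budget())
```

The point is that once the trace is a plain data structure rather than an encrypted blob, budgeting, auditing, and tracking all become trivial queries over it.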
We've been heads down on Open Agent Spec. Time for an update! When we started OAS, the premise was simple: AI agents should be defined in YAML, like infrastructure-as-code. Declarative, version-controlled, engine-agnostic. But a spec that takes a prompt and returns a string isn't an agent; it's a template. So we fixed that. Over the next few posts I'll break down what's new: tool use, composable specs, a public registry, and a test harness that lets you CI your agents. All open source: https://lnkd.in/g7Tm5Q8x What's your take: should agent definitions live in code or in config? #AgentFrameworks #BuildInPublic
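For readers who haven't seen agents-as-YAML, a sketch of what such a definition might look like. This is invented for illustration and is not the actual Open Agent Spec schema; see the linked repo for the real format.

```yaml
# Hypothetical agent definition (not the real OAS schema)
name: changelog-writer
model: example-model
prompt: |
  Summarize merged PRs into a changelog entry.
tools:
  - name: git_log
    description: Read recent commit history
inputs:
  repo_url: string
outputs:
  changelog: string
tests:
  - input: { repo_url: "https://example.com/repo.git" }
    expect_contains: "## Changelog"
```

The appeal is exactly the infrastructure-as-code argument: a file like this can be diffed, reviewed, versioned, and run through CI, independent of the engine that executes it.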
Built a small agent to stop burning tokens in Claude Code (and Cursor, Copilot, Codex). It's called lean-dev — one command sets up smart context management, auto-generates .claudeignore, tightens your CLAUDE.md, and switches models by task automatically. npx lean-dev init That's it. No config, no setup friction. Still early days — if you try it and find bugs, open a GitHub issue. PRs and ideas are very welcome too. https://lnkd.in/gQZwqVuz #ClaudeCode #AI #DeveloperTools #OpenSource
Sometimes the best optimization is just communicating like a caveman.

The "Caveman" Token Hack I Used in April

While working with AI agents, I noticed my tokens were getting burned too fast (especially with Claude Opus). So I tried a simple approach I call the "Caveman Skill": think simple before you prompt.

- Don't over-explain
- Break tasks into small steps
- Ask one thing at a time
- Keep context minimal

That's it. Result: less token usage, cleaner outputs, better control. https://lnkd.in/g4sTaSNE
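The effect is easy to see with a rough estimate. The words-times-1.3 heuristic below is a crude proxy (real tokenizers differ), and both prompts are made up; it only shows the direction of the savings from splitting one verbose ask into small, minimal-context steps.

```python
def rough_tokens(text: str) -> int:
    # Crude estimate: ~1.3 tokens per word. Real tokenizers will differ.
    return int(len(text.split()) * 1.3)

# One over-explained, everything-at-once prompt.
verbose = (
    "I was wondering if you could possibly help me out by taking a look at "
    "this function and maybe refactoring it, and also while you're at it "
    "could you add some tests and update the docs and check performance?"
)

# The same work as small, single-purpose "caveman" steps.
caveman_steps = [
    "Refactor this function.",
    "Add tests for it.",
    "Update its docs.",
]

print(rough_tokens(verbose))
print(sum(rough_tokens(s) for s in caveman_steps))
```

Beyond raw token count, the smaller steps also keep each turn's context minimal, which is where most of the real savings come from in long agent sessions.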
Okay, real talk: I thought Claude Code was just a fancier Copilot. Then I actually used it. This thing doesn't sit around waiting for instructions like an intern on their first day. It moves. Need it to dig through your codebase, run terminal commands, and edit files across your whole project at once? Done. You describe the goal; it maps the route. You're the GPS destination, not the driver. MCP servers are where your jaw drops a little. Plug in external tools, browsers, databases, and APIs, and Claude Code picks them up and uses them like it's always had them. It's not "AI plus tools bolted on." It's AI that actually has a toolbox. GitHub connectors mean it's not hiding in a tab somewhere while your real work happens elsewhere. It's in the PR. It's in the review. It's part of how the team ships, not a side quest. And then there are hooks, which honestly should be talked about way more. Imagine being able to whisper to Claude Code before it does anything: "Check this." "Always do that after," or "never touch this file." Enforce standards. Trigger tests. Build guardrails. It's your workflow, your rules. Claude Code just follows them. Four things. Tools, MCP servers, connectors, hooks. And suddenly you're not just using AI to code faster; you're using it to work smarter. There's a difference. A big one. 🙌 #ClaudeCode #Anthropic #AI #SoftwareDev #DevTools #Automation
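To show what "your workflow, your rules" looks like, here is a sketch of a hooks config. The `PreToolUse`/`PostToolUse` shape follows Anthropic's hooks documentation as I understand it, but the script paths are made up and field names should be verified against the current docs before use.

```python
import json

# Sketch of a Claude Code hooks config (goes in .claude/settings.json).
# Shape per Anthropic's hooks docs as I understand them; verify before use.
settings = {
    "hooks": {
        "PreToolUse": [
            {
                # Run a guard script before any file edit or write.
                "matcher": "Edit|Write",
                "hooks": [{"type": "command", "command": "./scripts/guard.sh"}],
            }
        ],
        "PostToolUse": [
            {
                # Trigger the test suite after every Bash tool call.
                "matcher": "Bash",
                "hooks": [{"type": "command", "command": "npm test --silent"}],
            }
        ],
    }
}

print(json.dumps(settings, indent=2))
```

That "check this before, always do that after" pattern is the whole idea: standards are enforced by the harness rather than remembered by the human.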
Anthropic just moved enterprise customers to per-token billing. This changes things for documentation teams. Every token across Claude, Claude Code, and Cowork is now billed at API rates on top of the seat fee. OpenAI did the same with Codex. GitHub tightened Copilot limits. The flat-fee era for AI coding tools is ending. Under flat-fee pricing, a verbose API spec cost the same as a compact one. The tokens were consumed either way, but nobody saw them on a bill. Under per-token pricing, format becomes a line item. YAML describes the same API using ~80% fewer tokens than OpenAPI. That difference now shows up on your customers' invoices every time their AI tools read your docs.
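To see why format becomes a line item, compare the same endpoint as verbose spec-style JSON versus a compact sketch. The token counter below is a naive split, and the endpoint is invented; the real savings (including the ~80% figure above) depend on the tokenizer and the documents, so treat this as directional only.

```python
import json

# The same endpoint, described two ways.
endpoint = {
    "paths": {
        "/users/{id}": {
            "get": {
                "summary": "Fetch a user",
                "parameters": [{"name": "id", "in": "path", "required": True}],
                "responses": {"200": {"description": "OK"}},
            }
        }
    }
}

verbose = json.dumps(endpoint, indent=2)   # spec-style, punctuation-heavy

compact = """\
GET /users/{id}: Fetch a user
  params: id (path, required)
  200: OK
"""

def rough_tokens(text: str) -> int:
    # Naive proxy: structural punctuation tends to cost extra tokens.
    return len(text.replace("{", " { ").replace("}", " } ").split())

print(rough_tokens(verbose), rough_tokens(compact))
```

Under flat-fee pricing nobody measured this; under per-token billing, every AI tool that reads your docs turns the difference into money.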
Everyone on an Anthropic Enterprise plan needs to check out Ed Grzetich's research about the impact of file format on token usage. Because from what he has found, it matters A LOT.
Perhaps one of the outcomes of this will be an orchestration layer with built-in model fluidity: use Opus-knockoff-8 for the simple tasks, and save Mythos 12 for the complex ones. My brain hurts.
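A model-fluid router could be as simple as the sketch below. The model names come from the thread, but the prices are made up and the complexity heuristic is deliberately naive; a real orchestration layer would classify tasks with a cheap model or learned policy.

```python
PRICING = {               # hypothetical $ per 1M input tokens
    "opus-knockoff-8": 0.50,
    "mythos-12": 6.00,
}

def pick_model(task: str) -> str:
    # Toy heuristic: long or design-heavy tasks go to the big model.
    complex_task = len(task.split()) > 40 or "architecture" in task.lower()
    return "mythos-12" if complex_task else "opus-knockoff-8"

def estimated_cost(task: str, tokens: int) -> float:
    return PRICING[pick_model(task)] * tokens / 1_000_000

print(pick_model("Rename this variable"))                        # cheap model
print(pick_model("Design the architecture for the billing app"))  # big model
```

Even a crude router like this makes the tradeoff in point 2 of the original post continuous rather than all-or-nothing: you pay Mythos 12 prices only for the calls that need it.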