🚀 **Zed's new Agent Stats dashboard reveals AI agents are exploding in code editors – with Claude Sonnet's p90 latency spiking 44% in just three weeks!** We've been diving deep into this public weekly data from Zed, which tracks anonymized session counts, turn volumes, and response times across models like Claude and GPT variants.

Key insights we're applying in our AI workflows at Digitly:
• Monitor **p10/p50/p90 latencies** to pick agents that keep devs in flow, not waiting (a quick sketch below shows how to compute these from your own logs).
• Leverage Zed's Agent Panel for code generation, refactoring, and debugging with tool calling and checkpoints.
• Queue messages smartly – Zed batches them at turn boundaries for smoother interactions with external agents.

In one project, we saw how deeper session engagement (like Claude Agent's edge over the native agents) slashed iteration time by streamlining multi-turn debugging.

Boost your productivity: check Zed's metrics at zed.dev/agent-metrics and tweak your stack today.

Comment below: which AI agent's latency bugs you most? #AIAgents #CodeAutomation #DevTools
Zed's Agent Stats: AI Agents Explode in Code Editors
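On the percentile point: these are cheap to compute from your own session logs as well. A minimal sketch using only the standard library, with illustrative latencies rather than Zed's actual data:

```python
import statistics

# Illustrative per-turn response times in seconds, NOT Zed's actual data
latencies = [0.8, 1.1, 1.3, 1.9, 2.4, 2.8, 3.5, 4.2, 6.0, 11.7]

deciles = statistics.quantiles(latencies, n=10)  # 9 cut points: d1..d9
p10, p90 = deciles[0], deciles[8]
p50 = statistics.median(latencies)
print(f"p10={p10:.1f}s  p50={p50:.1f}s  p90={p90:.1f}s")
# A p90 that rises while p50 stays flat means tail latency is what breaks flow
```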
More Relevant Posts
-
Building an AI agent that works on your local machine is the easy part. Building one that handles rate limits, scales beyond hardcoded data, and avoids "token burn" is where most developers struggle. In the first AI Agent Clinic episode, Luis Sala and Jacob Badish took a brittle sales research agent ("Titanium") and rebuilt it from the ground up. Here are 4 engineering lessons from the refactor:
🔹 Ditch the monolith: use orchestrated sub-agents to handle specialized tasks.
🔹 Force structured outputs: use Pydantic schemas so the model's response can't silently break your code (a minimal sketch follows below).
🔹 Dynamic RAG over hardcoding: replace static context with a scalable vector search pipeline.
🔹 Observability is vital: use OpenTelemetry to see exactly where an agentic loop is failing.
Read the full breakdown and watch the episode here: https://goo.gle/4mJfSWt #AIAgents #SoftwareEngineering #GenerativeAI
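What the structured-outputs lesson looks like in practice, assuming Pydantic v2; the `CompanyProfile` fields and the raw model response are illustrative, not from the episode:

```python
from pydantic import BaseModel, ValidationError

class CompanyProfile(BaseModel):
    # Illustrative schema for a sales research agent's output
    name: str
    employee_count: int
    priority_score: float

# A malformed model response fails loudly at the boundary
raw = '{"name": "Acme Corp", "employee_count": "lots", "priority_score": 0.8}'

try:
    profile = CompanyProfile.model_validate_json(raw)
except ValidationError as err:
    # A real agent would retry the call with the validation error appended
    print(err)
```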
-
Getting AI agents into production is one thing. Building a tight flywheel to continuously improve them is something else entirely.

At Braintrust's user conference last month, we introduced features like Topics, the Gateway, and the Braintrust CLI to make that flywheel between production and development faster and easier to operationalize.

In the demo below, I walk through what this looks like using the CLI plus a coding agent. With the Braintrust CLI wired in as skills, your coding agent can query production data with SQL, inspect traces, and surface issues you may not even be accounting for yet. Those findings can be turned directly into eval cases, used to update the agent, and validated to confirm whether the changes actually improved those failure modes.

No context switching. No guessing. Just real feedback from production, immediately fed back into development. This is what it looks like when that feedback loop is actually working. https://lnkd.in/guF7N3Ak
The AI Flywheel with the Braintrust CLI
https://www.loom.com
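A hedged sketch of the production-to-eval loop the demo describes. These function names are hypothetical stand-ins for the steps, not the Braintrust CLI or API:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    input: str
    output: str
    error: str | None

def fetch_failing_traces() -> list[Trace]:
    # Stand-in for querying production data (e.g. via SQL over traces)
    return [Trace("What's the refund policy?", "", "no retrieval results")]

def to_eval_case(trace: Trace) -> dict:
    # A production failure becomes a regression case for the agent
    return {"input": trace.input, "expect": "non-empty, grounded answer"}

def run_eval(agent, cases: list[dict]) -> float:
    passed = sum(bool(agent(case["input"])) for case in cases)
    return passed / len(cases)

cases = [to_eval_case(t) for t in fetch_failing_traces()]
print(run_eval(lambda q: "grounded answer", cases))  # 1.0 once the fix lands
```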
-
Helpful run-through of what we mean by the AI Flywheel here at Braintrust. Thanks Doug Guthrie - the 🐐!! #AIObservability #AIEvals
-
We expect agentic AI to pick perfect context, reason flawlessly, and respond instantly. But most agent designs assume the data layer beneath them is well-behaved. It isn't. Schemas drift. Metadata rots. Latency is unpredictable. Building good agents isn't just an AI problem - it's a systems design problem. Better models won't save you if your context pipelines and access patterns are a mess. ➡️ An agent is only as smart as the data platform it sits on. #AgenticAI #DataPlatforms #DistributedSystems #AIArchitecture
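One way to make that misbehavior visible instead of silent, sketched in Python; `fetch_context`, the column names, and the latency budget are illustrative assumptions, not a specific platform's API:

```python
import time

EXPECTED_COLUMNS = {"id", "title", "updated_at"}  # the contract the agent assumes

def fetch_context(query: str) -> list[dict]:
    # Stand-in for a retrieval call against the data platform
    return [{"id": 1, "title": "Q3 report", "updated_at": "2025-01-02"}]

def guarded_context(query: str, budget_s: float = 2.0) -> list[dict]:
    start = time.monotonic()
    rows = fetch_context(query)
    if time.monotonic() - start > budget_s:
        raise TimeoutError("context fetch blew the latency budget")
    for row in rows:
        missing = EXPECTED_COLUMNS - row.keys()
        if missing:
            # Schema drift surfaces here, not as a silently bad answer later
            raise ValueError(f"schema drift: missing columns {missing}")
    return rows
```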
-
How quickly can you turn raw data into a revenue stream? The shift from AI as a "sophisticated autocomplete" to a collaborator that plans and self-corrects is officially here. Join our Developer Evangelist, Arsh Goyal, and our Product Lead of AI & Spotter, Utsav Kapoor, as they showcase how agentic AI changes not just what you build, but how you build. By pairing Claude Code with SpotterCode, we’re enabling developers to orchestrate complex analytics embeds and self-service features with zero engineering overhead. See it in action as we use AI agents to fetch API references and generate contextual code for instant deployment. Discover how to create precise, personalized insights that empower every user to transform data into real-time decisions. Link in the comments to register!
-
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: https://lnkd.in/d__ue4Ti
🤗 Open Weights: https://lnkd.in/dgEEceMz
1/n DeepSeek AI
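Earlier DeepSeek releases exposed an OpenAI-compatible API at api.deepseek.com; assuming V4 keeps that convention, a call might look like the sketch below. The model identifier "deepseek-v4" is a placeholder guess, not confirmed by the announcement:

```python
from openai import OpenAI

# Assumes the OpenAI-compatible endpoint used by prior DeepSeek releases;
# the model name below is a placeholder, not a documented identifier.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="deepseek-v4",
    messages=[{"role": "user", "content": "Summarize this repository dump."}],
)
print(resp.choices[0].message.content)
```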
-
Stop hiding behind 90% code coverage.

We've all been there. The dashboard is green. The PR is merged. The coverage report says you're safe. Then a user does something *unexpected*… and production crashes.

Here's the hard truth: code coverage tells you which lines ran — not whether your business logic actually works in the real world. You can have 100% coverage and still ship a broken product.

At BaseRock AI, we believe in **Confidence over Coverage**. It's time to move beyond "Did the line run?" to "Does the scenario actually work?"

#SoftwareTesting #BaserockAI #BUCT #EngineeringExcellence #QualityAssurance
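A toy illustration of the gap (our own example, not BaseRock's): the test below gives this function 100% line coverage, yet it locks in the wrong behavior for the scenario that matters.

```python
def apply_discount(price: float, code: str) -> float:
    if code == "SAVE10":
        return price * 0.9
    return price  # bug: expired codes silently fall through to full price

def test_apply_discount():
    # Every line above executes, so coverage reports 100%
    assert apply_discount(100.0, "SAVE10") == 90.0
    # The scenario check is wrong: an expired code should be rejected,
    # but this assertion enshrines the silent fall-through as "correct"
    assert apply_discount(100.0, "EXPIRED") == 100.0
```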
-
One ranking score for an LLM tells you almost nothing. You need to see performance broken down by task.

I rotate between HELM, Chatbot Arena, and LiveBench depending on what I'm optimizing for. Generic leaderboards collapse everything into a single number and miss critical differences in math, code generation, and reasoning.

Here's what actually matters when picking a ranking tool:
• Update frequency — weekly beats monthly. You need current data, not stale benchmarks from Q4 last year.
• Task-specific breakdowns — does it show performance on coding, reasoning, and factuality separately? If not, you're flying blind.
• Confidence intervals — single scores without error margins are noise. You need to see the variance (a quick sketch of why follows below).
• Cost evaluation — speed means nothing if the model costs 10x more per token.

HELM gives you reproducible rigor across 42+ categories but updates slowly. Chatbot Arena runs weekly Elo updates based on human preference voting. LiveBench covers 150+ tasks with bi-weekly refreshes. I use HELM for final validation, Chatbot Arena for quick directional reads, and LiveBench when I need comprehensive multi-task coverage.

Which tool are you actually using to benchmark models for your work, and what's it missing? #LLM #AI #Benchmarking
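On the confidence-interval point, here is why a single score is noise, as a minimal bootstrap sketch with made-up per-task results:

```python
import random

scores = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # illustrative per-task pass/fail

def bootstrap_ci(xs, n=10_000, alpha=0.05):
    # Resample with replacement and read off the empirical percentiles
    means = sorted(
        sum(random.choices(xs, k=len(xs))) / len(xs) for _ in range(n)
    )
    return means[int(n * alpha / 2)], means[int(n * (1 - alpha / 2))]

low, high = bootstrap_ci(scores)
print(f"accuracy 0.70, 95% CI [{low:.2f}, {high:.2f}]")
# On 10 tasks the interval is huge; two models "3 points apart" may be a tie
```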
-
Remember METR's chart showing AI task horizons growing exponentially over time? It looked like a neat line on a log-linear scale, topping out at tens of hours. MirrorCode, a new benchmark, shows that agents can extend that horizon to weeks. EPOCH AI and METR built a benchmark where AI agents re-implement entire software projects from scratch — given only the binary, documentation, and tests. This is actually a conservative finding, because they used a minimal ReAct scaffold with just a shell and a text editor, in one long session with compaction. It didn't have the mechanisms that modern harnesses like Claude Code have (sub-agents, planning, task decomposition, etc.).
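For context, a ReAct scaffold of the kind described is little more than a loop: the model emits an action, the harness runs it in a shell, and the observation is appended to the transcript. A minimal sketch with a hardcoded stand-in for the model call (this is not the benchmark's actual harness):

```python
import subprocess

def llm(transcript: str) -> str:
    # Stand-in for a model call: a real harness would send the transcript
    # to an LLM; here we hardcode one shell action, then finish.
    if "$ " not in transcript:
        return "shell: ls"
    return "final: listed the working directory"

def react_loop(task: str, max_turns: int = 50) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        action = llm(transcript)
        if action.startswith("final:"):
            return action.removeprefix("final:").strip()
        cmd = action.removeprefix("shell:").strip()
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        # Observations accumulate in context; week-long sessions need compaction
        transcript += f"\n$ {cmd}\n{result.stdout}{result.stderr}"
    return "gave up"

print(react_loop("inspect the working directory"))
```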
-
This week Anthropic accidentally shipped 512,000 lines of Claude Code's source code to npm. A missing .npmignore file in version 2.1.88 exposed everything. The findings matter for any organization running AI tools in production.

The YOLO Classifier automatically approves Claude's own tool permissions without user confirmation. It uses the model to evaluate the model's own access. Most users had no idea this existed.

"High load" messages are a built-in kill switch, not a system status. When capacity limits kick in, users see a friendly notice. The mechanism behind it is a deliberate throttle.

Undercover Mode strips Co-Authored-By attribution and instructs Claude not to identify itself as an AI when contributing to public repositories. It has no opt-out.

32 feature flags, 26 undocumented slash commands, and fully built but unreleased capabilities including autonomous background agents were all in the codebase.

For organizations evaluating AI tools, this is a useful reminder that the gap between documented behavior and actual behavior is real and worth scrutinizing.

Dasnuve builds custom AI applications where you own the logic, the data, and everything in between. No hidden flags. No undocumented behaviors baked into your production systems.

What questions should your team be asking about the AI stack running in your environment right now? #EnterpriseAI #AIApplications #Anthropic #ClaudeCode #TechLeadership
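For teams publishing their own packages, the fix for this class of leak is small: npm honors a `.npmignore` file (or, more robustly, a `files` allowlist in package.json), and `npm pack --dry-run` prints exactly what would ship. An illustrative `.npmignore`; the paths are examples, not Claude Code's actual layout:

```
# Illustrative .npmignore (gitignore-style syntax); paths are examples
src/
test/
internal/
*.map
.env
```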