Stop hiding behind 90% code coverage.

We’ve all been there. The dashboard is green. The PR is merged. The coverage report says you’re safe. Then a user does something *unexpected*… and production crashes.

Here’s the hard truth: code coverage tells you which lines ran, not whether your business logic actually works in the real world. You can have 100% coverage and still ship a broken product.

At BaseRock AI, we believe in **Confidence over Coverage**. It’s time to move beyond “Did the line run?” to “Does the scenario actually work?”

#SoftwareTesting #BaserockAI #BUCT #EngineeringExcellence #QualityAssurance
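The coverage-vs-confidence gap fits in a few lines. A minimal sketch (the `apply_discount` function is hypothetical): one test executes every line, yet the business rule it should protect was never exercised.

```python
def apply_discount(price: float, pct: float) -> float:
    """Apply a percentage discount to a price."""
    return price - price * pct / 100

# This single test executes 100% of the lines above:
assert apply_discount(100, 10) == 90.0

# ...but the scenario "discount over 100%" was never tested, and the
# function happily produces a negative price:
print(apply_discount(100, 150))  # -50.0
```

The coverage report for this module is green either way; only a scenario-level test of "a price can never go negative" would catch the bug.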
Beyond Code Coverage: Confidence over Coverage
More Relevant Posts
Building an AI agent that works on your local machine is the easy part. Building one that handles rate limits, scales beyond hardcoded data, and avoids "token burn" is where most developers struggle.

In the first AI Agent Clinic episode, Luis Sala and Jacob Badish took a brittle sales research agent ("Titanium") and rebuilt it from the ground up. Here are 4 engineering lessons from the refactor:

🔹 Ditch the monolith: Use orchestrated sub-agents to handle specialized tasks.
🔹 Force structured outputs: Use Pydantic schemas to ensure your model's response doesn't break your code.
🔹 Dynamic RAG over hardcoding: Replace static context with a scalable Vector Search pipeline.
🔹 Observability is vital: Use OpenTelemetry to see exactly where an agentic loop is failing.

Read the full breakdown and watch the episode here: https://goo.gle/4mJfSWt

#AIAgents #SoftwareEngineering #GenerativeAI
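The "structured outputs" lesson is the easiest to try at home. The episode uses Pydantic; here is a stdlib-only sketch of the same idea (the `ResearchResult` fields are an assumption, not the episode's schema): validate the model's JSON before it reaches downstream code, so malformed output fails loudly at the boundary.

```python
import json
from dataclasses import dataclass

@dataclass
class ResearchResult:
    company: str
    summary: str
    confidence: float

def parse_model_output(raw: str) -> ResearchResult:
    """Fail at the boundary instead of deep inside downstream code."""
    data = json.loads(raw)            # raises on invalid JSON
    result = ResearchResult(**data)   # raises on missing or extra fields
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result

ok = parse_model_output('{"company": "Acme", "summary": "B2B robotics", "confidence": 0.8}')
```

Pydantic adds type coercion and richer error reports on top of this, but the contract is the same: the model's response either matches the schema or the call fails immediately.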
At some point, a knowledge base stops being just retrieval.

When we built Instant Answers in Mojar AI, the idea was straightforward: connect it to your internal docs, let it answer questions, cite sources. RAG, basically. But we kept watching what happened after a few months of real usage. The answered questions were accumulating. The corrections, the thumbs up/down, the follow-ups.

And somewhere between 500 and 1000 quality interactions, something shifts. You don't just have a knowledge base anymore. You have signal. Labeled, domain-specific, real-world signal about how your team thinks, what they flag as wrong, where the gaps are.

That's when fine-tuning stops being a nice-to-have. Not to replace the knowledge base… RAG stays the spine. But to start shaping how the model reasons inside your world. Your language. Your edge cases. Your standards.

The setup we're building toward: live KB grounding, fine-tuned model behavior, agent orchestration, human review, evals. Each layer doing something the others can't. Not a general model that kind of knows your domain. A system that was literally trained on how your team operates.

Still early. But this is where "AI for your company" starts to mean something real.

#EnterpriseAI #RAG #FineTuning #AIagents #AgenticAI #KnowledgeBase #LLMOps #AIForBusiness
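The jump from "knowledge base" to "signal" can be made concrete: every answered question plus its thumbs up/down is already a labeled example. A hedged sketch (the field names and the chat-message JSONL layout are assumptions, not Mojar AI's format) of filtering endorsed interactions into fine-tuning data:

```python
import json

# Hypothetical interaction log: question, answer, and the team's feedback.
interactions = [
    {"question": "What is our refund window?", "answer": "30 days.", "feedback": "up"},
    {"question": "Who approves discounts?", "answer": "Anyone.", "feedback": "down"},
    {"question": "What is our SLA?", "answer": "99.9% uptime.", "feedback": "up"},
]

def to_finetune_jsonl(interactions):
    """Keep only interactions the team endorsed; emit one JSON object per line."""
    lines = []
    for it in interactions:
        if it["feedback"] == "up":
            lines.append(json.dumps({
                "messages": [
                    {"role": "user", "content": it["question"]},
                    {"role": "assistant", "content": it["answer"]},
                ]
            }))
    return "\n".join(lines)

jsonl = to_finetune_jsonl(interactions)
```

The thumbs-down examples are signal too: they mark answers the model should stop giving, which is where preference-style tuning or eval sets come in.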
The math doesn't add up anymore: Human-speed testing + AI-speed development = a quality gap. We’ve seen the breaking point: fragmented tools, disconnected data, and AI agents operating without context. So, we built the solution. Not just a new tool, but a new reality for the software quality toolchain. The reveal happens April 7th. Are you ready? 👇 Register below to be the first to see it. https://lnkd.in/ezrp2xAZ #Innovation #TestAutomation #AI #TechLaunch Alex Martins | Gokul Sridharan | Kevin Foster | Disha Gosalia | Derek Downs | Florence Trang Le | Vu Lam | Mush Honda | Coty Rosenblath | Rajesh Gopala Krishnan | Cristiano Caetano | Ritwik Wadhwa | Srihari Manoharan | Tejaswini Parmar | Jarred Bales | David Olejnik | Vaughn Rachal | Daisy Hoang, M.S.
AI agents are writing code, triaging incidents, and deploying infrastructure. At machine speed. Most teams have no way to see what those agents actually did: the steps, the tool calls, the decisions that led to that production incident at 2 a.m. This isn't a tooling gap. It's a visibility gap. And it's the most important problem in software engineering right now. We've spent a long time thinking about what it means to truly understand your systems. What we're building next is designed for exactly this moment. More to come. Stay tuned. #observability #AI #agentobservability #honeycomb
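Closing that visibility gap starts with recording every tool call an agent makes: name, arguments, result, latency. A minimal stdlib sketch of that idea (the in-memory trace list and the `search_tickets` tool are hypothetical, not Honeycomb's design — a real system would ship these records to an observability backend):

```python
import functools
import time

TRACE = []  # stand-in for an observability backend

def traced_tool(fn):
    """Record name, arguments, result, and latency for every agent tool call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "tool": fn.__name__,
            "args": args,
            "result": result,
            "ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper

@traced_tool
def search_tickets(query: str) -> list:
    """Hypothetical agent tool."""
    return [f"ticket matching {query}"]

search_tickets("payment timeout")
```

With every step recorded, "what did the agent do before the 2 a.m. incident" becomes a query over the trace instead of guesswork.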
Some of the most impactful software capabilities I’ve ever seen are about to ship… business and technology operators alike have needed this since the days of punch cards and steno pools. I’m beyond excited to see it all coming together!
Uploaded a doc. 3 seconds later, the sheet updated itself. A client was burning 6 hours a week logging docs by hand. We built a silent worker that reads every file the moment it lands and writes the summary to a spreadsheet with the uploader's name attached. The win isn't the AI. It's deleting a task nobody wanted to do. What's the one job everyone on your team avoids? Drop it below. #AIAutomation #Productivity #WorkflowAutomation #AIAgents #Flowstart
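The worker described above reduces to three steps: detect the new file, summarize it, append a row with the uploader's name. A stdlib sketch of that pipeline (the summarizer is stubbed where the real worker would call an LLM, and the CSV layout is an assumption):

```python
import csv
import io

def summarize(text: str) -> str:
    """Stub: the real worker would call an LLM here."""
    return text.strip().splitlines()[0][:80]

def log_upload(sheet, uploader: str, filename: str, contents: str) -> None:
    """Append one row per uploaded doc: who uploaded, what, and a short summary."""
    writer = csv.writer(sheet)
    writer.writerow([uploader, filename, summarize(contents)])

# Usage: a StringIO stands in for the spreadsheet.
sheet = io.StringIO()
log_upload(sheet, "alice", "q3-report.txt", "Q3 revenue grew 12%.\nDetails follow.")
row = sheet.getvalue().strip()
```

The missing piece in this sketch is the trigger — a storage webhook or directory watcher firing `log_upload` the moment a file lands.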
Just finished using Weights & Biases; it helped reduce the development effort to set up AI infra by 30%, since it provides infra + eval + RL in one stack, giving my team an expanded set of tools to build, train, and deploy production-grade AI agents. W&B Weave automatically tracks every LLM call via the @weave.op decorator, capturing inputs, outputs, costs, latency, and evaluation metrics without manual setup. #CoreWeave
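The "eval" leg of that stack is worth illustrating: score a model over a labeled dataset and aggregate the result. A minimal sketch (the toy model, dataset, and exact-match scorer are all hypothetical; Weave's own evaluation API logs per-call scores on top of a loop like this):

```python
def exact_match(output: str, expected: str) -> float:
    """Crude scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(model, dataset) -> float:
    """Mean exact-match score of a model over a dataset."""
    scores = [exact_match(model(ex["input"]), ex["expected"]) for ex in dataset]
    return sum(scores) / len(scores)

def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call."""
    return {"capital of France?": "Paris", "2+2?": "4"}.get(prompt, "unknown")

dataset = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2+2?", "expected": "4"},
    {"input": "largest ocean?", "expected": "Pacific"},
]
score = run_eval(toy_model, dataset)  # 2 of 3 correct
```

In practice the scorer is the hard part — exact match is only a placeholder for domain-appropriate metrics.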
We had a simple question: 🤔 If someone looks up your company online… what do they actually learn about your AI story? Not what’s in your roadmap. Not what’s in internal docs. Just what’s visible from the outside. So our team at First Line Software built a small experiment — a 15-minute AI maturity check based only on public signals. You just enter a company name and see what shows up. If you’re curious, try it: https://lnkd.in/evYN6Yks
Every production system has that one integration — the one that passes every test, works perfectly in staging, and then finds creative new ways to break in prod. For me it's always been webhooks. Timeouts, retries hammering downstream services, payload schemas drifting without notice. The gap between "it works on my machine" and "it works at scale" is where the real engineering lives. What's the integration in your stack that taught you the most painful lessons? #BuildInPublic #AI #GauntletAI #SoftwareEngineering #TechCareers
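One of the failure modes named here — retries hammering downstream services — is usually tamed with exponential backoff plus jitter, so a burst of failing deliveries doesn't retry in lockstep. A minimal sketch (the delay constants are arbitrary, and the RNG is seeded only to keep the sketch deterministic):

```python
import random

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0, seed: int = 0):
    """Exponential backoff with full jitter: each retry waits a random
    amount between 0 and min(cap, base * 2**attempt) seconds."""
    rng = random.Random(seed)  # seeded for a reproducible sketch; omit in production
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays

delays = backoff_delays(5)
```

Backoff alone doesn't fix the other failure mode in the post — duplicate deliveries from retries — which is why webhook consumers also need idempotency keys.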
The most honest product metric isn’t a dashboard. It’s frustration.

A small detail surfaced from the Claude Code client: a pattern that flags prompts like “wtf”, “this sucks”, “so frustrating”. Not for response shaping. For telemetry.

If your system isn’t measuring frustration, you’re blind to your most important failures.

The next generation of AI products won’t just be intelligent. They’ll be emotionally aware systems engineered from telemetry up.

#ClaudeCodeLEAKED
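The pattern described is easy to prototype: a regex sweep over user messages for frustration markers, aggregated into a rate. A sketch (the phrase list is a guess for illustration, not the actual Claude Code pattern):

```python
import re

# Hypothetical marker list; a real system would tune and localize this.
FRUSTRATION = re.compile(
    r"\b(wtf|this sucks|so frustrating|ugh|broken again)\b",
    re.IGNORECASE,
)

def frustration_rate(messages) -> float:
    """Fraction of messages containing a frustration marker: a crude
    telemetry signal, not input to response shaping."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if FRUSTRATION.search(m))
    return hits / len(messages)

rate = frustration_rate(["wtf is this error", "thanks, works now", "so frustrating"])
```

Spikes in this rate, sliced by feature or release, point at the failures users feel most — exactly the signal the post is describing.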