Spent the morning shipping a production-grade security hardening patch for Claw Code Beta. What started as a 5-file patch turned into a full architectural overhaul. Here's what we built and verified.

What changed:
• Centralised permission enforcement — all built-in tools, plugin tools, and runtime/MCP tools now flow through one enforcement path. No more bypass gaps.
• Workspace-safe file operations — every write, edit, and notebook mutation is boundary-checked against the active workspace before execution. Canonical path resolution, not string prefix matching (see the sketch below).
• Prompt-mode hardened — out-of-bounds writes are rejected immediately, before confirmation is even surfaced. Fail-closed by design.
• Full monolith split — main.rs went from 5,400+ lines to 1,564, and lib.rs from a giant file to 470 lines, with catalog.rs, dispatch.rs, registry.rs, and cli_tools.rs carrying focused responsibilities.
• Flaky MCP timing test replaced with a deterministic mock.
• Property tests added for path normalisation and glob-boundary parsing.
• End-to-end approval-path test covering the full prompt → confirm → execute flow.
• All 6 root docs rewritten from scratch — README, CLAUDE, PHILOSOPHY, PARITY, ROADMAP, USAGE — accurate and consistent with the actual system.

Verification gates — both green:
✅ cargo test --workspace — 697 tests, 0 failures
✅ cargo clippy --all-targets --all-features -- -D warnings — 0 warnings

Score across 5 engineering dimensions: 88/100, with a clear path to 99. Production verdict from the reviewer: "Production-capable for internal use and trusted operator workflows."

The most important lesson from this session: a passing test suite is not enough. Real production readiness means every tool path is mediated, defaults are safe, and the codebase can be maintained by someone who wasn't there when it was written.

Still working toward 99. The remaining gaps are known, documented, and on the roadmap.

#Rust #SystemsProgramming #OpenSource #ProductionEngineering #CodeQuality #ClawCode
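A rough idea of what canonical-path boundary checking means in Rust (a minimal sketch only; `resolve_in_workspace` and its error handling are illustrative, not Claw Code's actual API):

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Minimal sketch of a canonical-path workspace check.
/// Names and error handling are illustrative, not the real implementation.
fn resolve_in_workspace(workspace: &Path, candidate: &Path) -> io::Result<PathBuf> {
    // Canonicalising resolves symlinks and `..` segments.
    let workspace = workspace.canonicalize()?;

    // A write target may not exist yet, so canonicalise its parent
    // directory and re-append the file name.
    let parent = match candidate.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let file_name = candidate
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let resolved = parent.canonicalize()?.join(file_name);

    // Path::starts_with compares whole path components, so `/workspace-evil`
    // does not pass a check against `/workspace` the way a naive string
    // prefix test would.
    if resolved.starts_with(&workspace) {
        Ok(resolved)
    } else {
        // Fail closed: reject before any confirmation prompt is surfaced.
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is outside the active workspace", resolved.display()),
        ))
    }
}
```

Functions like this are also a natural target for the property tests mentioned above, since every interesting failure mode is an edge case in path handling.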
Claw Code Beta Security Patch Released
More Relevant Posts
𝗪𝗵𝘆 𝗠𝗼𝘀𝘁 𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝗘𝗳𝗳𝗼𝗿𝘁𝘀 𝗙𝗮𝗶𝗹

We spent hours debugging… and we were looking in the wrong place.

💡 What usually happens:
• Engineers debug symptoms, not root cause
• Logs can be misleading
• Assumptions waste time

🧠 Core insight:
👉 Debugging is about asking the right question, not searching more logs

A recent incident reminded me of this. We had a failure in a QR code generation flow for a tenant. At first, everything looked correct:
• feature was enabled
• configuration seemed valid
• another implementation had worked earlier

So naturally, we assumed:
👉 the issue must be in the new implementation

We went deeper:
• checked API flow
• reviewed logs
• compared implementations

Everything pointed in one direction… But it was the wrong one.

The actual issue?
👉 A hidden tenant-level configuration. A fallback feature was configured directly for that tenant. So even after adding the correct implementation:
• the system still picked the fallback
• the expected flow never executed
• and the error persisted

The tricky part? This wasn’t obvious in the code. It was buried in configuration.

That’s when the question changed:
❌ “Where is the bug in the code?”
✅ “What path is the system actually taking?”

And that’s where we found it.

𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝗶𝘀𝗻’𝘁 𝗮𝗯𝗼𝘂𝘁 𝗿𝗲𝗮𝗱𝗶𝗻𝗴 𝗺𝗼𝗿𝗲 𝗹𝗼𝗴𝘀. 𝗜𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝗮𝘀𝗸𝗶𝗻𝗴 𝗯𝗲𝘁𝘁𝗲𝗿 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀.

Curious:
👉 What’s the most misleading bug you’ve debugged?

#SoftwareEngineering #Debugging #BackendEngineering #Microservices #EngineeringLessons
What Anthropic’s Claude Code leak reminded us about shipping software

Anthropic said the Claude Code leak was caused by a release packaging mistake, not a breach. Reports said a source map was included in a published package, which exposed a large portion of the internal source code. Anthropic also said no customer data or credentials were exposed.

That is exactly why this incident matters. A release can succeed technically and still be the wrong release.

Most teams review what goes into the repo. Many review the pull request carefully. Fewer give the same attention to the final package that gets built and published. That is where mistakes can slip through.

A few practical reminders:
• What is in the repo is not the same as what gets published.
• A clean PR does not guarantee a clean release.
• It is safer to define what is allowed to ship than to rely only on exclusion rules (see the example below).
• Release checks should catch debug artifacts before publish.

Good engineering is not only about writing code that works. It is also about making it harder to ship the wrong thing. A useful reminder for all software engineers.

#SoftwareEngineering #Security
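The allowlist point is worth making concrete. In the Rust ecosystem (used here only as a familiar example; the leaked package was an npm release), Cargo's `include` field is exactly such an allowlist, and `cargo package --list` previews what would actually be published:

```toml
# Cargo.toml -- allowlist what ships instead of trying to exclude what shouldn't.
[package]
name = "example-crate"   # illustrative name
version = "0.1.0"
edition = "2021"
# Only paths matching these patterns end up in the published crate;
# debug artifacts, source maps, and internal docs are simply never packaged.
include = ["src/**", "Cargo.toml", "README.md", "LICENSE*"]
```

Running `cargo package --list` before `cargo publish` shows the exact file list that will ship, which is the kind of pre-publish check the post is arguing for.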
I stopped trying to “fix bugs.”

And started asking a different question: “Why did the system behave this way in the first place?”

That shift changed everything. Because in real systems:
• There’s rarely a single point of failure
• Most issues aren’t obvious
• And everything works… until it doesn’t

What I’ve learned is this: the hardest production problems don’t come from broken code. They come from interactions between things that individually work fine.

A retry here. A cache there. Parallel requests somewhere else. All good decisions. Until they meet under load.

That’s when systems don’t crash — they slowly degrade, confuse you, and make you question everything.

Fixing it isn’t about adding more code. It’s about removing complexity and understanding behavior at scale.

That’s the part of engineering I enjoy the most.

#SoftwareEngineering #SystemDesign #DistributedSystems #FullStack #CloudEngineering #EngineeringMindset
The riskiest parts of your codebase are the ones nobody questions anymore.

Internal teams normalize what they live with. A workaround that was "temporary" two years ago becomes invisible. A deployment that needs manual steps every time stops registering as a risk. Architecture choices made under early constraints never get revisited.

This is not a quality problem. It is a proximity problem. The more you know a system, the less you question it.

Most internal reviews check code quality and maybe tests. A structured outside review goes further: security, license compliance, deployability, running costs, data-structure scalability, observability, performance. The gaps it finds are rarely in the code you are actively changing. They are in the parts you stopped examining.

If your confidence in the system is based on how long it has been since something broke, that is assumption, not evidence.

When was the last time someone outside your team reviewed the parts you stopped changing?

#SoftwareEngineering #TechnicalDebt #ProductHardening
Three builds. One mistake. Now I have the rule.

I kept launching subagents for everything I build because I thought more agents meant better output. It doesn't.

My bug-fixing pipeline had three agents: reproduce, debug, fix. Each handed to the next. Every handoff lost context. The fix agent was guessing.

One question fixes this: does the intermediate work matter to you? If you need to see the process, keep it in your main thread. If you just need the result, delegate it.

Code reviews work as subagents. The reviewer sees the diff fresh, with no memory of how the code was written. Research too. The main thread gets the answer. The 40-file search stays invisible.

The architecture never changed. The question did.
Estimated 3 days, delivered in 10 days.

The task: push infrastructure changes across 12 active branches in rancher/charts (SUSE's Kubernetes chart repository). Doing it manually wasn't an option anymore, since this has become a recurring task.

So I built the tooling first:
- Docker-based propagation engine that reads a YAML config
- Spins up isolated containers per branch
- Syncs the infrastructure
- Opens PRs automatically

Halfway through, I found a legacy validation script that was half-broken. I could have skipped it. I didn't. That decision alone added days, but I saw the potential to improve CI validation and avoid future bottlenecks during minor updates.

The engine is now running in production. One command propagates infrastructure changes across all active branches — dev and release — with automated PRs and fail-fast error handling. Supply chain security hardening shipped across the entire repo in a single operation.

How many hours of manual work will this save? I honestly don't know. But every future infra change won't require anyone to touch 12 or more branches by hand. Human error removed from the equation.

Worth the extra week (final result): https://lnkd.in/dEJQ57JJ
500k lines of internal code went public overnight 🚨

Not because of a breach. Not because of an attacker. Because of a routine release.

Yesterday, the full source code of Claude Code left the company through a routine release. The bundle held 500k lines: every internal module, every toggle for unfinished features, all of it reachable on the public registry. A single missed configuration, and the result was immediate and total exposure.

When things like this happen, it's easy to point the finger at human error or neglect. But that misses the point. Human error isn't an edge case; it's a constant risk. Humans forget checkboxes. Humans skip steps under pressure. Memory feels reliable until it is not.

The error did not lie with the engineer. The error was in the lack of a process that keeps things safe even when an engineer slips.

This is where guardrails come in. A system that catches mistakes when humans forget to check a box. A system that raises warnings when something looks off, blocks when the risk crosses a threshold, and flags unexpected file-size spikes. A system that takes over the repetitive validation work, so safety doesn't depend on someone remembering the checklist.

After an incident, the correct question is not "Who forgot?" The correct question is "Which safeguard was missing between the desk and the production server?"

Anthropic stated openly that a human action triggered the leak and that new guardrails will follow. That is the proper stance: a review of mechanisms, not a hunt for blame.

Takeaway: if your release process depends on someone "remembering," it's already broken. Humans will never be flawless. Invest in guardrails that catch mistakes when humans slip.

#SoftwareEngineering #DevOps #ReleaseEngineering #BlamelessPostmortem #ClaudeCode
We've all shipped agent code at 3 AM, thinking we nailed it. Then a rogue memory state, a single lost context, brings the whole goddamn system down. You spend a day tracing a phantom. Gone. That's a week lost on a single ghost.

The latest report on agent state management confirms what I’ve been screaming: your agents aren't just functions. They're entities with memory, and that memory needs a backbone stronger than a JSON blob in RAM. We're talking persistent, fault-tolerant state, designed from day one. Like a distributed ledger, but for an agent's brain.

This isn't about fancy new LLMs. It’s about the plumbing. The infrastructure. The unglamorous, absolutely critical engineering that lets you scale from a demo to 21 systemd services across two continents. Granular checkpoints, explicit state transitions – these aren't features, they're foundations. They cut debugging from days to minutes. They let us ship 167,642 lines of code in 10 days because we aren't spending half that time chasing phantom bugs from volatile agent memory.

When I architect multi-agent councils, I design the state layer before the prompt. This isn't an afterthought. It's the first thing on the whiteboard. Every agent’s 'brain' – its current task, its context, its learned parameters – needs its own robust, versioned storage. If you don't build it this way, you're building a house on sand, hoping the wind doesn't blow.

You think you're fast? Try rebuilding a complex agent workflow from scratch because its memory just… vanished. That’s not fast, that’s financially ruinous.

Do we build agents with throwaway memory, or architect for production resilience?
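A granular checkpoint does not need exotic infrastructure to start with. Here is a minimal sketch in Rust, assuming the serde and serde_json crates; the struct fields and file layout are purely illustrative, not taken from any particular framework:

```rust
use serde::{Deserialize, Serialize};
use std::{fs, io, path::Path};

/// Illustrative agent state: the fields stand in for whatever an agent
/// actually needs to resume (current task, context, learned parameters).
#[derive(Serialize, Deserialize)]
struct AgentState {
    version: u64,           // monotonically increasing checkpoint version
    current_task: String,
    context: Vec<String>,
}

/// Write a checkpoint atomically: serialize to a temp file, then rename.
/// A crash mid-write leaves the previous checkpoint intact rather than
/// producing a half-written file.
fn checkpoint(state: &AgentState, dir: &Path) -> io::Result<()> {
    let bytes = serde_json::to_vec_pretty(state)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let tmp = dir.join("state.json.tmp");
    let dst = dir.join(format!("state.v{}.json", state.version));
    fs::write(&tmp, bytes)?;
    fs::rename(&tmp, &dst)?; // atomic rename on the same filesystem
    Ok(())
}
```

Versioned files like this are one way to get the explicit state transitions the post argues for: resuming an agent means loading the highest checkpoint version, not reconstructing its memory from logs.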
That "temporary" fix you shipped last quarter is now a core dependency. It starts with an urgent bug. A quick patch is pushed to production with a comment: `// TODO: Refactor this`. The team agrees it's a temporary solution. The ticket for the proper fix is created, but it's immediately de-prioritized for new feature work. A few sprints later, another developer builds a new abstraction on top of your temporary code, unaware of its fragile foundation. The original context is lost. This is how technical debt metastasizes. The temporary fix wasn't just a static liability; it had a half-life. The longer it sat, the more it decayed, radiating complexity and risk into surrounding modules. What was once a simple surgical fix now requires a major refactoring project that touches multiple services. The most dangerous code isn't the obviously broken part. It's the temporary solution that works just well enough to be forgotten, but not well enough to be stable. Either schedule the real fix immediately or treat the "temporary" code as permanent and give it the tests and documentation it deserves. How does your team track and manage these "temporary" solutions before they become permanent problems? Let's connect — I share lessons from the engineering trenches regularly. #SoftwareEngineering #TechnicalDebt #SystemDesign
When a production issue happens, technical skill is not the first thing tested. Decision quality is.

Last month, I faced a backend incident during a high-traffic period. The team had 2 options:
• Ship a quick patch directly in a critical flow
• Roll back, stabilize, and fix with safer validation

The quick patch looked faster. But it also increased risk in a part of the system with many integrations. We chose to roll back first.

What happened after that:
• Incident impact was reduced quickly.
• We had time to identify the real root cause.
• The final fix was smaller, clearer, and safer to maintain.

My main lesson: in pressure moments, good engineers don’t choose the fastest code change. They choose the option with the best risk/clarity trade-off. This is where architecture and communication work together.

How do you usually decide under pressure: quick patch or rollback first?

#SoftwareEngineering #BackendEngineering #SoftwareArchitecture #SystemDesign #EngineeringMindset #ScalableSystems #TechGrowth
https://github.com/benchbrex-USA/Claw-Code-Beta