Spent the morning shipping a production-grade security hardening patch for Claw Code Beta. What started as a 5-file patch turned into a full architectural overhaul. Here's what we built and verified.

What changed:
• Centralised permission enforcement — all built-in tools, plugin tools, and runtime/MCP tools now flow through one enforcement path. No more bypass gaps.
• Workspace-safe file operations — every write, edit, and notebook mutation is boundary-checked against the active workspace before execution. Canonical path resolution, not string prefix matching (see the sketch below).
• Prompt-mode hardened — out-of-bounds writes are rejected immediately, before confirmation is even surfaced. Fail-closed by design.
• Full monolith split — main.rs went from 5,400+ lines to 1,564, and lib.rs from a giant file to 470 lines, with catalog.rs, dispatch.rs, registry.rs, and cli_tools.rs carrying focused responsibilities.
• Flaky MCP timing test replaced with a deterministic mock.
• Property tests added for path normalisation and glob-boundary parsing.
• End-to-end approval-path test covering the full prompt → confirm → execute flow.
• All 6 root docs rewritten from scratch — README, CLAUDE, PHILOSOPHY, PARITY, ROADMAP, USAGE — accurate and consistent with the actual system.

Verification gates — both green:
✅ cargo test --workspace — 697 tests, 0 failures
✅ cargo clippy --all-targets --all-features -- -D warnings — 0 warnings

Score across 5 engineering dimensions: 88/100, with a clear path to 99. Production verdict from the reviewer: "Production-capable for internal use and trusted operator workflows."

The most important lesson from this session: a passing test suite is not enough. Real production readiness means every tool path is mediated, defaults are safe, and the codebase can be maintained by someone who wasn't there when it was written.

Still working toward 99. The remaining gaps are known, documented, and on the roadmap.

#Rust #SystemsProgramming #OpenSource #ProductionEngineering #CodeQuality #ClawCode
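A rough idea of what canonical-path boundary checking means in Rust (a minimal sketch only; `resolve_in_workspace` and its error handling are illustrative, not Claw Code's actual API):

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Minimal sketch of a canonical-path workspace check.
/// Names and error handling are illustrative, not the real implementation.
fn resolve_in_workspace(workspace: &Path, candidate: &Path) -> io::Result<PathBuf> {
    // Canonicalising resolves symlinks and `..` segments.
    let workspace = workspace.canonicalize()?;

    // A write target may not exist yet, so canonicalise its parent
    // directory and re-append the file name.
    let parent = match candidate.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let file_name = candidate
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let resolved = parent.canonicalize()?.join(file_name);

    // Path::starts_with compares whole path components, so `/workspace-evil`
    // does not pass a check against `/workspace` the way a naive string
    // prefix test would.
    if resolved.starts_with(&workspace) {
        Ok(resolved)
    } else {
        // Fail closed: reject before any confirmation prompt is surfaced.
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is outside the active workspace", resolved.display()),
        ))
    }
}
```

Functions like this are also a natural target for the property tests mentioned above, since every interesting failure mode is an edge case in path handling.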
Claw Code Beta Security Patch Released
More Relevant Posts
𝗪𝗵𝘆 𝗠𝗼𝘀𝘁 𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝗘𝗳𝗳𝗼𝗿𝘁𝘀 𝗙𝗮𝗶𝗹

We spent hours debugging… and we were looking in the wrong place.

💡 What usually happens:
• Engineers debug symptoms, not root cause
• Logs can be misleading
• Assumptions waste time

🧠 Core insight:
👉 Debugging is about asking the right question, not searching more logs

A recent incident reminded me of this. We had a failure in a QR code generation flow for a tenant. At first, everything looked correct:
• feature was enabled
• configuration seemed valid
• another implementation had worked earlier

So naturally, we assumed:
👉 the issue must be in the new implementation

We went deeper:
• checked API flow
• reviewed logs
• compared implementations

Everything pointed in one direction… But it was the wrong one.

The actual issue?
👉 A hidden tenant-level configuration. A fallback feature was configured directly for that tenant. So even after adding the correct implementation:
• the system still picked the fallback
• the expected flow never executed
• and the error persisted

The tricky part? This wasn’t obvious in the code. It was buried in configuration.

That’s when the question changed:
❌ “Where is the bug in the code?”
✅ “What path is the system actually taking?”

And that’s where we found it.

𝗗𝗲𝗯𝘂𝗴𝗴𝗶𝗻𝗴 𝗶𝘀𝗻’𝘁 𝗮𝗯𝗼𝘂𝘁 𝗿𝗲𝗮𝗱𝗶𝗻𝗴 𝗺𝗼𝗿𝗲 𝗹𝗼𝗴𝘀. 𝗜𝘁’𝘀 𝗮𝗯𝗼𝘂𝘁 𝗮𝘀𝗸𝗶𝗻𝗴 𝗯𝗲𝘁𝘁𝗲𝗿 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀.

Curious:
👉 What’s the most misleading bug you’ve debugged?

#SoftwareEngineering #Debugging #BackendEngineering #Microservices #EngineeringLessons
What Anthropic’s Claude Code leak reminded us about shipping software

Anthropic said the Claude Code leak was caused by a release packaging mistake, not a breach. Reports said a source map was included in a published package, which exposed a large portion of the internal source code. Anthropic also said no customer data or credentials were exposed.

That is exactly why this incident matters. A release can succeed technically and still be the wrong release.

Most teams review what goes into the repo. Many review the pull request carefully. Fewer give the same attention to the final package that gets built and published. That is where mistakes can slip through.

A few practical reminders:
• What is in the repo is not the same as what gets published.
• A clean PR does not guarantee a clean release.
• It is safer to define what is allowed to ship than to rely only on exclusion rules (see the example below).
• Release checks should catch debug artifacts before publish.

Good engineering is not only about writing code that works. It is also about making it harder to ship the wrong thing. A useful reminder for all software engineers.

#SoftwareEngineering #Security
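The allowlist point is worth making concrete. In the Rust ecosystem (used here only as a familiar example; the leaked package was an npm release), Cargo's `include` field is exactly such an allowlist, and `cargo package --list` previews what would actually be published:

```toml
# Cargo.toml -- allowlist what ships instead of trying to exclude what shouldn't.
[package]
name = "example-crate"   # illustrative name
version = "0.1.0"
edition = "2021"
# Only paths matching these patterns end up in the published crate;
# debug artifacts, source maps, and internal docs are simply never packaged.
include = ["src/**", "Cargo.toml", "README.md", "LICENSE*"]
```

Running `cargo package --list` before `cargo publish` shows the exact file list that will ship, which is the kind of pre-publish check the post is arguing for.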
I stopped trying to “fix bugs.”

And started asking a different question: “Why did the system behave this way in the first place?”

That shift changed everything. Because in real systems:
• There’s rarely a single point of failure
• Most issues aren’t obvious
• And everything works… until it doesn’t

What I’ve learned is this: the hardest production problems don’t come from broken code. They come from interactions between things that individually work fine.

A retry here. A cache there. Parallel requests somewhere else. All good decisions. Until they meet under load.

That’s when systems don’t crash — they slowly degrade, confuse you, and make you question everything.

Fixing it isn’t about adding more code. It’s about removing complexity and understanding behavior at scale.

That’s the part of engineering I enjoy the most.

#SoftwareEngineering #SystemDesign #DistributedSystems #FullStack #CloudEngineering #EngineeringMindset
The riskiest parts of your codebase are the ones nobody questions anymore.

Internal teams normalize what they live with. A workaround that was "temporary" two years ago becomes invisible. A deployment that needs manual steps every time stops registering as a risk. Architecture choices made under early constraints never get revisited.

This is not a quality problem. It is a proximity problem. The more you know a system, the less you question it.

Most internal reviews check code quality and maybe tests. A structured outside review goes further: security, license compliance, deployability, running costs, data-structure scalability, observability, performance. The gaps it finds are rarely in the code you are actively changing. They are in the parts you stopped examining.

If your confidence in the system is based on how long it has been since something broke, that is assumption, not evidence.

When was the last time someone outside your team reviewed the parts you stopped changing?

#SoftwareEngineering #TechnicalDebt #ProductHardening
Three builds. One mistake. Now I have the rule.

I kept launching subagents for everything I build because I thought more agents meant better output. It doesn't.

My bug-fixing pipeline had three agents: reproduce, debug, fix. Each handed to the next. Every handoff lost context. The fix agent was guessing.

One question fixes this: does the intermediate work matter to you? If you need to see the process, keep it in your main thread. If you just need the result, delegate it.

Code reviews work as subagents. The reviewer sees the diff fresh, with no memory of how the code was written. Research too. The main thread gets the answer. The 40-file search stays invisible.

The architecture never changed. The question did.
Estimated 3 days, delivered in 10 days.

The task: push infrastructure changes across 12 active branches in rancher/charts (SUSE's Kubernetes chart repository). Doing it manually wasn't an option anymore, since this has become a recurring task.

So I built the tooling first:
- Docker-based propagation engine that reads a YAML config
- Spins up isolated containers per branch
- Syncs the infrastructure
- Opens PRs automatically

Halfway through, I found a legacy validation script that was half-broken. I could have skipped it. I didn't. That decision alone added days, but I saw the potential to improve CI validation and avoid future bottlenecks during minor updates.

The engine is now running in production. One command propagates infrastructure changes across all active branches — dev and release — with automated PRs and fail-fast error handling. Supply chain security hardening shipped across the entire repo in a single operation.

How many hours of manual work will this save? I honestly don't know. But every future infra change won't require anyone to touch 12 or more branches by hand. Human error removed from the equation.

Worth the extra week (final result): https://lnkd.in/dEJQ57JJ
500k lines of internal code went public overnight 🚨

Not because of a breach. Not because of an attacker. Because of a routine release.

Yesterday, the full source code of Claude Code left the company through a routine release. The bundle held 500k lines: every internal module, every toggle for unfinished features, all of it reachable on the public registry. A single missed configuration, and the result was immediate and total exposure.

When things like this happen, it's easy to point the finger at human error or neglect. But that misses the point. Human error isn't an edge case; it's a constant risk. Humans forget checkboxes. Humans skip steps under pressure. Memory feels reliable until it is not.

The error did not lie with the engineer. The error was in the lack of a process that keeps things safe even when an engineer slips.

This is where guardrails come in. A system that catches mistakes when humans forget to check a box. A system that raises warnings when something looks off, blocks when the risk crosses a threshold, and flags unexpected file-size spikes. A system that takes over the repetitive validation work, so safety doesn't depend on someone remembering the checklist.

After an incident, the correct question is not "Who forgot?" The correct question is "Which safeguard was missing between the desk and the production server?"

Anthropic stated openly that a human action triggered the leak and that new guardrails will follow. That is the proper stance: a review of mechanisms, not a hunt for blame.

Takeaway: if your release process depends on someone "remembering," it's already broken. Humans will never be flawless. Invest in guardrails that catch mistakes when humans slip.

#SoftwareEngineering #DevOps #ReleaseEngineering #BlamelessPostmortem #ClaudeCode
We've all shipped agent code at 3 AM, thinking we nailed it. Then a rogue memory state, a single lost context, brings the whole goddamn system down. You spend a day tracing a phantom. Gone. That's a week lost on a single ghost.

The latest report on agent state management confirms what I’ve been screaming: your agents aren't just functions. They're entities with memory, and that memory needs a backbone stronger than a JSON blob in RAM. We're talking persistent, fault-tolerant state, designed from day one. Like a distributed ledger, but for an agent's brain.

This isn't about fancy new LLMs. It’s about the plumbing. The infrastructure. The unglamorous, absolutely critical engineering that lets you scale from a demo to 21 systemd services across two continents. Granular checkpoints, explicit state transitions – these aren't features, they're foundations. They cut debugging from days to minutes. They let us ship 167,642 lines of code in 10 days because we aren't spending half that time chasing phantom bugs from volatile agent memory.

When I architect multi-agent councils, I design the state layer before the prompt. This isn't an afterthought. It's the first thing on the whiteboard. Every agent’s 'brain' – its current task, its context, its learned parameters – needs its own robust, versioned storage. If you don't build it this way, you're building a house on sand, hoping the wind doesn't blow.

You think you're fast? Try rebuilding a complex agent workflow from scratch because its memory just… vanished. That’s not fast, that’s financially ruinous.

Do we build agents with throwaway memory, or architect for production resilience?
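A granular checkpoint does not need exotic infrastructure to start with. Here is a minimal sketch in Rust, assuming the serde and serde_json crates; the struct fields and file layout are purely illustrative, not taken from any particular framework:

```rust
use serde::{Deserialize, Serialize};
use std::{fs, io, path::Path};

/// Illustrative agent state: the fields stand in for whatever an agent
/// actually needs to resume (current task, context, learned parameters).
#[derive(Serialize, Deserialize)]
struct AgentState {
    version: u64,           // monotonically increasing checkpoint version
    current_task: String,
    context: Vec<String>,
}

/// Write a checkpoint atomically: serialize to a temp file, then rename.
/// A crash mid-write leaves the previous checkpoint intact rather than
/// producing a half-written file.
fn checkpoint(state: &AgentState, dir: &Path) -> io::Result<()> {
    let bytes = serde_json::to_vec_pretty(state)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let tmp = dir.join("state.json.tmp");
    let dst = dir.join(format!("state.v{}.json", state.version));
    fs::write(&tmp, bytes)?;
    fs::rename(&tmp, &dst)?; // atomic rename on the same filesystem
    Ok(())
}
```

Versioned files like this are one way to get the explicit state transitions the post argues for: resuming an agent means loading the highest checkpoint version, not reconstructing its memory from logs.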
That "temporary" fix you shipped last quarter is now a core dependency. It starts with an urgent bug. A quick patch is pushed to production with a comment: `// TODO: Refactor this`. The team agrees it's a temporary solution. The ticket for the proper fix is created, but it's immediately de-prioritized for new feature work. A few sprints later, another developer builds a new abstraction on top of your temporary code, unaware of its fragile foundation. The original context is lost. This is how technical debt metastasizes. The temporary fix wasn't just a static liability; it had a half-life. The longer it sat, the more it decayed, radiating complexity and risk into surrounding modules. What was once a simple surgical fix now requires a major refactoring project that touches multiple services. The most dangerous code isn't the obviously broken part. It's the temporary solution that works just well enough to be forgotten, but not well enough to be stable. Either schedule the real fix immediately or treat the "temporary" code as permanent and give it the tests and documentation it deserves. How does your team track and manage these "temporary" solutions before they become permanent problems? Let's connect — I share lessons from the engineering trenches regularly. #SoftwareEngineering #TechnicalDebt #SystemDesign
When a production issue happens, technical skill is not the first thing tested. Decision quality is.

Last month, I faced a backend incident during a high-traffic period. The team had 2 options:
• Ship a quick patch directly in a critical flow
• Roll back, stabilize, and fix with safer validation

The quick patch looked faster. But it also increased risk in a part of the system with many integrations. We chose to roll back first.

What happened after that:
• Incident impact was reduced quickly.
• We had time to identify the real root cause.
• The final fix was smaller, clearer, and safer to maintain.

My main lesson: in pressure moments, good engineers don’t choose the fastest code change. They choose the option with the best risk/clarity trade-off. This is where architecture and communication work together.

How do you usually decide under pressure: quick patch or rollback first?

#SoftwareEngineering #BackendEngineering #SoftwareArchitecture #SystemDesign #EngineeringMindset #ScalableSystems #TechGrowth
https://github.com/benchbrex-USA/Claw-Code-Beta