LLMs have become disturbingly capable pen-testers. With 579 lines of Python scaffolding code, an LLM can autonomously compromise an Active Directory network: privilege escalation, lateral movement, domain dominance, the whole chain, as tested against the GOAD (Game of Active Directory) testbed. We've just released a new version of Cochise (https://lnkd.in/dMJFCN-u), our open-source prototype for autonomous assumed-breach pentesting, with a focus on simplicity and readability. If you're researching LLM-based offensive security, it's meant as a baseline and starting point. The accompanying paper was accepted at ACM TOSEM, and I'll be presenting at ICSE in Rio de Janeiro next week. If you're there and want to grab a coffee or an after-conference drink, message me.
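The post's core claim is that a few hundred lines of glue suffice because the tradecraft lives in the model, not the scaffold. A minimal sketch of what such a loop could look like (hypothetical code, not Cochise's actual implementation; `query_model` is a stub standing in for a real LLM call):

```python
import subprocess

def query_model(history):
    """Stub standing in for a real LLM call. A real scaffold would send
    the command/output history to a model and receive the next shell
    command back; Cochise's actual prompting and interface will differ."""
    return "echo recon-step" if not history else "done"

def run_autonomous_loop():
    """Drive the model -> command -> observe loop until the model stops.
    The loop itself is trivial; the capability is in the model."""
    history = []
    while True:
        command = query_model(history)
        if command == "done":
            return history
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True)
        history.append({"cmd": command, "out": result.stdout.strip()})
```

The point the post makes is visible here: nothing in the scaffold encodes privilege escalation or lateral movement; it only relays commands and observations.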
Andreas, interesting baseline, and the GOAD results are compelling. The 579-line scaffold is the point worth noting: the capability isn't in the code, it's in the model. The scaffold just removes the friction. That's exactly the threat model we've been building against. NIGHTFALL takes the opposite approach: 47 purpose-built offensive tools, zero LLM dependency, zero external API keys, and an Ed25519 cryptographic gate on destructive operations. The difference matters in production engagements, where you need a reproducible, auditable, evidence-chained result rather than a model making autonomous decisions you can't fully explain to a client. Cochise proves the concept; the question for practitioners is whether a proof of concept is sufficient for real engagements. Congratulations on the TOSEM acceptance.
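The "cryptographic gate on destructive operations" mentioned above can be sketched as a signature check before execution. NIGHTFALL is described as using Ed25519; since Ed25519 isn't in the Python standard library, this sketch substitutes HMAC-SHA256 to stay self-contained, and every name in it is illustrative rather than taken from the actual tool:

```python
import hmac
import hashlib

# Illustrative operator-held secret; a real gate would use an Ed25519
# private key held by the operator, with only the public key deployed.
APPROVAL_KEY = b"operator-held secret"

def sign_operation(op: str, key: bytes = APPROVAL_KEY) -> str:
    """Operator-side: produce an approval signature for one operation."""
    return hmac.new(key, op.encode(), hashlib.sha256).hexdigest()

def gated_execute(op: str, signature: str, key: bytes = APPROVAL_KEY) -> str:
    """Tool-side: refuse any destructive operation lacking a valid signature."""
    if not hmac.compare_digest(sign_operation(op, key), signature):
        raise PermissionError(f"unsigned destructive operation refused: {op}")
    return f"executed: {op}"
```

The design intent, as the comment frames it, is auditability: each destructive action carries an explicit, verifiable operator approval rather than an autonomous model decision.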
Genuinely would be interested to see a tool like this go head-to-head with an experienced AD pentester in an unknown lab that isn't purposefully vulnerable.
Very cool! Thank you for sharing
Based on a lab people solved earlier? You can just do it with a skill.md.