Most developers have never heard of Tree-sitter. That’s a problem. If you’re building anything that reads, analyses, or transforms code — you need to know about this.

Tree-sitter is an incremental, error-tolerant parser library originally built by GitHub. It turns raw source code into a structured syntax tree, and it does it fast, even as you type.

Here’s why that matters:

❌ The old way
Regex patterns. String matching. Fragile heuristics that break the moment someone writes code in an unexpected style.

✅ The Tree-sitter way
A proper Abstract Syntax Tree (AST) — language-aware, semantically structured, queryable. You’re not scanning text. You’re traversing meaning.

A few things that make it remarkable:
→ Incremental parsing — it re-parses only what changed, making it real-time editor-friendly
→ Error tolerance — it keeps building a useful tree even when the code is broken (critical for live editing)
→ Language-agnostic — grammars exist for hundreds of languages: Python, Rust, Go, TypeScript, Java…
→ Structured queries — write pattern queries against the AST to find constructs like function definitions, API calls, import statements

It’s what powers syntax highlighting and code navigation in Neovim, Zed, and GitHub’s code intelligence. But more interestingly for me — it’s the engine underneath tools that need to understand code at scale.

When you’re building a migration accelerator, a refactoring tool, or any platform that needs to detect language patterns across a large codebase, you don’t want to guess. You want to parse. Tree-sitter makes that precise, fast, and genuinely reliable.

If you’re building in the developer tooling or AI-native code intelligence space, this is one to have in your stack.

#TreeSitter #DeveloperTooling #CodeIntelligence
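To make "traversing meaning" concrete, here is a minimal sketch using the Python bindings (pip install tree-sitter tree-sitter-python). The binding API has shifted across versions, so treat the constructor calls as illustrative rather than canonical; the point is that you find function definitions by node type, not by regex.

```python
# Minimal sketch, assuming recent py-tree-sitter bindings plus the
# tree-sitter-python grammar package; exact constructor details vary by version.
import tree_sitter_python
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tree_sitter_python.language())
parser = Parser(PY_LANGUAGE)

source = b"def greet(name):\n    return 'hello ' + name\n"
tree = parser.parse(source)

def find_functions(node):
    # Walk the syntax tree and report every function definition by node type.
    if node.type == "function_definition":
        name = node.child_by_field_name("name")
        print("function:", source[name.start_byte:name.end_byte].decode())
    for child in node.children:
        find_functions(child)

find_functions(tree.root_node)  # -> function: greet
```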
More Relevant Posts
Our tree-sitter-language-pack (v1.6) now supports 305 languages.

tree-sitter is the parsing library behind syntax highlighting and code intelligence in editors like Neovim, Helix, and Zed. It's fast and error-tolerant. However, using it usually means tracking down, compiling, and managing a separate grammar repository for each language you need.

tree-sitter-language-pack addresses this. It provides one package with 305 parsers available across 12 ecosystems: Rust, Python, Node.js, Go, Java, Ruby, Elixir, PHP, C#, WASM, CLI, and C FFI. Parsers are fetched on demand and cached locally. The unified process() API returns structured code intelligence: functions, classes, imports, comments, and AST-aware chunks built for code intelligence tools and RAG pipelines.

This is important for AI agents working with code. Kreuzberg already supports 97 file formats. With tree-sitter-language-pack, agents can now process source code in 305 languages and get the same structured, semantic output. There is no need for parser management or custom tools for each language.

tree-sitter-language-pack is MIT licensed and open source, and it is part of the Kreuzberg org.

GitHub, tree-sitter-language-pack: https://lnkd.in/dxsUGqC3

#OpenSource #AIInfrastructure #CodeIntelligence #TreeSitter #Kreuzberg
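As a rough illustration of what "no per-language grammar management" means, a sketch assuming the Python distribution of tree-sitter-language-pack still exposes the get_parser() helper it shipped in earlier releases; the newer process() API described above is not reproduced here, so check the repo for its actual signature.

```python
# Assumed API: get_parser() resolves and loads the right grammar for you,
# so there is no per-language grammar repo to clone or compile yourself.
from tree_sitter_language_pack import get_parser

parser = get_parser("rust")                      # grammar cached locally
tree = parser.parse(b'fn main() { println!("hi"); }')

# The result is an ordinary tree-sitter syntax tree you can walk or query.
print(tree.root_node.type)                       # "source_file" for the Rust grammar
print([child.type for child in tree.root_node.children])
```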
𝗦𝘁𝗼𝗽 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗲𝘃𝗲𝗿𝘆 𝗯𝗮𝗰𝗸𝗲𝗻𝗱 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝘄𝗮𝘆.

FastAPI isn't just "another Python framework." It's a deliberate choice — and knowing when to reach for it matters more than knowing how to use it.

𝗣𝗶𝗰𝗸 𝗙𝗮𝘀𝘁𝗔𝗣𝗜 𝘄𝗵𝗲𝗻:
• You're building ML/AI-powered APIs and your team already lives in Python
• You need async performance without the boilerplate of Go or Java
• Auto-generated docs (Swagger/OpenAPI) aren't a nice-to-have — they're a requirement
• You want type safety that actually catches bugs before production

𝗦𝘁𝗶𝗰𝗸 𝘄𝗶𝘁𝗵 𝘁𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗯𝗮𝗰𝗸𝗲𝗻𝗱𝘀 (𝗦𝗽𝗿𝗶𝗻𝗴, 𝗗𝗷𝗮𝗻𝗴𝗼, 𝗘𝘅𝗽𝗿𝗲𝘀𝘀, .𝗡𝗘𝗧) 𝘄𝗵𝗲𝗻:
• Your org already has deep expertise and infra around them
• You need battle-tested ORM support and a massive plugin ecosystem
• You're building monoliths where convention-over-configuration saves months

𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗮𝗻𝘀𝘄𝗲𝗿? 𝗜𝘁'𝘀 𝗻𝗲𝘃𝗲𝗿 𝗮𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗳𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸. 𝗜𝘁'𝘀 𝗮𝗯𝗼𝘂𝘁 𝘁𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺.

FastAPI shines where speed-to-deploy, async I/O, and Python-native ML pipelines intersect. Forcing it into a legacy enterprise CRUD app is like using a scalpel to chop wood. Choose your tools like an engineer, not a fan.

Thoughts? When did FastAPI click (or not click) for you?

#FastAPI #Python #BackendDevelopment #SoftwareEngineering #WebDevelopment #APIDevelopment #TechCommunity #Programming #MLOps #SystemDesign
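To make the async-plus-typed-docs point concrete, here is a minimal, hypothetical sketch; the PredictionRequest model and /predict route are invented for illustration, not taken from any real project. Run it with uvicorn and the interactive docs appear at /docs with no extra work.

```python
# Minimal sketch of FastAPI's selling points: async handlers, typed request
# models that validate before your code runs, and auto-generated OpenAPI docs.
# The model and route names here are made up for illustration.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Toy ML API")

class PredictionRequest(BaseModel):
    features: list[float]          # validated and documented automatically

class PredictionResponse(BaseModel):
    score: float

@app.post("/predict", response_model=PredictionResponse)
async def predict(req: PredictionRequest) -> PredictionResponse:
    # Stand-in for a real model call; a malformed payload never reaches this
    # line, because Pydantic rejects it with a 422 before the handler runs.
    return PredictionResponse(score=sum(req.features) / max(len(req.features), 1))
```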
I came across a benchmark of Claude Code across 𝟭𝟯 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲𝘀 and 𝟲𝟬𝟬 𝗿𝘂𝗻𝘀 on the same task: building a simplified Git-like system.

The result? Dynamic languages won. Ruby, Python, and JavaScript were the fastest, cheapest, and most stable options for AI-assisted prototyping.

A few takeaways stood out:
• Adding type checkers created a real tax
• Python + mypy was about 1.6–1.7x slower than plain Python
• Ruby + Steep was about 2.0–3.2x slower than plain Ruby

𝗔𝗻𝗼𝘁𝗵𝗲𝗿 𝘀𝘂𝗿𝗽𝗿𝗶𝘀𝗶𝗻𝗴 𝗽𝗼𝗶𝗻𝘁: More compact code did NOT mean faster generation. Languages like Haskell and OCaml produced fewer lines of code, but required more reasoning from the model.

And one more uncomfortable insight: Static typing did NOT automatically improve reliability. Across 600 runs, the only failures happened in statically typed languages.

For me, the most interesting implication is what this means for 𝗚𝗼 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀. 𝗚𝗼 may actually be in a strong position in the AI era:
• simpler than Rust or advanced TypeScript type systems
• more production-friendly than dynamic languages
• but still likely slower for AI prototyping than Python or Ruby

So the old workflow may become even more relevant:
1. Prototype fast in a dynamic language
2. Lock in the API and product logic
3. Migrate to Go for production reliability

This is not a knock on Go. It is a reminder that we may need to optimize for two different layers now:
• runtime and team efficiency in production
• AI generation efficiency during prototyping

“Best language” is no longer a single-axis question. It may soon include a new metric: dollars per feature shipped with AI.

Curious how others see this: Would you keep one language end-to-end, or prototype in Python/Ruby and move to Go later?

#AI #SoftwareEngineering #ProgrammingLanguages #ClaudeCode #Golang #DeveloperTools
The Great 24-Hour Extraction: How Anthropic’s "Source Map" Slip Changed the Game (yes, it's April Fools' Day, but this story is real 🙂)

Yesterday, a simple human error in a build script did what competitors have been trying to do for a year: it revealed the "Special Sauce" of Claude Code.

The Leak: A 59.8 MB JavaScript source map was bundled into version 2.1.88 of @anthropic-ai/claude-code. This wasn't just minified code; it was a roadmap to ~1,900 TypeScript files covering everything from "Self-Healing Memory" to "Agent Swarm" logic.

The "Clean Room" Counter-Move: What’s most impressive isn't the leak itself, but the speed of the reimplementation.
• Sigrid Jin (@realsigridjin) used OpenAI’s Codex (via the oh-my-codex orchestrator) to perform a systematic rewrite.
• By porting the logic to Python, they’ve created a "legal buffer" — a clean-room implementation that replicates the behavior and architecture (the "claw-code" repo) without infringing on the specific TypeScript copyright.
• As of this morning, the project has already surpassed 50,000 stars on GitHub, making it the fastest-growing repo in history.

The Engineering Takeaway: We are officially in the era of Instant Legacy. If your competitive advantage is "hidden code," you don't have a competitive advantage. The only thing that stays proprietary in 2026 is your compute and your live data. The logic? That belongs to the agents now.

Anthropic tried to sell a "Security Review" tool, but their own packaging script was the ultimate security failure. The community didn't just look at the code — they ingested it.

Is this the end of "Closed Source" developer tools, or just a really expensive lesson in .npmignore?

The "Claw-Code" Python Port: Repository and Discussion
👉 GitHub - instructkr/claw-code: The fastest repo in history to surpass 50K stars ⭐, reaching the milestone in just 2 hours after publication.
https://lnkd.in/dSV3VjCC

#claudecode
🚨 DRAMA ALERT 🚨

Anthropic just accidentally open-sourced Claude Code. Twice.

Yes, you read that right. The company that called their CLI "secret sauce" and sent hundreds of DMCA takedowns to protect it... published source maps in their npm package. Again.

Here's what happened and why it matters for every developer using AI coding tools.

𝗪𝗵𝗮𝘁 𝗮𝗿𝗲 𝘀𝗼𝘂𝗿𝗰𝗲 𝗺𝗮𝗽𝘀?
When you ship JavaScript, you typically minify and obfuscate it. Source maps let you reverse that process for debugging. They contain the FULL original source code. Anthropic included these in their npm release. Oops.

𝗧𝗵𝗲 𝗶𝗿𝗼𝗻𝘆 𝗶𝘀 𝗱𝗲𝗹𝗶𝗰𝗶𝗼𝘂𝘀
→ Claude Code ranks 39th on Terminal Bench. Dead last among harnesses using Opus. (I love it and it's still my favourite.)
→ The "secret sauce" actually references Open Code's source to match their scrolling behaviour.
→ Competitors copying Anthropic? Nope. Anthropic copying open source projects.

𝗛𝗶𝗱𝗱𝗲𝗻 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀 𝗱𝗶𝘀𝗰𝗼𝘃𝗲𝗿𝗲𝗱
• Dream Mode: Background agents that review past sessions and consolidate memories while you sleep
• Coordinator Mode: Spawn multiple parallel workers with shared prompt cache
• Kairos: Always-on proactive Claude that monitors your work and can push PRs automatically
• Buddy: A Tamagotchi companion that hatches in your terminal (was scheduled for April 1-7)
• Undercover Mode: Flag for Anthropic employees to hide that they're using Claude Code in public repos

That last one is... interesting. Why hide it?

𝗪𝗵𝗮𝘁 𝘀𝗵𝗼𝘂𝗹𝗱 𝗔𝗻𝘁𝗵𝗿𝗼𝗽𝗶𝗰 𝗱𝗼?
The cat's out of the bag. DMCAs won't put it back.
1. Announce a roadmap to actually open-source it
2. Let engineers talk about these features publicly
3. Stop the legal theatre
4. Be humans, not lawyers

The company that markets itself as the "human" AI lab needs to act like it.

Repo fork (while it lasts): https://lnkd.in/gYX8hx9v

What's your take? Should all AI coding tools be open source?

#ClaudeCode #AIEngineering #OpenSource #SoftwareEngineering #DevTools
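For anyone who has never opened one: a source map is plain JSON, and when its sourcesContent field is populated it carries the full original files. A small hypothetical sketch (cli.js.map is a made-up filename) showing how little it takes to read one:

```python
# Hypothetical sketch: inspect a published source map and list the original
# sources it embeds. Field names follow the standard source map v3 format.
import json

with open("cli.js.map") as f:          # made-up filename for illustration
    source_map = json.load(f)

# "sources" lists the original file paths; "sourcesContent" (when present)
# holds their full, un-minified text, which is exactly what leaked here.
for path, content in zip(source_map.get("sources", []),
                         source_map.get("sourcesContent") or []):
    print(f"{path}: {len(content)} characters of original source")
```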
Your AI code reviews are burning tokens they don't need to.

Every time Claude reviews your code, it re-reads the entire codebase. 200 files. 150,000 tokens. For a change that touched 8 files. That's not smart. That's expensive.

code-review-graph fixes this. It's an open-source tool that builds a persistent, incremental knowledge graph of your codebase using Tree-sitter and SQLite. Instead of dumping your entire repo into Claude's context, it sends only the changed files plus every file impacted by those changes.

The result? 5 to 10x fewer tokens per code review.
Before: 200 files scanned, ~150k tokens used.
After: 8 changed + 12 impacted files, ~25k tokens used.

Here's what makes it practical for real engineering teams:
• Works natively with Claude Code via MCP (Model Context Protocol). No extra setup, no new workflow to learn.
• Increments intelligently. After the first build (~10s for 500 files), subsequent updates take under 2 seconds. Only re-parses what changed.
• Understands blast radius. It traces dependency chains so Claude knows not just what changed, but what else that change could break.
• Supports 12+ languages out of the box: Python, TypeScript, JavaScript, Go, Rust, Java, C#, Kotlin, Swift, Ruby, PHP, and C/C++.
• Needs no external database. SQLite is all it takes.

The architecture is clean: Tree-sitter parses your code into an AST, a SQLite + NetworkX graph stores the relationships, git diff drives incremental updates, and 8 MCP tools expose everything to Claude Code.

Three review workflows ship with it:
/code-review-graph:build-graph
/code-review-graph:review-delta
/code-review-graph:review-pr

Whether you're a junior engineer just getting into AI-assisted development or a senior architect thinking about LLM cost optimization at scale, this tool addresses a real problem: context window efficiency. AI code review should be precise. Not brute-force.

Check it out: https://lnkd.in/giHvG8pR

#AIEngineering #ClaudeCode #LLM #TokenOptimization #CodeReview #OpenSource #DeveloperTools #SoftwareEngineering #MCP #GenAI
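The blast-radius idea is the interesting part. This is not code-review-graph's actual implementation, just a minimal sketch of the underlying technique: keep a directed dependency graph, then expand the git diff to everything that can (transitively) reach the changed files.

```python
# Minimal sketch of the blast-radius technique (not code-review-graph's real code):
# a directed graph where an edge A -> B means "A imports/depends on B".
# Reviewing a change then means: changed files + everything that can reach them.
import subprocess
import networkx as nx

# Toy dependency graph; in the real tool these edges come from Tree-sitter parses.
deps = nx.DiGraph()
deps.add_edges_from([
    ("api/routes.py", "services/billing.py"),
    ("services/billing.py", "models/invoice.py"),
    ("jobs/nightly.py", "models/invoice.py"),
])

def changed_files() -> set[str]:
    """Files touched since HEAD, straight from git (whitespace-free paths assumed)."""
    out = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                         capture_output=True, text=True, check=True)
    return set(out.stdout.split())

def blast_radius(changed: set[str]) -> set[str]:
    impacted = set(changed)
    for f in changed & set(deps.nodes):
        # nx.ancestors(G, f) = every node with a path *to* f, i.e. everything
        # that depends on f directly or transitively.
        impacted |= nx.ancestors(deps, f)
    return impacted

if __name__ == "__main__":
    # In a real repo you would pass changed_files(); here we use a toy change.
    print(sorted(blast_radius({"models/invoice.py"})))
    # -> ['api/routes.py', 'jobs/nightly.py', 'models/invoice.py', 'services/billing.py']
```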
My project of the day: NaturalScript - write & maintain executable scripts in natural language.

I find it off-putting when people generate scripts with LLMs and commit the results to Git repositories. Those scripts are verbose and hard to maintain. It's like committing a binary file without the actual source code.

NaturalScript lets you write and maintain executable scripts in natural language. It lets you treat shell/Python/... scripts as generated artifacts that you don't edit directly.

Let me know what you think.

https://lnkd.in/erEt6ZxH
I had Claude Code build a bash parser from scratch — and the process taught me more about AI-assisted development than the result itself.

The context: I'm building tokf, a tool that compresses CLI output before it reaches LLMs like Claude Code. It needs to parse bash commands accurately — this is security-sensitive, since malformed parsing could enable prompt injection.

My first approach used tree-sitter-bash, but community feedback pointed out accuracy gaps. The alternative, Parable (https://lnkd.in/dwPhzzry), is a Python library — not ideal as a dependency for a Rust binary. But it was a great inspiration.

So I tried something different: gave Claude Code Parable's test suite and let it build a parser from scratch, as hands-free as possible. Within 17 hours it hit 100% compatibility with the test suite. But the architecture was wrong — parsing logic had leaked into the wrong layers. The real work was directing the refactor: stepping back, integrating into the real project, feeding that experience back, and guiding Claude toward a cleaner design.

The final result, rable, passes 99.9% of 1,783 test cases (tree-sitter-bash manages 93.8% on an easier standard). It's now the sole parser in tokf.

The takeaway: AI can write a lot of code fast, but the human role shifts to architecture, integration, and knowing when the design is wrong — even when the tests pass.

https://lnkd.in/d_mG4gQS
AI can write code — but it can’t design your system.

Recently, I worked on a small project — a housing price prediction platform — to explore how different parts of a system come together.

The setup was simple:
💠 A Python service running an ML model
💠 A Java backend handling APIs and market data
💠 A Next.js frontend combining everything into one portal

While building this, I also experimented with using AI tools to generate parts of the code. But one thing became very clear:
👉 AI helps you build faster
👉 But it doesn’t replace thinking

The real work was in:
💠 Deciding how services should communicate
💠 Separating responsibilities between layers
💠 Designing clean data flow across frontend and APIs

For example, instead of calling the ML service directly from the frontend, I routed everything through the backend layer. That small decision made the system cleaner and easier to manage.

So even though AI helped with implementation, the important part was still: understanding the system and making the right design decisions.

I wrote a detailed breakdown of this project and what I learned here:
🔗 👉 https://lnkd.in/giAsVGt2
GitHub repo: https://lnkd.in/gMYbdCba

Curious how others are using AI in development — especially when balancing code generation vs system design.

#SoftwareArchitecture #AI #WebDevelopment #FullStack #DeveloperExperience
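The routing decision above (the frontend talks only to the backend, which proxies to the ML service) is a common pattern. The project's backend is Java; purely to illustrate the shape of the idea, here is a hypothetical Python sketch where the backend validates the request and forwards it, so the frontend never calls the model directly. ML_SERVICE_URL and the route names are made up.

```python
# Hypothetical illustration of the "route everything through the backend" decision.
# The real project uses a Java backend; this sketch only shows the proxy pattern.
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
ML_SERVICE_URL = "http://ml-service:8000/predict"   # assumed internal address

class HouseFeatures(BaseModel):
    area_sqm: float
    bedrooms: int
    city: str

@app.post("/api/predict-price")
async def predict_price(features: HouseFeatures):
    # The backend owns validation, auth, logging, and error handling;
    # the frontend only ever sees this one stable endpoint.
    async with httpx.AsyncClient(timeout=5.0) as client:
        resp = await client.post(ML_SERVICE_URL, json=features.model_dump())  # Pydantic v2
    if resp.status_code != 200:
        raise HTTPException(status_code=502, detail="ML service unavailable")
    return resp.json()
```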
https://github.com/tree-sitter/tree-sitter