Over the past few months in my PhD, I’ve been exploring different aspects of compilers, especially tensor and sparse tensor compilers. I started by analyzing existing sparse tensor compilers and quickly realized something uncomfortable: there isn’t a complete solution. Most systems feel stitched together, lacking maturity and reliability.

To dig deeper, I began building a fuzzer to identify bugs across these infrastructures. This is ongoing work, and I’m gradually expanding it to support more compilers and improve its robustness. While integrating MLIR’s sparse tensor dialect into the fuzzer, one thing stood out: it feels significantly more mature than other approaches, particularly in generating loop structures for sparse iteration. That said, optimizing on top of these structures remains challenging.

Recently, I’ve been working on a vectorization optimization tailored to MLIR’s sparse loop structures. The speedups are promising, but I’ll be honest: I don’t yet fully understand the hardware-level reasons behind them, and that’s a gap I’m actively trying to close. In parallel, I’ve been learning polyhedral optimization (even though applying it to sparse data is not straightforward).

One broader realization during this process: there’s no clear "getting-started" path in this space. I come from the YouTube era of learning programming, where learning meant: start small -> build something -> understand deeper. But in compiler research, most resources jump straight into first principles, which can make it hard to build momentum early on.

To deal with this, I started organizing my notes (previously scattered across Obsidian and Notion) into structured blog posts, focused on building intuition through implementation rather than just theory. So far:
- I’ve started an MLIR blog series (https://lnkd.in/eHsG-kMi)
- I’m writing an ongoing series on polyhedral optimization and ISL (https://lnkd.in/ewGC2pDT)

I plan to finish both series this coming summer. Next up: a deep dive into SIMD/vectorization and why AVX-512 gather/scatter can bottleneck sparse workloads (see the sketch below for the loop shape involved). I plan to document my entire learning path as different blog series. If you’re getting into compiler research without much prior exposure, I hope this helps you get moving faster. Honestly, I suspect more people will read these posts than my future thesis.
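To make "loop structures for sparse iteration" concrete, here is a minimal Python sketch of the CSR sparse matrix-vector loop nest that a sparse compiler like MLIR's sparsifier materializes. This is my own illustration, not actual MLIR output; the indirect read through `indices` is exactly where AVX-512 gather enters the picture.

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x with A stored in CSR form.

    This is the shape of loop nest a sparse compiler generates: the row
    loop is dense, the inner loop runs only over stored nonzeros, and x
    is read through an indirect index (a gather on SIMD hardware).
    """
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):                        # dense dimension
        for k in range(indptr[i], indptr[i + 1]):  # compressed dimension
            y[i] += data[k] * x[indices[k]]        # indirect load: gather
    return y

# 2x3 example: A = [[1, 0, 2], [0, 3, 0]]
indptr  = np.array([0, 2, 3])
indices = np.array([0, 2, 1])
data    = np.array([1.0, 2.0, 3.0])
x       = np.array([1.0, 1.0, 1.0])
print(spmv_csr(indptr, indices, data, x))  # [3. 3.]
```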
Sparse Tensor Compiler Research and Learning Path
More Relevant Posts
We are at a 'compiler moment' today. Here's why I say that. Time for a small history lesson: I’ve been spending a lot of time looking at history (specifically 1958), because history is always the guide to the future.

Back then, if you wanted to talk to a computer, you engaged in manual labor, "hand-to-hand combat" with hardware. You managed memory addresses by hand. You wrote in binary or hex. It was a "priesthood": a very small group of people who knew how to work with such a computer, who held the access keys to that technology.

Then came Fortran, the first high-level language with a real compiler. Can you imagine the reaction? A lot of resistance, a lot of pessimism, and the small cohort that knew how to use the old technology, the priesthood, hated it. They argued on these fronts:

Efficiency: "A machine can never optimize code as well as a human."
Control: "If I don't write the assembly, I don't own the bugs."
Trust: "How do I know the compiler actually understands my intent?"

We have long since grown out of that technology and come very far, but the conversations we are having today about LLMs and AI-assisted coding are very similar. By 1959, the most successful engineers weren't the ones who stayed in the "priesthood" of assembly; they were the ones who embraced the abstraction and adopted the technology to solve bigger problems, while still understanding the logic of the layer beneath.

I believe we are at a similar “compiler moment” today. If you refuse the tool, you become a relic. If you rely only on the tool, you are simply a vibe coder with no fundamentals and no foundation. The path for an elite engineer is to master the fundamentals while respecting the new tools.

A lot of the ability to write code and software may be getting automated, but we are not losing the art of engineering. We are just moving the needle on what tomorrow's fundamentals actually mean. Fourth- and fifth-generation languages have brought us to a point where "coding" is increasingly a task of specifying intent. However, just as the Fortran programmer needed to understand the "whys" of their algorithms, the modern AI engineer needs to understand the "whys" of machine learning to prevent the generation of brittle code and hallucinated logic.

So, are you building on a foundation, or are you just prompting into the void?
Releasing Raster 0.1: typed multiple dispatch for Clojure

What if you could write math in Clojure and get compiled performance that matches Julia and JAX, without leaving the REPL?

Raster brings Julia-style typed multiple dispatch to the JVM. You define functions with `deftm`, annotate parameter types, and the compiler does the rest: devirtualization, automatic differentiation, buffer fusion, SIMD vectorization, all the way down to JVM bytecode. No DSL, no external toolchain. Every optimization is inspectable via `explain-pipeline`.

The results surprised us too:
* ODE solving (Dormand–Prince 5): 1.4× faster than Julia's DiffEq
* LeNet-5 training (compiled AD + SGD): 1.7× faster than JAX on CPU
* Forward-mode AD sensitivity: matching Julia's ForwardDiff
* Zero heap allocations in compiled hot paths

The key idea: don’t build another framework; build a compiler that understands typed dispatch, automatically differentiates, and fuses parallel operations end-to-end. Write generic code with `par/map` and `par/reduce`, and get specialized SIMD loops on CPU or OpenCL/Vulkan kernels on GPU from the same source (Futhark-inspired).

Raster also ships with scientific computing (ODE/PDE solvers, optimization, FFT, special functions), linear algebra (dense + sparse, LAPACK via Panama FFI), deep learning primitives (conv, attention, normalization, all with reverse-mode AD), symbolic computation, and resource-aware compiler optimizations. It's the numerical substrate we're building toward for collaborative simulation tools at scale.

Open source, Clojure-native, JVM-hosted. Try it at the REPL: https://lnkd.in/gBHbi3Dz

This is a first release; feedback and contributions are very welcome.

#Clojure #JVM #NumericalComputing #MachineLearning #HighPerformanceComputing #Compilers #SIMD #GPU #OpenSource #FunctionalProgramming #Julia #JAX
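Raster's `deftm` is Clojure-specific, but the core idea, dispatching on the types of all arguments rather than just the receiver, can be sketched in a few lines of Python. This is a hypothetical, language-neutral illustration of the concept, not Raster's API: real systems like Raster resolve the most specific method at compile time, whereas this toy resolves from a runtime table.

```python
# Toy multiple dispatch: a registry keyed by (name, argument types).
# Illustrative only -- the deftm/call names are stand-ins, not Raster's API.
_methods = {}

def deftm(*types):
    """Register an implementation for an exact tuple of argument types."""
    def register(fn):
        _methods[(fn.__name__, types)] = fn
        return fn
    return register

def call(name, *args):
    fn = _methods.get((name, tuple(type(a) for a in args)))
    if fn is None:
        raise TypeError(f"no method {name} for {[type(a).__name__ for a in args]}")
    return fn(*args)

@deftm(int, int)
def mul(a, b):
    return a * b

@deftm(list, int)
def mul(v, k):                  # same name, different typed signature
    return [x * k for x in v]

print(call("mul", 3, 4))        # 12
print(call("mul", [1, 2], 3))   # [3, 6]
```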
Christian's work is always mind-expanding and worth a look! Raster (the 0.1 release announced above) is no exception, and it’s not just interesting for Clojure users.
Got it: I’ll create a complete 15-slide, Gamma-ready presentation using a clear, student-friendly + professional style (a mix of Outline 1 & 2) so it looks clean and is easy to present. You can copy-paste this directly into Gamma AI 👇

🎯 Presentation Title: Parse Tree & Ambiguity

Slide 1: Title Slide
- Parse Tree & Ambiguity: Understanding Syntax Trees in Compiler Design
- Presented by: Shakil
- Course: Compiler Design / Theory of Computation

Slide 2: Introduction
- In programming languages, structure matters
- Syntax defines how code is written
- Parse trees help represent structure
- Important for compilers and interpreters

Slide 3: What is a Grammar?
- A grammar defines the rules of a language using production rules
- Example: E → E + E, E → id
- Known as a Context-Free Grammar (CFG)

Slide 4: What is a Parse Tree?
- A tree representation of a string
- Shows how the grammar generates the string
- Root = start symbol; leaves = final string (terminals)

Slide 5: Components of a Parse Tree
- Root node → start symbol
- Internal nodes → non-terminals
- Leaf nodes → terminals
- Edges → production rules

Slide 6: Derivation Concept
- Step-by-step rule application
- Two types: leftmost derivation and rightmost derivation
- Both can produce the same string

Slide 7: Example Grammar
- Grammar: E → E + E, E → E * E, E → id
- Example string: id + id * id

Slide 8: Constructing a Parse Tree
- Start from the root (E)
- Apply production rules, expanding step by step
- Stop when the leaves spell out the terminal string

Slide 9: What is Ambiguity?
- A grammar is ambiguous if 👉 one string has more than one parse tree
- Leads to multiple meanings
- Causes confusion in compilers

Slide 10: Ambiguous Grammar Example
- Grammar: E → E + E, E → E * E, E → id
- String: id + id * id

Slide 11: Two Parse Trees (Concept)
- First interpretation: 👉 (id + id) * id
- Second interpretation: 👉 id + (id * id)
- ⚠️ Same string, different meanings!

Slide 12: Problems Caused by Ambiguity
- Confuses the compiler
- Incorrect evaluation; hard to predict output
- Not suitable for programming languages

Slide 13: Removing Ambiguity
- Use rules like: ✔ operator precedence ✔ associativity
- Example (fixed grammar): E → E + T | T; T → T * F | F; F → id

Slide 14: Why Ambiguity Matters
- Important in compiler design and parsing algorithms
- Ensures correct program execution and helps avoid logical errors

Slide 15: Conclusion
- A parse tree shows the structure of a string
- Ambiguity creates multiple interpretations and must be removed for correct parsing
- An essential concept in computer science

✅ Bonus tip for Gamma AI: when you paste this in, choose "Presentation" mode, select a modern theme, and add diagrams for the parse tree and the ambiguity example.

If you want the next level 👇 I can also: ✅ add visual parse tree diagrams ✅ convert it into PowerPoint / PDF ✅ make it super stylish (design + icons + animations). Just tell me 👍
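To make Slide 13 concrete, here is a runnable illustration (my own sketch, not part of the original deck): a recursive-descent parser for the disambiguated grammar, which produces exactly one tree for id + id * id, with * binding tighter than +.

```python
# Recursive-descent parser for the disambiguated grammar
#   E -> E + T | T      (+ has lowest precedence)
#   T -> T * F | F      (* binds tighter)
#   F -> id
# Left recursion is replaced by iteration, the standard transformation.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        assert peek() == tok, f"expected {tok}, got {peek()}"
        pos += 1

    def factor():          # F -> id
        eat("id")
        return "id"

    def term():            # T -> F ('*' F)*
        node = factor()
        while peek() == "*":
            eat("*")
            node = ("*", node, factor())
        return node

    def expr():            # E -> T ('+' T)*
        node = term()
        while peek() == "+":
            eat("+")
            node = ("+", node, term())
        return node

    tree = expr()
    assert peek() is None, "trailing input"
    return tree

# Exactly one parse tree: ('+', 'id', ('*', 'id', 'id')),
# i.e. id + (id * id) -- precedence is now encoded in the grammar.
print(parse(["id", "+", "id", "*", "id"]))
```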
microgpt-c (https://lnkd.in/dW-xrjvc) is my attempt to port Andrej Karpathy's microgpt (https://lnkd.in/dUUDRknR) to C, with zero external dependencies. Read about my attempt to hand-roll a scalar-valued automatic differentiation engine at https://lnkd.in/deU2fCdf. The post lays out the architectural considerations and implementation details for a naive backprop engine.

TL;DR: Simply removing the Python overhead results in ~35× faster per-epoch training, even with no attention paid to cache locality and an extremely memory-bound CPU execution. Improving the memory allocation strategy alone would yield a massive performance and throughput boost. Stay tuned for more!
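For readers who haven't seen one, here is a minimal scalar reverse-mode autodiff engine in Python, in the spirit of Karpathy's micrograd. This is my own sketch of the idea the post describes porting, not the microgpt-c code itself.

```python
# Minimal scalar autodiff: each Value records its parents and a closure
# that propagates gradients backward through one operation.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():                       # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():                       # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topological sort, then one reverse sweep through the graph.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
loss = a * b + a          # d(loss)/da = b + 1 = 4, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)     # 4.0 2.0
```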
Announcing ARCH: an open-source HDL co-designed by AI and for the AI era 🚀

After 5 weeks of part-time development using Claude Code (and some Codex recently), I'm excited to open-source ARCH, a hardware description language and compiler purpose-built for micro-architecture design.

The problem: SystemVerilog is a language loaded with 30+ years of baggage. Implicit width conversions, operator-precedence nuances, CDC/RDC issues, unintended latches, simulation/synthesis semantic differences; the list goes on and on. All of these cause silent bugs. Don't get me started on using lint to detect them: we all know stories where a designer waived the one warning that hid the bug. And when an LLM tries to generate SV, it makes similar mistakes, because it is trained on flawed human code. At the velocity of AI-agent code creation, these silent issues will be time bombs waiting to explode in the field.

ARCH fixes this with:
- Strong types: bit widths, clock domains, and port directions checked at compile time. No silent truncation.
- First-class constructs: fsm, fifo, pipeline, arbiter, synchronizer are language keywords, not library patterns. 15 lines of ARCH replaces 60+ lines of hand-written SV.
- Auto-generated verification: overflow/underflow assertions on FIFOs, legal-state + transition coverage on FSMs, guard contracts on data-valid pairs. Provable with EBMC formal. Zero user effort.
- AI-native grammar: uniform schema, named block endings, no braces. An LLM that reads the spec can generate correct, type-safe hardware from a natural-language description without fine-tuning.

The compiler emits clean, readable SystemVerilog compatible with Verilator, Icarus, EBMC, and Yosys. It includes a built-in C++ simulator (arch sim) with debug instrumentation, VCD waveform output, and cocotb-compatible Python testbench support.

Benchmarks: 156/156 VerilogEval problems solved. 273/275 CVDP designs pass. ARCH is ~25% shorter than equivalent SV.

📄 Paper: https://lnkd.in/gxPbtpy6
💻 GitHub: https://lnkd.in/gGnefhwM

LGPL licensed. Contributions welcome.

#OpenSource #HardwareDesign #HDL #FPGA #ASIC #AI #EDA #Verification #SystemVerilog
Paper: https://lnkd.in/gxPbtpy6 · GitHub: https://lnkd.in/gGnefhwM

Related reading:
- [#GF, 11 slides, 07/2025] Enhancing #PDK_Library_Validation with #Machine_Learning: A Novel Approach to #Layout_Comparison (https://lnkd.in/gbXFypCR), by Farzana Akhter (Engineering Intern, Design Enablement PDK), Nolan Pavek (Principal Eng, Design Enablement PDK), and Romain Feuillette (Director, Design Enablement PDK)
- [22 pgs, 7 Apr 2026] Arch: A #NEW #AI_Native #Hardware_Description_Language Created using #Multi_Agentic_Vibe_Coding (with Claude Code), by Shuqing Zhao (https://lnkd.in/gHUW_mhd, https://lnkd.in/gv6ACEvm)
- [#Hu_Mind_Ai, 04/2026] "Add #AI_VLSI_Teammates to your design team / Charge up your development team with #AI_Workers. Transform Your #Silicon_Development: Watch the VLSI Teammate Demo for #Rapid #RTL_Generation" (https://lnkd.in/gE_s_ue7)
- [04/04/2026] Multi_Agentic #Claude_Code #Development_Framework, by Holger Kreissl: "A #Reusable .claude/ directory that #Turns Claude Code into a #Disciplined_Software_Engineering_Pipeline. Drop it into any project, any tech stack. One command, /ship, takes a feature from requirements to committed code." (https://lnkd.in/gzHzz7EW)
- [#Spade] Spade (https://spade-lang.org/), a #Hardware_Description_Language inspired by modern software languages. Its strong type system, zero-cost abstractions around common hardware constructs, and helpful compiler help you easily build complex hardware. The Spade Book (199 pgs, https://lnkd.in/gHDKY3f2), by Frans Skarman, with contributions from the community
- [#Synopsys, 9 slides, 11/03/2026] #Synopsys and #Agentic_AI_Initiatives: Detailed Overview (https://lnkd.in/gs7DvmWB)
- [#Siemens, 5 slides, 03/2026] #Fuse_EDA #AI_Agent (https://lnkd.in/g8wdHQQR)
- [#Cadence, 11 slides + 23 pgs, 02/10/2025] #Design for #AI and AI for Design, by Charles Alpert, AI Fellow, Cadence (https://lnkd.in/gj2eHmkc)
- [23 pgs, 29/11/2025] The Dawn of #Agentic_EDA: A #Survey of Autonomous #Digital_Chip_Design, by Zelin Zang et al.
- [112 slides, 19/01/2026] #China & #NVIDIA Bi-Directional Synergy: A Tutorial on #Hardware_Design for #Agentic_AI and Agentic AI for Hardware Design, by Chaojian Li et al. (https://lnkd.in/gT3bEqBz)
- [48 slides, 09/09/2025] #ORFS_Agent: Tool-Using #Agents_for_Chip_Design Optimization, by Amur Ghose, Andrew B. Kahng, Sayak Kundu, and Zhiang Wang, UCSD (https://lnkd.in/gCQqqqtN)
- [#ChipAgents, 20 slides, 2025] How #Agentic_AI is Reinventing #Chip_Design and #Verification, by William Wang (https://lnkd.in/gVMme8su)
- [#Infineon, 16 slides + 13 pgs, 25/02/2025] #Saarthi: The First #AI #Formal_Verification Engineer, by Aman Kumar et al. (https://lnkd.in/gFSZCsgk)
- [#Infineon, 13 pgs, 01/03/2025] #Saarthi: The First #AI #Formal_Verification_Engineer, by Aman Kumar et al. (https://lnkd.in/g2GAavFQ)
Can AI agents build software that comes with a mathematical proof that it works? At Basis Research Institute, we set four agents to build a verified compiler.

Compilers are large, complex pieces of engineering, deep in the software stack. Anthropic recently showed that agents can build one from scratch. We wanted to ask the next question: can they build one that is verified correct? A verified compiler comes with a machine-checked proof that it is mathematically correct. Most software is checked by running it on examples and confirming its behaviour matches expectations; a proof guarantees it works on every input, including the ones no one ran.

We tasked a team of agents with building a verified JS-to-WASM compiler in Lean. Over 14 days they wrote 93,000 lines of code and produced a compiler that ran. But they did not prove it correct. The agents built an interpreter, a target semantics, and the compiler between them, but the proofs never closed. They repeated broken strategies across sessions, forgot what had failed, and wrote the same lemma 122 times rather than abstracting it once.

More capable models will help, but we think the bigger lever is verified program synthesis infrastructure designed around agents' capabilities and flaws. Full writeup: https://lnkd.in/er6mg4D4
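For readers new to the idea, here is what "verified compiler" means at toy scale: a Lean 4 sketch (my own minimal illustration, unrelated to the JS-to-WASM project above) that compiles arithmetic expressions to a stack machine and proves the classic correctness theorem. At this scale the proof closes in a few lines; the post's point is that it stops closing at 93,000 lines.

```lean
inductive Expr where
  | const : Nat → Expr
  | plus  : Expr → Expr → Expr

def eval : Expr → Nat
  | .const n  => n
  | .plus a b => eval a + eval b

inductive Instr where
  | push : Nat → Instr
  | add  : Instr

-- Stack machine; `add` on an underfull stack is a no-op.
def exec : List Instr → List Nat → List Nat
  | [],            s           => s
  | .push n :: is, s           => exec is (n :: s)
  | .add :: is,    b :: a :: s => exec is ((a + b) :: s)
  | .add :: is,    [x]         => exec is [x]
  | .add :: is,    []          => exec is []

def compile : Expr → List Instr
  | .const n  => [.push n]
  | .plus a b => compile a ++ compile b ++ [.add]

-- Generalized statement: compiled code pushes the source's value.
theorem compile_exec (e : Expr) :
    ∀ (is : List Instr) (s : List Nat),
      exec (compile e ++ is) s = exec is (eval e :: s) := by
  induction e with
  | const n => intro is s; simp [compile, exec, eval]
  | plus a b iha ihb =>
      intro is s
      simp [compile, exec, eval, List.append_assoc, iha, ihb]

-- The headline theorem: for *every* expression, running the compiled
-- program from an empty stack yields exactly the evaluated result.
theorem compile_correct (e : Expr) :
    exec (compile e) [] = [eval e] := by
  simpa [exec] using compile_exec e [] []
```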
I wrote a DSL compiler this week for ZeroClaw. Not because I had to, but because the alternative was worse: my AI agent (ZeroClaw) needed to control hardware on a Raspberry Pi. The naive loop is the agent calling a tool backed by hand-written Rust: you cross-compile, copy the binary over, and run it. Each iteration takes roughly 2 minutes; for an agent that wants to experiment with hardware, that's death.

So I built a hardware description language and a compiler for it (partly just to understand DSL compilers, and because I had already architected the HAL interface, so I knew what was going on underneath).

Example of what ZeroClaw or a developer writes:

servo CLAW_LEFT {
  pin: PA1
  pwm_channel: 1
  range: 0..180
  frequency_hz: 50
}

Four stages, hand-written, no parser-combinator crates: lexer (finite automaton) → recursive-descent parser → semantic analyzer (collects all errors, doesn't fail fast) → a code generator that emits Rust for either STM32 or Raspberry Pi. (A toy sketch of this pipeline follows below.)

Then I embedded the compiler inside the agent. No subprocess. No file I/O. The agent generates HAL, the compiler validates it in-process, and rppal toggles the pin. 2 minutes → under 1 millisecond. Same task. Six orders of magnitude. The lesson: when your tooling is the bottleneck, build even better tooling, even when "better tooling" means writing a compiler at 2am. 😂

What did I make? A hardware DSL (domain-specific language) and a compiler called zeroclaw-halc. The DSL is a simple way to describe hardware (like LEDs, buttons, or motors) in plain text. The compiler reads that text, checks it for mistakes (like using the wrong pin), and turns it into actual hardware actions.

Who can use it? Developers can use it to quickly generate the complex boilerplate needed to start a new hardware project on Raspberry Pi or STM32. And, the big one, the ZeroClaw AI agent: the AI now has a "manual" it can write. When it wants to control a light or a motor, it writes a .hal description, and my compiler tells it exactly how to do it.

How does it help ZeroClaw? Before this, ZeroClaw was mostly "software." Now it has a direct, safe link to the physical world:
- Safety: the compiler acts as a guard. If the AI tries to use a pin that doesn't exist or conflicts with another part, the compiler catches it before anything breaks or burns out.
- Instant control: ZeroClaw doesn't have to wait for a human to write code. It can generate its own hardware configuration on the fly and execute it in milliseconds on a Raspberry Pi.
- Multi-platform: it makes ZeroClaw hardware-agnostic. Whether you are on a Raspberry Pi or an industrial STM32 chip, the AI uses the same language to talk to both.

Thank you Sundai for the event! #Rust #Compilers #EmbeddedSystems #EdgeAI #ZeroClaw 🦀
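Here is the toy sketch promised above: the first three stages of such a pipeline in Python, parsing the servo block from the post. This is my own illustration of the architecture described (lexer → parser → all-errors semantic analysis), not the actual zeroclaw-halc code, and the `lex`/`parse`/`analyze` names are hypothetical.

```python
import re

def lex(src):
    """Stage 1: tokenizer (a small finite automaton in regex form)."""
    return re.findall(r"\w+|\.\.|[{}:]", src)

def parse(tokens):
    """Stage 2: recursive descent for `kind NAME { key: value ... }`."""
    pos = 0
    def take(expect=None):
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if expect and tok != expect:
            raise SyntaxError(f"expected {expect!r}, got {tok!r}")
        return tok
    kind, name = take(), take()
    take("{")
    fields = {}
    while tokens[pos] != "}":
        key = take(); take(":")
        val = take()
        if pos < len(tokens) and tokens[pos] == "..":   # range literal
            take(".."); val = (int(val), int(take()))
        fields[key] = val
    take("}")
    return {"kind": kind, "name": name, **fields}

def analyze(dev):
    """Stage 3: collect *all* semantic errors instead of failing fast."""
    errors = []
    if not re.fullmatch(r"P[A-K]\d+", dev.get("pin", "")):
        errors.append(f"{dev['name']}: invalid pin {dev.get('pin')!r}")
    lo, hi = dev.get("range", (0, 0))
    if not 0 <= lo < hi <= 360:
        errors.append(f"{dev['name']}: bad range {lo}..{hi}")
    return errors

src = "servo CLAW_LEFT { pin: PA1 pwm_channel: 1 range: 0..180 frequency_hz: 50 }"
dev = parse(lex(src))
print(dev)
print(analyze(dev) or "ok")   # stage 4 (codegen) would emit Rust here
```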
Inside torch.compile: The Real ML Compiler Stack Explained

If you’re working with PyTorch and still think your model “just runs on the GPU,” you’re missing the most important layer. This deep dive from CompilerSutra breaks down what actually happens under the hood of torch.compile 👇

🔗 https://lnkd.in/gS5MxuAs

👉 Reality check: your GPU doesn’t run PyTorch code directly. It runs compiler-generated kernels produced through multiple transformation stages. torch.compile is the system that bridges that gap by tracing Python, building graphs, and compiling them into optimized execution paths.

If you want to truly understand ML systems, performance, or optimization, this is the layer you can’t ignore.

#MLCompilers #PyTorch #TorchCompile #SystemsProgramming #LLVM #GPU #DeepLearning #CompilerDesign #CompilerSutra #ai
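The entry point itself is one line of standard PyTorch API. A minimal sketch: Dynamo traces the Python function into a graph, and the default Inductor backend compiles it into fused kernels for whatever device the tensors live on.

```python
import torch

def f(x):
    # A small fusable computation: matmul followed by softmax.
    return torch.nn.functional.softmax(x @ x.T, dim=-1)

compiled_f = torch.compile(f)   # returns an optimized callable

x = torch.randn(256, 256)
# Same numerics as eager mode, but served by compiler-generated kernels.
print(torch.allclose(f(x), compiled_f(x), atol=1e-5))

# To inspect the generated code, run with the environment variable
# TORCH_LOGS=output_code set.
```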
I can also recommend MLIR For Beginners (https://github.com/j2kun/mlir-tutorial), a series of articles on the MLIR framework by Jeremy Kun, along with a few talks:
- 20-minute summary of the tutorial: https://www.youtube.com/watch?v=ne5D_kqlxYg
- How Slow is MLIR (2024): https://www.youtube.com/watch?v=7qvVMUSxqz4
- Deep Dive on MLIR Internals, Operation & Attribute, towards Properties (2023): https://www.youtube.com/watch?v=7ofnlCFzlqg