The entire algorithm behind GPT fits in a single Python file. No PyTorch. No TensorFlow. No dependencies at all.

Andrej Karpathy just dropped microgpt.py — a single-file, dependency-free implementation of GPT training and inference in pure Python. His tagline says it all: "The most atomic way to train and inference a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency."

Let that sink in. Everything else is just efficiency.

It's a hand-rolled autograd engine, attention mechanism, tokenizer, training loop, and text generator — all built with nothing but the Python standard library. And now there's an interactive educational visualization (by Tan Pue Kai) that lets you step through the computation graph and watch gradients flow in real time. Links to both in the comments.

Here's what this actually teaches us. There's a narrative that AI is making software engineering obsolete. That anyone can "vibe code" their way to production. Karpathy's script is the perfect counterargument. It strips away every framework and abstraction, revealing what's actually happening. Matrix multiplications. Gradient chains. The math. When you remove the tools, what remains is understanding.

The people who will thrive in the AI era aren't the ones who memorized API calls. They're the ones who know why the API works the way it does. Who can debug a training run by reasoning about the loss landscape, not Googling the error message. Those skills don't get automated away. They become more valuable as the tools get more powerful, because someone still needs to know when the tools are doing it wrong.

The frameworks will change. The model architectures will change. The ability to decompose a system, understand it from the ground up, and build something better — that's permanent.

Go read the code. Step through the visualization. The engineers who understand the machine will always be the ones steering it.
#SoftwareEngineering #MachineLearning #AI #DeepLearning #FirstPrinciples #LLM #BuildInPublic
Karpathy's microGPT: Pure Python GPT Training and Inference
More Relevant Posts
PyTorch Serialization Methods

PyTorch gives you four ways to serialize a model. They are not interchangeable. Each one makes a different assumption about the environment you are deploying into — and that assumption is invisible until it does not hold.

🔹 `state_dict` saves the weights only. The architecture class must exist in Python at load time. In PyTorch 2.x, pass `weights_only=True` to `torch.load` explicitly — it became the default in 2.6, but older versions still unpickle arbitrary objects.

🔹 TorchScript serializes the architecture and the weights together. No Python class needed at inference. If your model has conditional branches, use `torch.jit.script` rather than `torch.jit.trace` — tracing silently bakes in one execution path.

🔹 ONNX steps outside the PyTorch ecosystem entirely. It targets Triton, TensorRT, and ONNX Runtime. Pin the opset version explicitly — a mismatch between the exporter and the opset is the most common ONNX failure in production.

🔹 `torch.export` is the PyTorch 2.x path. Full computation graph, no Python at runtime, serialized to .pt2. It requires a complete graph with no breaks — more restrictive than `torch.compile`, but portable and fully serializable.

The choice comes down to one question: what will you not control after training?

✔️ Research environment, stable codebase → `state_dict`
✔️ Remove the Python dependency → TorchScript
✔️ Serving outside PyTorch → ONNX
✔️ PyTorch-native AOT serving → `torch.export`

Which format does your current inference pipeline use?

#PyTorch #MLEngineering #MachineLearning #Python #AIEngineering
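As a minimal sketch of the first option (assuming a PyTorch 2.x environment; the model and the `model.pt` path are placeholders):

```python
import torch
import torch.nn as nn

# A toy model. In the state_dict workflow, this class definition
# must also be importable at load time.
model = nn.Linear(4, 2)

# Saves weights only, no architecture.
torch.save(model.state_dict(), "model.pt")

# Rebuild the architecture from code, then restore the weights.
restored = nn.Linear(4, 2)
# weights_only=True restricts unpickling to tensors and primitives,
# blocking arbitrary-code execution from untrusted checkpoints.
state = torch.load("model.pt", weights_only=True)
restored.load_state_dict(state)
```

The other three formats trade that load-time class requirement for progressively stricter export-time constraints.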
High-traffic article: I wrote a ground-up guide to training an LLM using Python and PyTorch. No hand-waving, no "and then magic happens" explanations.

We go through:
→ Data preparation and tokenization
→ Building the transformer architecture from scratch
→ Training loops, loss functions, and optimization
→ What actually happens inside the model during training

If you've been using LLMs without really understanding how they work, this is the article to change that.

Full guide: https://lnkd.in/g38fqAnu
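As a taste of the first step, here is a character-level tokenizer, the simplest scheme before BPE or word pieces (an illustrative sketch, not code from the guide):

```python
# Build a vocabulary from the unique characters of a corpus,
# then map text to token ids and back.
text = "hello world"
vocab = sorted(set(text))                      # unique characters
stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"  # the round trip is lossless
```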
https://lnkd.in/e83Pa4M8

Generative Pre-trained Transformers (GPT) made simple in Python.

1. Generate, ingest content -> corpora
2. Prediction, corpora -> next tokens
3. Transformers, next tokens -> matrix math

Self-supervised next-token prediction. Tokens predict vector direction.
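The first two steps can be sketched in miniature with a bigram counter, the crudest possible next-token predictor (an illustrative sketch, not a transformer):

```python
from collections import Counter, defaultdict

# Step 1: ingest a tiny corpus.
corpus = "the cat sat on the mat the cat ran".split()

# Step 2: learn next-token statistics. Self-supervised means the
# labels come from the text itself: each token labels its predecessor.
nexts = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    nexts[prev][cur] += 1

# Predict the most frequent successor of a token.
predict = lambda tok: nexts[tok].most_common(1)[0][0]
print(predict("the"))  # 'cat': it follows 'the' twice, 'mat' once
```

A transformer replaces the count table with learned matrices, but the training objective is the same.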
LLMs aren't magic. They're just 200 lines of Python.

Andrej Karpathy just dropped microGPT, the "final boss" of his educational projects. It's a complete GPT-2 implementation in 200 lines with zero dependencies.

But how do you actually use it? Karpathy demonstrates its power by training it on a dataset of 32,000 names. By the end of the script, the model "hallucinates" brand-new, plausible names (like vialan or konna) by learning the statistical patterns of the alphabet.

You can use microGPT to:
1. Learn the "Gears": step through the Value class to see exactly how backpropagation calculates gradients.
2. Train on your own "Mini-Documents": swap out the names for any small text dataset (like Pokémon names or CSS colors) and watch it learn to generate them.
3. Experiment with the KV Cache: see exactly how "token communication" works in the attention block without the bloat of PyTorch.

As Karpathy says: "Everything else is just efficiency."

Read the full guide: https://lnkd.in/dPaguY39

What's the most complex thing you've ever seen simplified to its bare essentials? Answer in the comments.

#AI #MachineLearning #Python #AndrejKarpathy #Education #LLM
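To see what stepping through such a class reveals, here is a stripped-down scalar autograd sketch in the same spirit (a simplified illustration, not Karpathy's actual Value class):

```python
class Value:
    """Scalar with reverse-mode autodiff, in the spirit of an
    educational Value class. Supports + and * only."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

a, b = Value(2.0), Value(3.0)
loss = a * b + a   # d(loss)/da = b + 1 = 4, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

microGPT builds its entire training loop out of this one idea, applied to every weight in the network.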
Your AI agent can't watch YouTube. So I fixed that.

Two open-source Python tools that let AI agents read and watch YouTube videos.

youtube-transcript.py — Pull any transcript instantly. No API key. One pip install.

youtube-watch.py — Download the video, extract frames at intervals, grab the transcript. Your agent gets sequential images + timestamped text — enough to actually see what happened.

MIT licensed. Zero cost: https://lnkd.in/gEpiWw3s

#AI #OpenSource #AIAgents #DeveloperTools #Python #ClaudeCode #BuildInPublic
🚀 Python Brain Teaser – Can you predict the output?

During our AI & Data Analytics Sprint, we had this interesting question:

    d = {True: 'yes', 1: 'no', 1.0: 'maybe'}
    print(d)

🤔 What do you think the output will be?

Most developers expect:

    {True: 'yes', 1: 'no', 1.0: 'maybe'}

But the real output is:

    {True: 'maybe'}

🔥 Why? Because in Python:
- True == 1 ✅
- 1 == 1.0 ✅
- All three have the same hash value
- Dictionary keys must be unique

So Python treats True, 1, and 1.0 as the same key. And since dictionaries keep the last assigned value, the final result becomes:

    {True: 'maybe'}

💡 Lesson learned: Understanding how Python handles equality and hashing is crucial — especially when working with data structures in AI, analytics, and backend systems. Subtle behaviors like this can cause silent logical bugs in real-world projects.

👇 Did you get it right before reading the explanation?

#Python #AI #DataAnalytics #Coding #LearningEveryday #TechMindset
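To see that hashing, not just equality, drives the collision, the teaser can be replayed one assignment at a time (a small illustrative snippet):

```python
# Dict lookup compares hashes first, then falls back to ==.
# True, 1, and 1.0 collide on both checks, so they share one slot.
assert hash(True) == hash(1) == hash(1.0)
assert True == 1 == 1.0

d = {}
d[True] = 'yes'
d[1] = 'no'       # same key: overwrites the value, keeps key True
d[1.0] = 'maybe'  # same key again
print(d)  # {True: 'maybe'}
```

Note that the surviving key is the first one inserted (True); only the value is replaced on later assignments.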
🚀 Built an Interactive AI Pathfinder in Python

As part of our Artificial Intelligence coursework, my friend Nimrah Shahid and I developed a GUI-based pathfinding application that visualizes how different uninformed search algorithms explore a grid environment.

The project implements:
• Breadth-First Search (BFS)
• Depth-First Search (DFS)
• Uniform Cost Search (UCS)
• Depth-Limited Search (DLS)
• Iterative Deepening DFS (IDDFS)
• Bidirectional Search

Rather than simply computing the final path, the application demonstrates the complete step-by-step search process — showing frontier nodes, explored nodes, and final path reconstruction in real time.

Collaborating on this project allowed us to move beyond theoretical concepts and truly understand how each algorithm behaves, including the trade-offs in optimality, completeness, speed, and memory usage. Building and visualizing the algorithms together made the learning process far more practical and engaging.

🔗 GitHub Repository: https://lnkd.in/d87XyHRz
📝 Medium Article: https://lnkd.in/d7aTKDG2

#ArtificialIntelligence #Python #SearchAlgorithms #ComputerScience #AIProjects #Collaboration #LearningByBuilding
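For readers unfamiliar with these algorithms, BFS on a grid fits in a few lines (a minimal sketch, not the project's code; the grid and coordinates are made up for illustration):

```python
from collections import deque

# '#' cells are walls; '.' cells are walkable.
grid = ["....",
        ".##.",
        "...."]

def bfs(start, goal):
    """Shortest path between two (row, col) cells, or None."""
    frontier = deque([start])       # FIFO queue: explore level by level
    parent = {start: None}          # doubles as the explored set
    while frontier:
        node = frontier.popleft()
        if node == goal:            # reconstruct the path via parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != '#' and (nr, nc) not in parent):
                parent[(nr, nc)] = node
                frontier.append((nr, nc))
    return None

path = bfs((0, 0), (2, 3))
print(len(path) - 1)  # 5 moves around the wall
```

Swapping the FIFO queue for a stack gives DFS, and for a priority queue keyed on path cost gives UCS, which is exactly the kind of comparison the visualization makes concrete.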
Andrej Karpathy Recreates GPT From Scratch in a Small Python File

Andrej Karpathy, a former researcher at OpenAI and the founder of AI-native education company Eureka Labs, has launched a new experimental project that distils the inner workings of a generative pre-trained transformer into a single, minimal Python file. The project, called microGPT, shows how a GPT-style language model can be trained and used for inference using only 243 lines of pure, dependency-free Python code — without PyTorch, TensorFlow, NumPy, or any external machine learning frameworks.

He shared the project on GitHub and included an HTML file containing the code for a single web page. Users online reacted with widespread fascination and admiration, given that the project compresses an entire working GPT into just a few hundred lines of plain Python code.

Credits to AI & Data Insider, by Staff Writer. [Source in the comments section]
AI Learning Series — Day 6 | Python Journey — Day 2

Today I focused on:
• Functions
• Lists vs Dictionaries
• Basic problem-solving using Python

Functions made me realize how important it is to write reusable, organized code instead of repeating the same logic again and again. While exploring lists and dictionaries, I started understanding how Python helps us store and structure data in different ways depending on the problem.

What I enjoyed most today was solving small problems using these concepts. Even simple exercises make you think differently and push you to approach problems step by step.

One thing I'm learning in this journey: progress in tech doesn't come from huge breakthroughs. It comes from showing up daily and understanding one concept at a time.

Still early in the journey, but the foundation is getting stronger. Day 2 done. On to Day 3. 🚀 Stay tuned.

#BuildInPublic #AIJourney #PythonBeginner #Consistency #WomenInTech #TechGrowth #AI #AIAgents #LLM #GenAI #LangChain #LangGraph #Developers #Tech #FullStackDeveloper #Learning
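A tiny example that combines those Day 2 themes, a reusable function plus both data structures (an illustrative snippet, not from any course material):

```python
def count_words(words):
    """One reusable function instead of repeated counting logic.
    Takes a list (ordered, allows duplicates) and returns a dict
    (one entry per unique key)."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

print(count_words(["ai", "python", "ai"]))  # {'ai': 2, 'python': 1}
```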
https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95