The entire algorithm behind GPT fits in a single Python file. No PyTorch. No TensorFlow. No dependencies at all.

Andrej Karpathy just dropped microgpt.py — a single-file, dependency-free implementation of GPT training and inference in pure Python. His tagline says it all: "The most atomic way to train and inference a GPT in pure, dependency-free Python. This file is the complete algorithm. Everything else is just efficiency."

Let that sink in. Everything else is just efficiency.

It's a hand-rolled autograd engine, attention mechanism, tokenizer, training loop, and text generator — all built with nothing but the Python standard library. And now there's an interactive educational visualization (by Tan Pue Kai) that lets you step through the computation graph and watch gradients flow in real time. Links to both in the comments.

Here's what this actually teaches us. There's a narrative that AI is making software engineering obsolete. That anyone can "vibe code" their way to production. Karpathy's script is the perfect counterargument. It strips away every framework and abstraction, revealing what's actually happening. Matrix multiplications. Gradient chains. The math. When you remove the tools, what remains is understanding.

The people who will thrive in the AI era aren't the ones who memorized API calls. They're the ones who know why the API works the way it does. Who can debug a training run by reasoning about the loss landscape, not Googling the error message. Those skills don't get automated away. They become more valuable as the tools get more powerful, because someone still needs to know when the tools are doing it wrong.

The frameworks will change. The model architectures will change. The ability to decompose a system, understand it from the ground up, and build something better — that's permanent.

Go read the code. Step through the visualization. The engineers who understand the machine will always be the ones steering it.
#SoftwareEngineering #MachineLearning #AI #DeepLearning #FirstPrinciples #LLM #BuildInPublic
Karpathy's microGPT: Pure Python GPT Training and Inference
More Relevant Posts
PyTorch Serialization Methods

PyTorch gives you four ways to serialize a model. They are not interchangeable. Each one makes a different assumption about the environment you are deploying into — and that assumption is invisible until it does not hold.

🔹 `state_dict` saves the weights only. The architecture class must exist in Python at load time. In PyTorch 2.x, pass `weights_only=True` to `torch.load` explicitly — it became the default in 2.6, but older versions still unpickle arbitrary objects.

🔹 TorchScript serializes the architecture and the weights together. No Python class needed at inference. If your model has conditional branches, use `torch.jit.script` rather than `torch.jit.trace` — tracing silently bakes in one execution path.

🔹 ONNX steps outside the PyTorch ecosystem entirely. It targets Triton, TensorRT, and ONNX Runtime. Pin the opset version explicitly — a mismatch between the exporter and the opset is the most common ONNX failure in production.

🔹 `torch.export` is the PyTorch 2.x path. Full computation graph, no Python at runtime, serialized to .pt2. It requires a complete graph with no breaks — more restrictive than `torch.compile`, but portable and fully serializable.

The choice comes down to one question: what will you not control after training?

✔️ Research environment, stable codebase → `state_dict`
✔️ Remove the Python dependency → TorchScript
✔️ Serving outside PyTorch → ONNX
✔️ PyTorch-native AOT serving → `torch.export`

Which format does your current inference pipeline use?

#PyTorch #MLEngineering #MachineLearning #Python #AIEngineering
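As a minimal sketch of the first option (assuming a PyTorch 2.x environment; the model and the `model.pt` path are placeholders):

```python
import torch
import torch.nn as nn

# A toy model. In the state_dict workflow, this class definition
# must also be importable at load time.
model = nn.Linear(4, 2)

# Saves weights only, no architecture.
torch.save(model.state_dict(), "model.pt")

# Rebuild the architecture from code, then restore the weights.
restored = nn.Linear(4, 2)
# weights_only=True restricts unpickling to tensors and primitives,
# blocking arbitrary-code execution from untrusted checkpoints.
state = torch.load("model.pt", weights_only=True)
restored.load_state_dict(state)
```

The other three formats trade that load-time class requirement for progressively stricter export-time constraints.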
High-traffic article: I wrote a ground-up guide to training an LLM using Python and PyTorch. No hand-waving, no "and then magic happens" explanations.

We go through:
→ Data preparation and tokenization
→ Building the transformer architecture from scratch
→ Training loops, loss functions, and optimization
→ What actually happens inside the model during training

If you've been using LLMs without really understanding how they work, this is the article to change that.

Full guide: https://lnkd.in/g38fqAnu
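As a taste of the first step, here is a character-level tokenizer, the simplest scheme before BPE or word pieces (an illustrative sketch, not code from the guide):

```python
# Build a vocabulary from the unique characters of a corpus,
# then map text to token ids and back.
text = "hello world"
vocab = sorted(set(text))                      # unique characters
stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> token id
itos = {i: ch for ch, i in stoi.items()}       # token id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
assert decode(ids) == "hello"  # the round trip is lossless
```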
https://lnkd.in/e83Pa4M8

Generative Pre-trained Transformers (GPT) made simple in Python.

1. Generate, ingest content -> corpora
2. Prediction, corpora -> next tokens
3. Transformers, next tokens -> matrix math

Self-supervised next-token prediction. Tokens predict vector direction.
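The first two steps can be sketched in miniature with a bigram counter, the crudest possible next-token predictor (an illustrative sketch, not a transformer):

```python
from collections import Counter, defaultdict

# Step 1: ingest a tiny corpus.
corpus = "the cat sat on the mat the cat ran".split()

# Step 2: learn next-token statistics. Self-supervised means the
# labels come from the text itself: each token labels its predecessor.
nexts = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    nexts[prev][cur] += 1

# Predict the most frequent successor of a token.
predict = lambda tok: nexts[tok].most_common(1)[0][0]
print(predict("the"))  # 'cat': it follows 'the' twice, 'mat' once
```

A transformer replaces the count table with learned matrices, but the training objective is the same.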
LLMs aren't magic. They're just 200 lines of Python.

Andrej Karpathy just dropped microGPT, the "final boss" of his educational projects. It's a complete GPT-2 implementation in 200 lines with zero dependencies.

But how do you actually use it? Karpathy demonstrates its power by training it on a dataset of 32,000 names. By the end of the script, the model "hallucinates" brand-new, plausible names (like vialan or konna) by learning the statistical patterns of the alphabet.

You can use microGPT to:
1. Learn the "Gears": step through the Value class to see exactly how backpropagation calculates gradients.
2. Train on your own "Mini-Documents": swap out the names for any small text dataset (like Pokémon names or CSS colors) and watch it learn to generate them.
3. Experiment with the KV Cache: see exactly how "token communication" works in the attention block without the bloat of PyTorch.

As Karpathy says: "Everything else is just efficiency."

Read the full guide: https://lnkd.in/dPaguY39

What's the most complex thing you've ever seen simplified to its bare essentials? Answer in the comments.

#AI #MachineLearning #Python #AndrejKarpathy #Education #LLM
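To see what stepping through such a class reveals, here is a stripped-down scalar autograd sketch in the same spirit (a simplified illustration, not Karpathy's actual Value class):

```python
class Value:
    """Scalar with reverse-mode autodiff, in the spirit of an
    educational Value class. Supports + and * only."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn:
                v._grad_fn()

a, b = Value(2.0), Value(3.0)
loss = a * b + a   # d(loss)/da = b + 1 = 4, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

microGPT builds its entire training loop out of this one idea, applied to every weight in the network.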
Your AI agent can't watch YouTube. So I fixed that.

Two open-source Python tools that let AI agents read and watch YouTube videos.

youtube-transcript.py — Pull any transcript instantly. No API key. One pip install.

youtube-watch.py — Download the video, extract frames at intervals, grab the transcript. Your agent gets sequential images + timestamped text — enough to actually see what happened.

MIT licensed. Zero cost: https://lnkd.in/gEpiWw3s

#AI #OpenSource #AIAgents #DeveloperTools #Python #ClaudeCode #BuildInPublic
🚀 Python Brain Teaser – Can you predict the output?

During our AI & Data Analytics Sprint, we had this interesting question:

    d = {True: 'yes', 1: 'no', 1.0: 'maybe'}
    print(d)

🤔 What do you think the output will be?

Most developers expect:

    {True: 'yes', 1: 'no', 1.0: 'maybe'}

But the real output is:

    {True: 'maybe'}

🔥 Why? Because in Python:
- True == 1 ✅
- 1 == 1.0 ✅
- All three have the same hash value
- Dictionary keys must be unique

So Python treats True, 1, and 1.0 as the same key. And since dictionaries keep the last assigned value, the final result becomes:

    {True: 'maybe'}

💡 Lesson learned: Understanding how Python handles equality and hashing is crucial — especially when working with data structures in AI, analytics, and backend systems. Subtle behaviors like this can cause silent logical bugs in real-world projects.

👇 Did you get it right before reading the explanation?

#Python #AI #DataAnalytics #Coding #LearningEveryday #TechMindset
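To see that hashing, not just equality, drives the collision, the teaser can be replayed one assignment at a time (a small illustrative snippet):

```python
# Dict lookup compares hashes first, then falls back to ==.
# True, 1, and 1.0 collide on both checks, so they share one slot.
assert hash(True) == hash(1) == hash(1.0)
assert True == 1 == 1.0

d = {}
d[True] = 'yes'
d[1] = 'no'       # same key: overwrites the value, keeps key True
d[1.0] = 'maybe'  # same key again
print(d)  # {True: 'maybe'}
```

Note that the surviving key is the first one inserted (True); only the value is replaced on later assignments.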
🚀 Built an Interactive AI Pathfinder in Python

As part of our Artificial Intelligence coursework, my friend Nimrah Shahid and I developed a GUI-based pathfinding application that visualizes how different uninformed search algorithms explore a grid environment.

The project implements:
• Breadth-First Search (BFS)
• Depth-First Search (DFS)
• Uniform Cost Search (UCS)
• Depth-Limited Search (DLS)
• Iterative Deepening DFS (IDDFS)
• Bidirectional Search

Rather than simply computing the final path, the application demonstrates the complete step-by-step search process — showing frontier nodes, explored nodes, and final path reconstruction in real time.

Collaborating on this project allowed us to move beyond theoretical concepts and truly understand how each algorithm behaves, including the trade-offs in optimality, completeness, speed, and memory usage. Building and visualizing the algorithms together made the learning process far more practical and engaging.

🔗 GitHub Repository: https://lnkd.in/d87XyHRz
📝 Medium Article: https://lnkd.in/d7aTKDG2

#ArtificialIntelligence #Python #SearchAlgorithms #ComputerScience #AIProjects #Collaboration #LearningByBuilding
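For readers unfamiliar with these algorithms, BFS on a grid fits in a few lines (a minimal sketch, not the project's code; the grid and coordinates are made up for illustration):

```python
from collections import deque

# '#' cells are walls; '.' cells are walkable.
grid = ["....",
        ".##.",
        "...."]

def bfs(start, goal):
    """Shortest path between two (row, col) cells, or None."""
    frontier = deque([start])       # FIFO queue: explore level by level
    parent = {start: None}          # doubles as the explored set
    while frontier:
        node = frontier.popleft()
        if node == goal:            # reconstruct the path via parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] != '#' and (nr, nc) not in parent):
                parent[(nr, nc)] = node
                frontier.append((nr, nc))
    return None

path = bfs((0, 0), (2, 3))
print(len(path) - 1)  # 5 moves around the wall
```

Swapping the FIFO queue for a stack gives DFS, and for a priority queue keyed on path cost gives UCS, which is exactly the kind of comparison the visualization makes concrete.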
Andrej Karpathy Recreates GPT From Scratch in a Small Python File

Andrej Karpathy, a former researcher at OpenAI and the founder of AI-native education company Eureka Labs, has launched a new experimental project that distils the inner workings of a generative pre-trained transformer into a single, minimal Python file. The project, called microGPT, shows how a GPT-style language model can be trained and used for inference using only 243 lines of pure, dependency-free Python code — without PyTorch, TensorFlow, NumPy, or any external machine learning frameworks.

He shared the project on GitHub and included an HTML file containing the code for a single web page. Users online reacted with widespread fascination and admiration, given that the project compresses an entire working GPT into just a few hundred lines of plain Python code.

Credits to AI & Data Insider, by Staff Writer. [Source in the comments section]
AI Learning Series — Day 6 | Python Journey — Day 2

Today I focused on:
• Functions
• Lists vs Dictionaries
• Basic problem-solving using Python

Functions made me realize how important it is to write reusable, organized code instead of repeating the same logic again and again. While exploring lists and dictionaries, I started understanding how Python helps us store and structure data in different ways depending on the problem.

What I enjoyed most today was solving small problems using these concepts. Even simple exercises make you think differently and push you to approach problems step by step.

One thing I'm learning in this journey: progress in tech doesn't come from huge breakthroughs. It comes from showing up daily and understanding one concept at a time.

Still early in the journey, but the foundation is getting stronger. Day 2 done. On to Day 3. 🚀 Stay tuned.

#BuildInPublic #AIJourney #PythonBeginner #Consistency #WomenInTech #TechGrowth #AI #AIAgents #LLM #GenAI #LangChain #LangGraph #Developers #Tech #FullStackDeveloper #Learning
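A tiny example that combines those Day 2 themes, a reusable function plus both data structures (an illustrative snippet, not from any course material):

```python
def count_words(words):
    """One reusable function instead of repeated counting logic.
    Takes a list (ordered, allows duplicates) and returns a dict
    (one entry per unique key)."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

print(count_words(["ai", "python", "ai"]))  # {'ai': 2, 'python': 1}
```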
https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95