Story time.

There was a phase when Python quietly stopped getting picked. Not because it disappeared. Not because people didn't love it. But when the question was "what should we use for a serious backend?", the answers were predictable. Node for async. Go for concurrency. Java for scale.

Python? "Too slow." "GIL issues." "Not for production."

And to be fair, those criticisms weren't wrong. The GIL wasn't a bug. It was a design choice for safety. It ensured:

- memory consistency
- simpler garbage collection
- a stable C-extension ecosystem

But the tradeoff was brutal: only one thread could execute Python bytecode at a time. No true parallelism.

People tried to "fix" it: joblib, threads, thread pools... but none of them actually removed the constraint. They just worked around it.

Meanwhile, Go was doing real concurrency out of the box. Lightweight goroutines. Multi-core efficiency. If this was a race, Python wasn't winning.

But here's the part most people miss: there was no rivalry. No "Python vs Go" war. Just a quiet shift in what the industry valued. While everyone was optimizing for speed, Python went somewhere else entirely. Data. Machine learning. AI. It didn't try to win the same game.

Then... the stack evolved. Async became usable. And a big unlock came in quietly: uvloop, a faster event loop that made Python's async actually fast. Lower latency. Better throughput. Real gains.

But speed alone wasn't enough. Enter FastAPI. Not just a framework, but the missing piece that made everything click:

- async-first by design
- type-driven development
- automatic docs
- clean, production-ready APIs

Now the stack looked like: async + uvloop + ASGI + FastAPI. Not true parallelism, but extremely efficient I/O concurrency. (A minimal sketch of that stack follows this post.)

And something shifted. Python didn't need to beat Go at concurrency. It just needed to be good enough for the systems people were actually building.

Then the real change happened. Backends stopped being just CRUD layers. They became:

- model serving systems
- data pipelines
- AI-native applications

And now the question wasn't "what's the fastest language?" It was "what fits the system end-to-end?"

That's when Python walked back in. Not as the fastest. Not as the best at concurrency. But as the most aligned.

So no, Python didn't beat Go. It just stopped playing the same game... and won a bigger one.

Funny how a design choice made for safety was once seen as a limitation, and later became irrelevant to the problems that mattered.

#Python #FastAPI #uvloop #AI #Backend #SystemDesign
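Not part of the original post: a minimal sketch of the async + uvloop + ASGI + FastAPI stack it describes. The route and the sleep are illustrative stand-ins for real I/O.

import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    # stand-in for an await on a DB query or downstream API call;
    # the event loop serves other requests while this one waits
    await asyncio.sleep(0.1)
    return {"item_id": item_id}

# Served over ASGI; uvicorn's --loop flag swaps in the uvloop event loop:
#   uvicorn app:app --loop uvloop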
Python's Quiet Comeback: From Concurrency to AI-Native Applications
More Relevant Posts
-
"Python is slow." Every developer has heard this. And technically, it's true. Pure Python loops are 50-100x slower than C/C++. That part is real. But here's what nobody tells you — Python doesn't do the heavy lifting. It tells C and Rust what to do. 𝗪𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗿𝘂𝗻𝘀 𝘂𝗻𝗱𝗲𝗿𝗻𝗲𝗮𝘁𝗵: → numpy.dot() → Intel MKL / OpenBLAS (C/Fortran) → torch.matmul() → cuBLAS (CUDA C++) → Pydantic v2 → pydantic-core (Rust) → Uvicorn HTTP → httptools (C) → orjson.dumps() → Rust JSON serializer → pandas.read_csv() → C parser Python is the steering wheel. The engine is C/Rust. 𝗧𝗵𝗲 𝗻𝘂𝗺𝗯𝗲𝗿𝘀 𝘁𝗵𝗮𝘁 𝗺𝗮𝘁𝘁𝗲𝗿: Matrix multiplication (1000x1000): • Pure Python → 450 seconds • C++ → 0.8 seconds • Python + NumPy → 0.03 seconds Read that again. Python + NumPy is 26x faster than raw C++ because it calls hand-tuned BLAS libraries with CPU SIMD optimization. ResNet-50 training on ImageNet: • PyTorch (Python) → 28 min/epoch • LibTorch (pure C++) → 27 min/epoch Same speed. Because Python is just orchestrating — the math runs in compiled CUDA kernels. 𝗔𝗣𝗜 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 (𝗿𝗲𝗾𝘂𝗲𝘀𝘁𝘀/𝘀𝗲𝗰): → Gin (Go) → 45,000 → Spring Boot (Java) → 18,000 → Express.js (Node) → 15,000 → FastAPI (Python) → 12,000-15,000 → Django (Python) → 1,200 → Rails (Ruby) → 900 → Laravel (PHP) → 800 FastAPI sits right next to Express and Spring Boot. 15x faster than Laravel. 𝗪𝗵𝘆 𝗻𝗼 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗰𝗮𝗻 𝘁𝗼𝘂𝗰𝗵 𝗣𝘆𝘁𝗵𝗼𝗻 𝗶𝗻 𝗠𝗟: → 500,000+ pre-trained models on HuggingFace → PyTorch, TensorFlow, JAX — all Python-first → Native GPU acceleration (CUDA) → Go has zero mature ML frameworks → PHP has zero ML frameworks → Java has DL4J... and that's it Even if Go is 3x faster at raw computation — 3x faster at nothing is still nothing. You'd spend 6 months building what Python gives you in one pip install. 𝗧𝗵𝗲 𝗯𝗼𝘁𝘁𝗼𝗺 𝗹𝗶𝗻𝗲: Python is slow. Python's ecosystem is not. And in 2026, the ecosystem is what ships products — nobody writes matrix math by hand anymore. #Python #FastAPI #MachineLearning #SoftwareEngineering #WebDevelopment #AI
-
🐍 Python Concurrency: Stop guessing, start choosing!

Threading vs Async vs Multiprocessing: when to use what? I see devs pick these at random. Here's the mental model that changed how I write production Python. 👇

━━━━━━━━━━━━━━━━━━━━

⚡ MULTITHREADING - Best for I/O-bound tasks (file reads, DB queries, network calls)

Due to the GIL, threads don't run in true parallel for CPU tasks, but they shine when your code is waiting on I/O.

from concurrent.futures import ThreadPoolExecutor
import requests

urls = ["https://lnkd.in/gwfCxrVP", "https://lnkd.in/gEWYHnaM"]

def fetch(url):
    return requests.get(url).json()

with ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(fetch, urls))

# Production use: scraping APIs, bulk DB inserts, reading files concurrently

━━━━━━━━━━━━━━━━━━━━

🔄 ASYNC/AWAIT - Best for high-concurrency I/O (1000s of simultaneous connections, real-time apps)

Single-threaded, event-loop driven. No thread overhead. Perfect when you have massive I/O concurrency but each task is lightweight.

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as r:
        return await r.json()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, u) for u in urls]
        return await asyncio.gather(*tasks)

# Production use: WebSocket servers, FastAPI, real-time pipelines

━━━━━━━━━━━━━━━━━━━━

🚀 MULTIPROCESSING - Best for CPU-bound tasks (data crunching, ML training, image processing)

Bypasses the GIL completely. Each process gets its own memory. True parallelism on multi-core machines.

from multiprocessing import Pool

def crunch(data_chunk):
    return sum(x**2 for x in data_chunk)

if __name__ == "__main__":  # guard needed: worker processes re-import this module
    data = list(range(10_000_000))
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        results = pool.map(crunch, chunks)

# Production use: ML preprocessing, image resizing, scientific computing

━━━━━━━━━━━━━━━━━━━━

🎯 Quick decision guide:
• Waiting on network/disk? → Threading or Async
• 1000+ concurrent connections? → Async
• Heavy CPU computation? → Multiprocessing
• Mixing both? → Async + ProcessPoolExecutor (sketch below)

💡 Pro tip: FastAPI + asyncio + Celery workers (multiprocessing) is the production stack for 90% of data-heavy Python backends.

The best engineers don't memorize syntax - they understand the trade-offs. 🔑

What's your go-to concurrency pattern? Drop it below 👇

#Python #SoftwareEngineering #Backend #Programming #AsyncPython #PythonDev
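Following up the "Mixing both?" row in the decision guide, here is a minimal sketch of async + ProcessPoolExecutor (not from the original post). The worker function must live at module level so it can be pickled for the child processes.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(chunk):  # CPU-bound work, runs in separate processes
    return sum(x * x for x in chunk)

async def main():
    loop = asyncio.get_running_loop()
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        # offload CPU work to the pool without blocking the event loop
        parts = await asyncio.gather(
            *(loop.run_in_executor(pool, crunch, c) for c in chunks)
        )
    print(sum(parts))

if __name__ == "__main__":
    asyncio.run(main())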
-
Python for AI Systems: Why Python + FastAPI is my default for AI backend services in 2025.

I've built backends in Java (Spring Boot), PHP (Laravel), Node.js, and Python. Here's when I reach for each:

For AI/LLM workloads → Python + FastAPI. Always. Here's why:

FastAPI is genuinely fast: async by default, built on Starlette. Handles concurrent LLM calls without thread management headaches.

The AI ecosystem lives in Python: LangChain, LangGraph, OpenAI SDK, HuggingFace, all Python-first. No wrappers, no translation layers.

Pydantic = free input validation: define your schema once, get validation + docs + serialization. Critical when LLM outputs need strict structure. (See the sketch after this post.)

Background tasks built in: streaming LLM responses plus async background processing, without a separate worker framework.

Easy integration with data tools: Pandas, Airflow, SQLAlchemy. Your AI service can talk to your data layer without impedance mismatch.

Java Spring Boot is still my go-to for transactional enterprise systems. But for AI services? FastAPI + Python + Docker on AWS ECS = fastest path to production-ready AI endpoints.

What's your preferred stack for AI backend services?

#Python #FastAPI #LLM #AIEngineering #BackendDevelopment #AWS
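Not from the post: a minimal sketch of the Pydantic point, validating structured LLM output in a FastAPI endpoint. call_llm is a hypothetical stand-in for an OpenAI/LangChain client call.

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Summary(BaseModel):
    title: str = Field(max_length=120)
    bullet_points: list[str]
    confidence: float = Field(ge=0.0, le=1.0)

async def call_llm(text: str) -> str:
    # hypothetical stub; a real client would call an LLM API here
    return '{"title": "Demo", "bullet_points": ["a", "b"], "confidence": 0.9}'

@app.post("/summarize", response_model=Summary)
async def summarize(text: str):
    raw = await call_llm(text)
    # Pydantic v2 rejects malformed or out-of-range LLM output right here
    return Summary.model_validate_json(raw)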
-
Day 9/90: Rust Enums Just Broke My Brain (In a Good Way) 🎲

Thought I knew what enums were from Python. I was wrong.

Python enum:

from enum import Enum

class Status(Enum):
    PENDING = 1
    APPROVED = 2
    REJECTED = 3

Just named constants. Boring.

Rust enum:

enum Status {
    Pending,
    Approved(String),                       // can hold DATA
    Rejected { reason: String, code: u32 },
}

Wait. WHAT? Enums can HOLD DATA. Each variant can be different. This changes everything.

Real example I built today:

enum PaymentMethod {
    Cash,
    CreditCard { number: String, cvv: u16 },
    Crypto { wallet: String, coin: String },
}

One type, multiple shapes. The compiler forces you to handle ALL cases. In Python/JS I'd use inheritance or dicts with a "type" field, always worried I'd miss an edge case. In Rust? The compiler says "you forgot the CreditCard case" and refuses to compile.

Here's the mind-blowing part: Option and Result are just enums:

enum Option<T> {
    Some(T),
    None,
}

enum Result<T, E> {
    Ok(T),
    Err(E),
}

No null. No exceptions. Just explicit data that can be one of several variants. This is called algebraic data types. Sounds fancy, but it's just "enums that can hold different data per variant."

Real talk: the first hour I was confused. "Why not just use a struct?" Then I tried handling payment methods and it clicked. One function parameter that can be cash OR credit card OR crypto, and the compiler ensures I handle all three.

In my CSV processing work, I have different record types (header, data, footer). I've been using dicts with "type" keys. One typo and it's a runtime error. With Rust enums? Compile error if I forget a case. Zero runtime surprises.

---

💡 TL;DR:
- Enums in Rust can hold data (not just constants)
- Each variant can have different types
- The compiler enforces exhaustive handling
- Option/Result are built on this pattern
- Way more powerful than Python/JS enums

Day 9/90 ✅
🔗 Code: https://lnkd.in/eKBGKPbC

#RustLang #LearnInPublic #100DaysOfCode #TypeSafety #AlgebraicDataTypes

Have you used enums that hold data before? Which language? 👇
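Not in the original post, and in Python rather than Rust: the closest stdlib analogue I know of is a dataclass union plus match (Python 3.10+). Exhaustiveness is only checked by a type checker via assert_never, not by the interpreter, which is exactly the gap the post is describing.

from dataclasses import dataclass
from typing import Union, assert_never  # assert_never: Python 3.11+

@dataclass
class Cash:
    pass

@dataclass
class CreditCard:
    number: str
    cvv: int

@dataclass
class Crypto:
    wallet: str
    coin: str

PaymentMethod = Union[Cash, CreditCard, Crypto]

def describe(p: PaymentMethod) -> str:
    match p:
        case Cash():
            return "cash"
        case CreditCard(number=n):
            return f"card ending {n[-4:]}"
        case Crypto(coin=c):
            return f"crypto ({c})"
        case _:
            # mypy/pyright flag a missed variant at type-check time;
            # at runtime this raises if an unhandled value slips through
            assert_never(p)

print(describe(CreditCard(number="4242424242424242", cvv=123)))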
-
I can't believe that I spent all my hobby time over the past two weeks and I'm only 'almost' finished.

V2.0 of my napalm-hios was a migration from Python to YAML-driven Python. Meaning the Python was supposed to execute, but the YAML decided everything. The catch is that I don't really know how to code, so I was reliant on AI interpreting my desired outcome and going from there. The outcome? Sorta kinda right. Not really. A for effort, Gemini; let's say C for result.

V2.5 was a migration from YAML-driven Python to, uh, YAML-driven Python. We refactored it so more things that were 'decided in Python' were 'decided in YAML'. The CRUDE Matrix is born, removing many specific data transition types and creating bidirectional crude transforms that take arguments and are defined in YAML.

V2.6 was... we missed some stuff in V2.5. My fault, I trusted AI. Never again (also again, many times).

V2.7 introduced the pipeline concept in the engine with bidirectionality. Ingress/egress (towards the 'engine/device' as ingress, towards the 'engine/user' as egress) allows functions that have similar needs to take advantage of less code and more YAML-driven declarations via the CRUDE matrix. GATEs are born here to validate (loosely) and prevent invalid YAML requests into the engine that might cause crashes.

V2.8 is rewriting the Python structure around the pipeline: one executor, three types of tasks, bidirectional, YAML-driven. It's beginning to look like a real software project. Claude tries to stuff in fewer if/then/else than before; I am happy. It's not zero; I am sad.

V2.9: it's now two projects. napalm-hios is an adapter/shim that loads CRUDE-engine. CRUDE-engine is fully independent, and the gate system morphed into "you can't go from here to there unless you use the GATE". This is enforced because nobody has context to do anything unless that context comes from the GATE. For some tasks where validation is not required but context is, we pass "validate=False" to the GATE to get return context.

V2.9 is currently almost finished and will be shipped 'when it's ready': 55/70 CLI getters, because somewhere along the way I forgot that this was a napalm driver and that CLI was supposed to be a first-class citizen, and I put it into the 'do it later' basket for all of V2.x. And to be honest, I am not sure it'll ever have full CLI coverage. It will have it for the 'standard napalm functions', but it's really just not that relevant.

What a lot of nonsense I just wrote. Well, the lesson was that software development in my mind was always some kind of 'linear operation': you make a plan, you write the software, you eliminate bugs in testing, and you ship it. This software did not work that way. Is that because I am a bad software developer? While that is true for sure due to lack of experience, I don't think that's why it happened this way. Is all software development a form of circular causation?
-
UNLEASHED THE PYTHON! 1.5, 2, & 3!!! Nice and easy with a Python API wrapper for rapid integration into any pipeline, then a good old-fashioned swift kick in the header-only C++ core for speed. STRIKE WITH AIM FIRST; THEN SPEED!! NO MERCY!!! 4 of 14. Are you ready!? Y.E.S!!!

(copied and pasted from AI)

Theoretical integrity meets practical performance. To ensure no two data points collide (mathematical proof) while maintaining high computational speed, the key is to confirm your sequence is "coprime", or that your multiplier (like the 1.5 or 3 ratios) doesn't prematurely collapse the cycle before hitting your 41 or 123 limit. Since you've already mapped the ratios out to several decimal places (like the 1.421 and 4.862 figures), you're likely checking for bit-level precision to make sure the rounding doesn't drift during high-speed execution.

Since I'm tackling both the stress-testing and the coding logic simultaneously, you're likely looking to see how that 41-based loop handles the "drift" that can happen during millions of rapid-fire calculations. Using a language like C++ would give the raw speed needed for real-time data streams, while Python would be for quickly verifying that the mathematical proof holds up under pressure. The goal is to make sure geometric growth (1.5, 2, 3) hits that reset point perfectly every single time without losing a single decimal of precision.

So changing the theory into a standalone library for others means I'm moving from personal math exploration to building a reusable utility for the developer community. Packaging the 123/41-based ratios and cyclic growth model into a library, I'm essentially providing a "black box" where a user can feed in a data stream and get back a mathematically synchronized, encrypted, or indexed output. The efficiency of using geometric scaling (1.5, 2, 3) for the growth and modular resets for the loop will make it attractive for high-performance applications. So the goal is ease of use first, for beginners like myself, and then speed to attract other developers, plus making the application practical. Make sense? No? Join the crowd!

By prioritizing API hooks, you're making it "plug-and-play" for other developers. They can drop your 123/41-based logic into their existing data pipelines without needing to understand all the complex geometric scaling (the 1.5, 2, & 3 ratios) happening under the hood. The command-line tool then becomes a perfect secondary feature for anyone who just wants to run a quick test on a single value or verify the reset point.

Starting with a Python wrapper is the best way to nail ease of use: it allows other users to import your 123/41 logic with a single line of code and start piping their data through geometric scaling immediately. Once the interface is solid, you can optimize the "engine" in C++ or Rust to handle the speed requirements. This "Python on top, C++ underneath" approach is exactly how major libraries like NumPy or TensorFlow stay both user-friendly and incredibly fast. 4 of 14
-
I was going through the Python 3.15 release notes recently, and it's interesting how this version focuses less on hype and more on fixing real-world developer pain points.

Full details here: https://lnkd.in/gSvcuvWg

Here's what stood out to me, with practical examples:

---

Explicit lazy imports (PEP 810)

Problem: your app takes forever to start because it imports everything upfront.
Example: a CLI tool importing pandas, numpy, etc. even when not needed.

With lazy imports:

lazy import pandas as pd  # binding created now, module loaded on first use

Result: faster startup time, especially for large apps and microservices.

---

frozendict (immutable dictionary)

Problem: configs get accidentally modified somewhere deep in your code.

Example:

from collections import frozendict

config = frozendict({"env": "prod"})
config["env"] = "dev"  # error

Result: safer configs, better caching keys, fewer "who changed this?" moments.

---

High-frequency sampling profiler (PEP 799)

Problem: profiling slows your app so much that results feel unreliable.
Example: you're debugging a slow API in production.
Result: you can profile real workloads without significantly impacting performance.

---

Typing improvements

Problem: type hints get messy in large codebases.

Example:

from typing import TypedDict

class User(TypedDict):
    id: int
    name: str

Result: cleaner type definitions, better maintainability, stronger IDE support.

---

Unpacking in comprehensions

Problem: transforming nested data gets verbose.

Example:

data = [{"a": 1}, {"b": 2}]
merged = {**d for d in data}  # dict unpacking inside a comprehension

Result: more concise and readable transformations.

---

UTF-8 as default encoding (PEP 686)

Problem: code behaves differently across environments.
Result: more predictable behavior across systems, fewer encoding-related bugs.

---

Performance improvements

Real-world impact: faster APIs, quicker scripts, and better resource utilization.

---

Big takeaway: Python 3.15 is all about practical improvements:
- Faster startup
- Safer data handling
- Better debugging
- More predictable behavior

Still in alpha, so not production-ready. But it clearly shows where Python is heading.

#Python #Backend #SoftwareEngineering #Developers #DataEngineering
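Side note, not from the post: if you want frozen-dict behavior on today's Python while 3.15 is still in alpha, the stdlib's MappingProxyType is a reasonable sketch of the same idea. It wraps rather than copies, so freeze the source dict's owner too if it can mutate.

from types import MappingProxyType

config = MappingProxyType({"env": "prod"})
print(config["env"])  # reads work like a normal dict

try:
    config["env"] = "dev"  # writes are rejected
except TypeError as e:
    print("blocked:", e)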
-
A tiny public anti-drift example in Python. Not proprietary. Not platform-specific. Just the basic law:

- hash truth surfaces
- classify write targets
- deny runtime writes to canonical state
- write mutable state atomically

A lot of reliability problems become easier once you stop asking "how do we clean this up?" and start asking "why is live mutation happening on a truth surface at all?" That question scales far better than cleanup scripts.

#Python #Reliability #SystemsDesign #AIGovernance

🎁 more gifts

from __future__ import annotations

from pathlib import Path
import hashlib
import json
import os
import tempfile


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def detect_drift(canonical_path: Path, runtime_path: Path) -> dict:
    canonical = canonical_path.read_bytes()
    runtime = runtime_path.read_bytes()
    return {
        "canonical_sha256": sha256_bytes(canonical),
        "runtime_sha256": sha256_bytes(runtime),
        "match": canonical == runtime,
    }


def decide_write(path_class: str, actor_type: str) -> str:
    """
    path_class: canonical | runtime | derived | local
    actor_type: human | runtime | ci
    """
    if path_class == "canonical" and actor_type == "runtime":
        return "deny"
    if path_class == "runtime" and actor_type in {"runtime", "ci"}:
        return "allow"
    if path_class == "derived" and actor_type in {"runtime", "ci"}:
        return "allow"
    return "review"


def atomic_write_json(path: Path, payload: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    raw = json.dumps(payload, indent=2, sort_keys=True) + "\n"
    fd, tmp_name = tempfile.mkstemp(prefix=".tmp_", dir=str(path.parent))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(raw)
        os.replace(tmp_name, path)
    except Exception:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    canonical = Path("config/policy.json")
    runtime = Path("var/runtime/policy_runtime.json")

    result = detect_drift(canonical, runtime)
    print(result)

    decision = decide_write(path_class="canonical", actor_type="runtime")
    print({"write_decision": decision})
-
Python is too slow for high-frequency backtesting. So I ripped out the math layer and rewrote it in optimized C++17.

I recently completed the core development of NSEAlphaFinder, a high-frequency backtesting engine.

The primary constraint in algorithmic backtesting is the compute bottleneck when iterating through millions of historical price bars and calculating continuous risk vectors. Python is excellent for routing, but iterating through standard deviation arrays or computing rolling covariances natively is a massive performance drag. To solve this, I decoupled the architecture into two dedicated layers using pybind11 as the bridge.

The Tech Stack:
✦ Core Engine: Modern C++17 compiled with -O3 and -march=native.
✦ Parallelization: OpenMP alongside SIMD/AVX instructions for multi-threaded math operations.
✦ API / Routing: Python 3 and FastAPI for a lightweight REST interface.
✦ Build System: CMake and PowerShell for cross-platform deterministic builds.

Core Project Features & Quant Mechanics:
✦ Mathematical Vectorization: Rolling metrics like Bollinger standard deviations and MACD histograms are computed in strict O(N) time using continuous accumulation, side-stepping naive O(N*K) windowing lags.
✦ Low-Latency Compute: Computes SMA, EMA, RSI, MACD, and Bollinger Bands for 1,000,000 price bars in under 50 milliseconds (a latency of ~45ns per bar).
✦ Dynamic Data Ingestion: A strict OHLCV parser that normalizes timestamps, validates data consistency (ensuring High ≥ Low), and gracefully applies forward-fill matrix transformations on missing tick data before it hits the memory arrays.
✦ Full Strategy Backtester: Executes deterministic, long-only backtests directly in C++. It mathematically evaluates trade arrays to generate aggregate performance metrics, calculating exact Sharpe logic (E[R] / σ * √252), tracking compounded max drawdowns, and discounting institutional broker transaction costs.
✦ Memory Safety: The C++ layer completely avoids raw pointers. Everything is handled via contiguous standard vectors and zero-copy references to ensure CPU cache-line hits are maximized and memory leaks eliminated.

I enforced strict PEP 8 compliance across the Python bindings, replaced bare exception handling with targeted error tracing, and ensured the architecture adheres to SOLID design principles.

GitHub repo: https://shorturl.at/V3LLh

The resulting system is highly modular. You can send a CSV of pricing data directly to the server, and the C++ binary will hot-load it, run a 5-indicator overlay, resolve target trading signals, and map out the execution Sharpe metrics in a fraction of a second.

If you work on quantitative infrastructure, order management systems, or low-latency C++, I'd be interested to hear how you structure your memory mapping and vector computations at scale.

#quantitativeanalysis #algorithmictrading #hft #cpp #lowlatency #marketmicrostructure #quantitativefinance #cplusplus #fintech #systemdesign #softwareengineering
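The C++ itself isn't shown in the post, but the O(N) continuous-accumulation trick it describes is easy to sketch in Python: keep running sums instead of rescanning each window. Names are mine, and the sum-of-squares form can lose precision on ill-conditioned data.

import math

def rolling_mean_std(prices: list[float], k: int):
    """O(N) rolling mean/std via running sums, avoiding O(N*K) rescans."""
    means, stds = [], []
    s = s2 = 0.0
    for i, p in enumerate(prices):
        s += p
        s2 += p * p
        if i >= k:                          # slide the window: drop the oldest bar
            old = prices[i - k]
            s -= old
            s2 -= old * old
        if i >= k - 1:                      # window is full, emit a value
            m = s / k
            var = max(s2 / k - m * m, 0.0)  # clamp FP round-off below zero
            means.append(m)
            stds.append(math.sqrt(var))
    return means, stds

means, stds = rolling_mean_std([100.0, 101.0, 99.5, 102.0, 101.5], k=3)
print(means[0], stds[0])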
-
Node or Python for Lambda? I have used both. Here is the honest answer.

I am a Node person. Always have been. So when I first started writing Lambda functions, Node was the obvious pick. Familiar syntax, fast cold starts, same language as the rest of my stack. Made sense.

Then projects started pushing me toward Python. And you know what? It also made sense.

Here is how I actually think about it now.

If my Lambda is doing API work, event processing, or anything that sits close to a JavaScript frontend or Node backend, I reach for Node. The cold start performance is better for latency-sensitive functions, and keeping the language consistent across the stack reduces the mental overhead.

If my Lambda is doing anything data-heavy, touching AI models, or working with Python libraries that simply do not exist in the Node ecosystem, I reach for Python. The tooling is just better for that kind of work. Trying to force Node into an AI or data pipeline feels like fighting the current.

The honest answer is that neither runtime is the right answer every time. The use case picks the runtime. Not the other way around.

I still lean Node when it is a coin flip. But I have stopped pretending it wins everywhere.

Anyways, that's my two cents. What do you default to for Lambda, and has a project ever made you switch?

#AWS #Lambda #NodeJS #Python #Serverless #FullStack #CloudArchitecture #TechLead #Sydney