Andrew Wheeler’s Post

2w Edited

Agent-based systems require bad Python code. Because LLMs are (mostly) text in and text out, to have them interact with local functions you often need to write those functions in ways you typically would not. When working with data, instead of returning the actual data object, depending on the workflow you will *need* to return the object as text. Another example: instead of raising an error that halts execution, you will often want to capture the error and feed it back to the LLM.

I show several examples of using agent-based systems for data analysis tasks in my book, LLMs for Mortals: A Practical Guide for Analysts, https://lnkd.in/enCZ_rM3. Agent-based systems tend to be very complicated. If you want a basic introduction that starts from tool calling in a loop and then expands into more complicated agent SDKs (with examples in OpenAI, Anthropic, and Google), I recommend picking up a copy.
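To make the two habits above concrete, here is a minimal sketch using only the standard library. The tool `summarize_column`, the wrapper `run_tool`, and the sample data are all hypothetical names made up for illustration; they are not taken from the book or from any agent SDK.

```python
import csv
import io
import statistics


def summarize_column(csv_text: str, column: str) -> str:
    """Hypothetical tool: returns summary statistics as text, not a data object."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    return (
        f"column={column} n={len(values)} "
        f"mean={statistics.mean(values):.2f} "
        f"min={min(values):.2f} max={max(values):.2f}"
    )


def run_tool(func, **kwargs) -> str:
    """Call a tool; on failure, return the error as text instead of raising."""
    try:
        return func(**kwargs)
    except Exception as exc:  # deliberately broad: the LLM gets to see the failure
        return f"ERROR: {type(exc).__name__}: {exc}"


# Made-up sample data for the sketch.
data = "city,burglaries\nDallas,120\nRaleigh,45\nAustin,75\n"

ok = run_tool(summarize_column, csv_text=data, column="burglaries")
bad = run_tool(summarize_column, csv_text=data, column="robberies")
print(ok)   # text the model can read directly
print(bad)  # the KeyError comes back as text the model can react to
```

In ordinary application code the broad `except` and the string return type would both be questionable, which is the point of the post: the caller here is a model that can only read text.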

6 Comments

Volkan Topalli 2w

Does your book address GIS data? I’m building ABM for predicting crime using qualitative data and mapping is a component.

2 Reactions

Farouk Hajjej 2w

Your point about needing to serialize data for LLM interaction is spot on. We often found ourselves converting pandas DataFrames to JSON strings just to feed them back. Have you found any clever ways to optimize this serialization/deserialization overhead within agent loops?

1 Reaction


More Relevant Posts

Erica Ticianelli
6d
🐍 𝗣𝘆𝘁𝗵𝗼𝗻 𝗮𝗰𝗰𝗲𝗽𝘁𝗲𝗱 𝗹𝗮𝘇𝘆 𝗶𝗺𝗽𝗼𝗿𝘁𝘀. 𝟰𝟱𝟬 𝗰𝗼𝗺𝗺𝗲𝗻𝘁𝘀. 𝟯 𝘆𝗲𝗮𝗿𝘀 𝗼𝗳 𝗳𝗶𝗴𝗵𝘁𝘀. 𝗔𝗻𝗱 𝘀𝗼𝗺𝗲𝗼𝗻𝗲 𝗮𝗹𝗿𝗲𝗮𝗱𝘆 𝗵𝗮𝘁𝗲𝘀 𝗶𝘁.

If you use Python daily (data scripts, ML pipelines, internal tools, automations), you've probably stared at a terminal waiting for your script to start without knowing why. That 3-second pause? Often it's Python loading libraries you don't even use in that run.

Lazy imports fix exactly that: Python only loads what it actually needs. Your tools start faster, your team waits less, your servers spend less on cold starts. You don't need to change how you think about Python. You just write lazy in front of one import line and the problem is gone.

PEP 810, explicit lazy imports (deferred module loading: Python skips loading a module until its name is first used in the code), was unanimously accepted by the Steering Council in November 2025 and is now shipping in Python 3.15 alphas.

The numbers back it up. Meta cut startup time by 70%. Hugo van Kemenade's CLI dropped from 2.4s to 0.7s, 3x faster, with a two-line change. Machine learning training initialization went from 15s to 9s in internal benchmarks.

But one voice on Hacker News put it bluntly: "This will break tons of code and introduce a slew of footguns. Import statements fundamentally have side effects. When and how these side effects are applied will cause mysterious problems and breakages that will keep people up for many nights."

That's not wrong. The tradeoff is real: you're trading deterministic startup behavior for performance. Lazy imports shift errors from load time to runtime. A missing module you currently catch at startup might now blow up at 2am under a specific code path in production.

PEP 810 addresses this by making lazy loading strictly opt-in with the lazy soft keyword (syntax: lazy import module_name). You choose your surface, you own the risk.

The community was split for 3 years for good reason. The fix is now in the language. Use it surgically.
Source: https://lnkd.in/dnkyrHFS #Python #SoftwareEngineering #PEP810 #Python315
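The `lazy import` syntax quoted above only runs on an interpreter that ships PEP 810. As a rough stand-in for older interpreters, the sketch below uses the standard library's `importlib.util.LazyLoader` to get a similar "load on first use" effect. The helper name `lazy_import` is my own, and this is not how PEP 810 is implemented; details such as when a missing module is reported differ.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # Unlike the PEP 810 behavior described in the post, this helper
        # still reports a missing module right here, at the import site.
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # with LazyLoader this only sets up the deferral
    return module


# PEP 810 spelling, per the post (needs an interpreter that supports it):
#     lazy import colorsys
colorsys = lazy_import("colorsys")        # module body has not run yet
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # first attribute access triggers the load
print(hsv)
```

Either way the trade-off from the post applies: any side effects in the module body now happen at first use, not at the top of the file.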

PEP 810 – Explicit lazy imports | peps.python.org

2 Comments
Absar Ishfaq
1w
🚀 Day 9: File Handling in Python

In real-world applications, data doesn’t just live in variables; it is stored in files.
👉 That’s where File Handling comes in. Python allows us to create, read, update, and delete files easily.

🔹 Common File Operations:
✔ Read a file
✔ Write to a file
✔ Append data
✔ Close a file

💡 Example:
Writing to a file

    with open("data.txt", "w") as file:
        file.write("Hello, Python!")

Reading from a file

    with open("data.txt", "r") as file:
        content = file.read()
        print(content)

🔹 File Modes:
✔ "r" → Read
✔ "w" → Write (overwrites file)
✔ "a" → Append
✔ "b" → Binary mode

📌 Why it matters? File handling is used everywhere:
✔ Saving user data
✔ Logging system activities
✔ Working with reports (CSV, JSON)

Without file handling, building real-world applications would be nearly impossible.
💡 Data is valuable; knowing how to store and manage it is a key developer skill.
📈 Step by step, moving closer to real-world development.

#Python #Programming #Coding #Developers #BackendDevelopment #FileHandling #LearningJourney #Django
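The post's example covers "w" and "r"; a short sketch of the remaining two modes from the list, "a" and "b", might look like this. The temporary directory and the explicit `encoding` argument are additions of mine so the snippet cleans up after itself and behaves the same across platforms.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so no stray data.txt is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"

    # "w" creates the file (or overwrites it if it already exists).
    with open(path, "w", encoding="utf-8") as file:
        file.write("Hello, Python!\n")

    # "a" keeps the existing content and adds to the end.
    with open(path, "a", encoding="utf-8") as file:
        file.write("Second line\n")

    # "r" reads text back.
    with open(path, "r", encoding="utf-8") as file:
        lines = file.read().splitlines()

    # Adding "b" reads raw bytes instead of decoded text.
    with open(path, "rb") as file:
        raw = file.read()

print(lines)
print(type(raw).__name__)
```

The `with` block also takes care of the "close a file" operation from the list: the file is closed when the block exits, even if an exception is raised inside it.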
Munkh-Altai Purevdorj
3w
Python: @staticmethod vs @classmethod (Explained Simply)

In Python classes, not all methods behave the same. There are 3 types of methods:

1) Instance Method: Works with object data. def show_name(self):
• Uses self.
• Accesses instance variables.

2) Class Method (@classmethod): Works with class-level data.
• Uses cls.
• Can modify class variables.
• Shared across all objects.

3) Static Method (@staticmethod): Independent utility function.
• No self, no cls.
• Doesn’t modify class or instance.
• Used for helper logic.

In this example:
• show_name() → works on object.
• change_company() → updates company for all employees.
• greet() → simple helper function.

Think of it like this:
- Instance → works with object.
- Class → works with class.
- Static → works independently.

Comment down: which one do you use most in your code?
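The example the post refers to was an image and did not survive here. A plausible reconstruction using the three method names it mentions could look like the sketch below; the `Employee` class, its attributes, and the sample values are my assumptions, not the author's original code.

```python
class Employee:
    company = "Acme"  # class variable, shared by every instance

    def __init__(self, name):
        self.name = name  # instance variable, one per object

    def show_name(self):
        """Instance method: reads data from this particular object."""
        return f"{self.name} works at {self.company}"

    @classmethod
    def change_company(cls, new_company):
        """Class method: rebinds the class variable for all employees."""
        cls.company = new_company

    @staticmethod
    def greet():
        """Static method: a helper that touches neither self nor cls."""
        return "Welcome!"


ana = Employee("Ana")
bo = Employee("Bo")

before = ana.show_name()
Employee.change_company("Globex")  # one call, visible through every instance
after = [ana.show_name(), bo.show_name()]

print(before)
print(after)
print(Employee.greet())  # callable without creating an object
```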
Kandi Brian
1w
Some Python list tutorials stop at my_list.append(x). That is the surface. Underneath, a list is a C struct called PyListObject holding an array of pointers to PyObject instances. The list does not store your data. It stores references to wherever your data lives on the heap. That single fact is the root cause of the aliasing bugs that catch developers off guard.

A few things that land differently once you understand the memory model:

Why append() is O(1) amortized. CPython over-allocates on resize using the growth sequence 0, 4, 8, 16, 24, 32, 40, 52, 64, 76... so the O(n) copy cost spreads across many appends.

Why b = a and then mutating b also mutates a. They are two names pointing at the same PyListObject.

Why list.sort() runs in O(n) on nearly-sorted data. Timsort, written by Tim Peters in 2002, finds already-sorted runs and merges them. Stability has been a documented guarantee since Python 2.3.

Why list.pop() from the end is O(1) but list.pop(0) is O(n). Elements after the index have to shift.

I put together an 11-tutorial learning path on PythonCodeCrack that walks through lists from first principles through the copy semantics and aliasing patterns that cause hard-to-trace bugs. Fundamentals first (creation, slicing, append vs extend, sorting, comprehensions), then the advanced group (flattening, shallow vs deep copy, why your list keeps changing unexpectedly). https://lnkd.in/g5uUXj6d #Python #SoftwareEngineering #CPython #Programming
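The aliasing and over-allocation points above can be observed from plain Python. This is a small sketch with made-up data; the `sys.getsizeof` part reflects CPython behavior specifically, and the exact byte counts vary by version and platform, so the code only counts how often the size changes.

```python
import copy
import sys

# Aliasing: b is a second name for the same list object, not a copy.
a = [[1, 2], [3, 4]]
b = a
b.append([5, 6])
same_object = b is a  # the append is visible through both names

# A shallow copy is a new outer list that still shares the inner lists.
shallow = a.copy()
shallow[0].append(99)  # also changes a[0]

# A deep copy duplicates the inner lists as well.
deep = copy.deepcopy(a)
deep[0].append(-1)     # a[0] is unaffected

print(same_object, a, deep[0])

# Over-allocation (CPython): the reported size grows in occasional jumps,
# not on every append, because spare slots are reserved on each resize.
items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))
resize_count = len(set(sizes))
print(f"64 appends, {resize_count} distinct sizes")
```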

Python Lists Learning Path | PythonCodeCrack (pythoncodecrack.com)
Blessing Inuk
2w
The most underrated skill in data isn't SQL. It isn't Python. It isn't even knowing how to build a dashboard that actually makes sense.

It's knowing which question to answer.

Most analysts answer the question they were given, not the one the business actually needs answered. And two weeks later the report sits unopened, because it solved a problem nobody was trying to fix.

Before I start any analysis now I ask one thing: If this question gets answered perfectly, what changes? If the answer is nothing, that's not the right question yet.

PS: Have you ever delivered an analysis that was technically correct but answered the wrong question entirely? What happened?
144 Comments
Volodymyr Sydorenko
1w Edited
Level Up Your Python API Design: Mastering / and *

Have you ever looked at a Python function signature and wondered what those / and * symbols actually do? While many developers stick to standard arguments, modern Python (3.8+) provides surgical precision over how functions receive data. Understanding this is key to building robust, self-documenting APIs.

Check out this "Ultimate Signature" example:

    def foo(pos1, pos2, /, pos_or_kwd1, pos_or_kwd2='default', *args, kwd_only1, kwd_only2='default', **kwargs):
        print(
            f"pos1={pos1}",
            f"pos2={pos2}",
            f"kwd_only1={kwd_only1}",
            # ... and so on
        )

The Breakdown:

Positional-Only (/): Everything to the left of the slash must be passed by position. You cannot call foo(pos1=1). This is perfect for performance and keeping your API flexible for future parameter renaming.

Positional-or-Keyword: The "classic" Python parameters that can be passed either way.

The Collector (*args): Grabs any extra positional arguments and packs them into a tuple.

Keyword-Only: Everything after *args (or a standalone *) must be named explicitly. This prevents "magic number" bugs and makes the intent of the caller crystal clear.

The Dictionary (**kwargs): Catches any remaining keyword arguments.

Why should you care? Good code isn't just about making it work; it’s about making it hard to use incorrectly. By using these boundaries, you create a strict contract. You force clarity where it’s needed (Keyword-Only) and allow flexibility where it’s not (Positional-Only).

Are you using these constraints in your daily development, or do you prefer keeping signatures simple? Let’s discuss below! 👇

#Python #SoftwareEngineering #CleanCode #Backend #ProgrammingTips #Python3 #CodingLife
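To see where each argument ends up, here is a runnable variant of the signature from the post. Returning a dict (the post's version prints) and the specific argument values are my choices for illustration.

```python
def foo(pos1, pos2, /, pos_or_kwd1, pos_or_kwd2="default", *args,
        kwd_only1, kwd_only2="default", **kwargs):
    """Same parameter layout as the post; returns the bindings for inspection."""
    return {
        "pos1": pos1,
        "pos2": pos2,
        "pos_or_kwd1": pos_or_kwd1,
        "pos_or_kwd2": pos_or_kwd2,
        "args": args,
        "kwd_only1": kwd_only1,
        "kwd_only2": kwd_only2,
        "kwargs": kwargs,
    }


# Six positional values: two positional-only, two classic, two collected by *args.
result = foo(1, 2, 3, 4, 5, 6, kwd_only1="k", extra=True)
print(result)

# Classic parameters may also be passed by name.
mixed = foo(1, 2, pos_or_kwd1=3, kwd_only1="k")

# Positional-only parameters cannot be passed by name. Because **kwargs is
# present, pos1=... and pos2=... land in kwargs, and the call then fails
# for lack of the two required positional arguments.
try:
    foo(pos1=1, pos2=2, pos_or_kwd1=3, kwd_only1="k")
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```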

5 Comments
Pritam Dodeja
3w
Technical post: I've been posting some graphs on here, talking about functions and "equivalence". This all started when I was porting an MLOps framework from Python 3.10 to 3.12, with all the "dependency hell" one has to go through. Then naturally the question arose: "What are the boundaries between one project and another, in terms of functions being called, etc.?"

This led me down the rabbit hole (not too deep) of what happens when I do something like python -m <module> <somescript>. Specifically, what is a "no op" module, and what kind of ops can we inject, thanks to Python being an interpreted language. A few years ago I'd worked on something along similar lines called TracePath, which provided a decorator to do something similar (e.g. who called whom, how long it took, etc.). So I merged these two ideas (avoid decorating every function, have an "inspector" module) and ran this on a simple pandas dataframe creation.

The resulting function invocation graph is the image attached to this post. When I ran it across the whole workflow (create, load, transform data, etc.), the graph had ~9000 connections. The nice thing is I can specify which modules (e.g. only pandas, or pandas and numpy) should be added to the graph.

What do you think is the next logical thing to do with something like this? What kind of graphs would well-structured software produce? How about badly written software? #graphs #swe #dependencyhell #python
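The post does not show how the inspector module works. One way to collect caller-to-callee edges without decorating every function is `sys.setprofile`, sketched below. The function `trace_calls`, its `modules` filter, and the toy workflow are hypothetical and should not be read as the author's implementation.

```python
import sys
from collections import Counter


def trace_calls(func, *args, modules=None, **kwargs):
    """Run func and count (caller, callee) edges between Python functions.

    modules: optional iterable of module-name prefixes; when given, only
    callees defined in matching modules are recorded.
    """
    edges = Counter()
    prefixes = tuple(modules) if modules is not None else None

    def profiler(frame, event, arg):
        if event != "call":  # skip returns and C-level call events
            return
        module = frame.f_globals.get("__name__", "")
        if prefixes is not None and not module.startswith(prefixes):
            return
        caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
        edges[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always detach the hook
    return result, edges


# Toy workflow standing in for "create, load, transform".
def load():
    return [1, 2, 3]

def double(x):
    return x * 2

def transform(rows):
    return list(map(double, rows))

def workflow():
    return transform(load())


result, edges = trace_calls(workflow)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee} x{count}")
```

The resulting `Counter` is already an edge list with weights, so it could be handed to a graph library for drawing.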
Zachary Miller
4d Edited
I just shipped a new feature to pydepgate: partial decode. When the scanner finds a high-entropy encoded blob, it now attempts to decode it and show you what's inside - without executing any of it.

Here's what that looks like against the litellm 1.82.8 wheel. The 34,460-character string in proxy/proxy_server[.]py that I flagged in my last post? pydepgate now decodes it automatically. One layer of base64, 34,460 characters down to 25,844 bytes. Final form: Python source code.

What's in that Python source? The first thing the decoder sees is import subprocess. Then import tempfile. Then import os. Then a PEM public key block. That's a complete second-stage payload, encoded and sitting inside a production Python package used by thousands of developers. The outer package does its advertised job. The encoded blob waits.

pydepgate didn't execute it. It didn't import it. It decoded the bytes statically, identified the content type, extracted the indicators, and showed you the hex. The tool's job is to tell you what's there before anything runs - and now it can tell you more precisely what "there" is.

--peek to enable decoding. --peek-chain to follow multi-layer encoding if the first decode produces another encoded blob.

Still zero dependencies. Still stdlib only. pydepgate 0.2.0 is on PyPI now. By the way, there are over 700 unit tests; this project is covered extensively.
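As a rough illustration of decoding without executing, the sketch below finds long string literals with `ast`, tries a single base64 layer, and parses the result as Python source to list its imports. This is not pydepgate's code or API; `peek_encoded_strings`, the length threshold, and the harmless sample payload are all invented for the example.

```python
import ast
import base64
import binascii


def peek_encoded_strings(source: str, min_length: int = 40):
    """Statically inspect long string literals that decode to Python source.

    Nothing is executed or imported: the source is parsed, candidate strings
    are base64-decoded, and the decoded text is parsed again.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        if len(node.value) < min_length:
            continue
        try:
            decoded = base64.b64decode(node.value, validate=True)
            inner = ast.parse(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError, SyntaxError, ValueError):
            continue  # not base64, not text, or not Python source
        imports = sorted(
            alias.name
            for stmt in ast.walk(inner)
            if isinstance(stmt, ast.Import)
            for alias in stmt.names
        )
        findings.append({
            "encoded_chars": len(node.value),
            "decoded_bytes": len(decoded),
            "imports": imports,
        })
    return findings


# Harmless stand-in payload: three import lines, encoded and embedded in source.
payload = "import subprocess\nimport tempfile\nimport os\n"
blob = base64.b64encode(payload.encode("utf-8")).decode("ascii")
sample_source = f"CONFIG = {blob!r}\n"

findings = peek_encoded_strings(sample_source)
print(findings)
```

A real scanner would need much more than this (other encodings, nested layers, entropy checks), which is presumably what the `--peek-chain` option in the post addresses.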
Amit Kumar
2w
Here’s a simple Python roadmap to follow:

🔹 Step 1: Basics
Build your foundation
→ Syntax, variables, data types
→ Conditionals, functions, exceptions
→ Lists, tuples, dictionaries

🔹 Step 2: Object-Oriented Programming
Think like a developer
→ Classes & objects
→ Inheritance
→ Methods

🔹 Step 3: Data Structures & Algorithms
Level up problem-solving
→ Arrays, stacks, queues
→ Trees, recursion, sorting

🔹 Step 4: Choose Your Path
This is where things get interesting
→ Web Development: Django, Flask, FastAPI
→ Data Science / AI: NumPy, Pandas, Scikit-learn, TensorFlow
→ Automation: Web scraping, scripting, task automation

🔹 Step 5: Advanced Concepts
→ Generators, decorators, regex
→ Iterators, lambda functions

🔹 Step 6: Tools & Ecosystem
→ pip, conda, PyPI

💡 The truth? Python isn’t hard; lack of direction is.
Lucas Pereira
3w
OrJSON looks like a small optimization. Until you realize how much time your API spends just serializing JSON.

In many Python APIs, the bottleneck isn’t only the database or the LLM. Sometimes it’s the most invisible step: turning Python objects into JSON.

What is OrJSON? A high-performance JSON library for Python, written in Rust. It replaces the default json module and focuses on one thing: speed. It:
→ serializes faster
→ deserializes faster
→ supports dataclass, datetime, numpy, UUID out of the box
→ returns bytes instead of str

So what’s happening under the hood? The idea is simple: optimize the hottest path in your API.
→ less overhead per operation
→ less work per payload
→ faster UTF-8 writing

And it shows. In its own benchmarks:
→ dumps() can be ~10x faster than json
→ loads() can be ~2x faster

Where this actually matters:
→ large payloads
→ APIs returning a lot of JSON
→ RAG metadata, events, telemetry
→ long lists

Now the part most people ignore: Trade-offs.
→ orjson.dumps() returns bytes, not str
→ no built-in file read/write helpers
→ not always a perfect drop-in replacement
→ holds the GIL during serialization

So when should you use it?
→ large responses
→ heavy metadata
→ serialization shows up in profiling

And when won’t it help?
→ DB is your bottleneck
→ LLM latency dominates
→ responses are small
→ network / I/O dominates

OrJSON won’t magically make your API fast. But if serialization is on your hot path, it’s one of the highest ROI optimizations you can make.
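The bytes-versus-str trade-off is the one most likely to bite when swapping libraries. One way to handle it is a thin wrapper that always returns bytes and falls back to the standard library when orjson is not installed. The wrapper name `dumps_bytes` and the sample payload are hypothetical; the sketch assumes ASCII-only data so both paths produce comparable output.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


def dumps_bytes(obj) -> bytes:
    """Serialize to UTF-8 JSON bytes, using orjson when it is available."""
    if orjson is not None:
        return orjson.dumps(obj)  # already returns bytes
    # The stdlib returns str, so encode it; compact separators avoid
    # the spaces json.dumps inserts by default.
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


payload = {"id": 1, "tags": ["a", "b"], "score": 0.5}
body = dumps_bytes(payload)

print(type(body).__name__)
print(json.loads(body) == payload)  # json.loads accepts bytes as well as str
```

Callers that need a str (for logging, say) can call `body.decode("utf-8")` at that point, keeping the hot path in bytes.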