Agent-based systems require bad Python code. Because LLMs are (mostly) text in and text out, to have them interact with local functions you often need to write those functions in ways you typically would not. When working with data, instead of returning the actual data object, depending on the workflow you will *need* to return a text representation of it. Another example: instead of letting an error halt the program, you will often want to catch the error and feed its message back to the LLM. I show several examples of using agent-based systems for data analysis tasks in my book, LLMs for Mortals: A Practical Guide for Analysts, https://lnkd.in/enCZ_rM3. Agent-based systems tend to be very complicated. If you want a basic introduction that starts from tool calling in a loop and then expands into more complicated agent SDKs (with examples in OpenAI, Anthropic, and Google), I recommend picking up a copy.
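As a rough illustration of the two habits described above (returning text instead of a data object, and catching errors so they can be fed back to the LLM), here is a minimal sketch. It is not taken from the book; the names `CRIMES`, `summarize_counts`, and `run_tool` and the sample numbers are all made up for illustration, and no real LLM SDK is called.

```python
import json
import statistics

# Hypothetical in-memory data standing in for a real dataset.
CRIMES = {"burglary": [12, 15, 9, 20], "robbery": [3, 5, 4, 6]}


def summarize_counts(crime_type: str) -> str:
    """Hypothetical tool: returns a JSON string, not a Python object."""
    counts = CRIMES[crime_type]  # raises KeyError for unknown types
    summary = {
        "crime_type": crime_type,
        "n": len(counts),
        "mean": statistics.mean(counts),
        "max": max(counts),
    }
    return json.dumps(summary)


def run_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call; any exception is returned as text."""
    tools = {"summarize_counts": summarize_counts}
    try:
        return tools[name](**arguments)
    except Exception as exc:  # deliberately broad: the LLM reads the message
        return f"ERROR: {type(exc).__name__}: {exc}"


# A good call returns text the model can read.
ok = run_tool("summarize_counts", {"crime_type": "burglary"})
# A bad call returns the error as text instead of stopping the loop.
bad = run_tool("summarize_counts", {"crime_type": "arson"})
```

In a normal script the broad `except Exception` and the string return types would be poor style, which is the point: in a tool-calling loop the string in `ok` or `bad` would be appended to the conversation so the model can use the result or correct its own call.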
Your point about needing to serialize data for LLM interaction is spot on. We often found ourselves converting pandas DataFrames to JSON strings just to feed them back. Have you found any clever ways to optimize this serialization/deserialization overhead within agent loops?
Does your book address GIS data? I’m building an ABM for predicting crime using qualitative data, and mapping is a component.