Pascal Biese’s Post

Most agent frameworks tightly couple workflow logic with Python code. AgentSPEX is a dedicated specification language for LLM-agent workflows. Instead of burying control flow inside Python scripts, AgentSPEX makes it explicit.

Typed steps, branching, loops, parallel execution, reusable submodules, and state management all live in a readable spec - separate from the execution layer. The agent harness underneath handles tool access, sandboxed environments, checkpointing, and verification. It's the difference between editing a blueprint and rewiring a building.

The team evaluated AgentSPEX across 7 benchmarks and ran a user study comparing it against a popular existing agent framework. Users found AgentSPEX workflows significantly more interpretable and accessible to author. The project also ships with ready-to-use agents for deep research and scientific research tasks, plus a visual editor that synchronizes graph and workflow views in real time.

The practical upside here is maintainability. Current orchestration tools like LangGraph, DSPy, and CrewAI give you structure, but modifying a workflow still means modifying code. A dedicated spec language means non-engineers can inspect, edit, and verify agent behavior without touching the runtime.

The real question: will teams adopt a new language when Python already works? If the interpretability gains hold up in production, the answer might be yes - especially when debugging a failing 15-step agent pipeline at 2 AM.

↓ Want to keep up? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
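To make the "spec separate from execution" idea concrete, here is a minimal sketch in Python. It is NOT AgentSPEX syntax: the `SPEC` layout, the step names, and the `run` helper are all hypothetical, chosen only to show control flow living in plain data while a small runner owns execution.

```python
from typing import Any, Callable

# Hypothetical workflow spec as plain data (illustrative only, not AgentSPEX syntax).
SPEC = {
    "start": "search",
    "steps": {
        "search": {"tool": "search", "input": "query", "output": "hits", "next": "check"},
        "check": {"branch": "hits", "if_true": "summarize", "if_false": "give_up"},
        "summarize": {"tool": "summarize", "input": "hits", "output": "answer", "next": None},
        "give_up": {"tool": "apologize", "input": "query", "output": "answer", "next": None},
    },
}

# Hypothetical tools; a real harness would call models, search APIs, and so on.
TOOLS: dict[str, Callable[[Any], Any]] = {
    "search": lambda q: [d for d in ["agent specs", "agent harness"] if q in d],
    "summarize": lambda hits: f"{len(hits)} result(s): " + "; ".join(hits),
    "apologize": lambda q: f"no results for {q!r}",
}


def run(spec: dict, tools: dict, state: dict) -> dict:
    """Interpret the spec: the spec owns control flow, the runner owns execution."""
    name = spec["start"]
    trace = []
    while name is not None:
        step = spec["steps"][name]
        trace.append(name)
        if "branch" in step:
            # Branch steps only pick the next step based on a state value.
            name = step["if_true"] if state.get(step["branch"]) else step["if_false"]
            continue
        # Tool steps read one state key, call a tool, and write one state key.
        state[step["output"]] = tools[step["tool"]](state[step["input"]])
        name = step["next"]
    state["trace"] = trace
    return state


result = run(SPEC, TOOLS, {"query": "agent"})
print(result["trace"], result["answer"])
```

Changing the branch target or adding a step here means editing `SPEC`, not `run`, which is the kind of separation the post describes.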

5 Comments

Pascal Biese 1w

arXiv: https://arxiv.org/abs/2604.13346

2 Reactions

Axle Bucamp 2h

Nice, thanks. Normalizing it might help improve performance globally.

Awais Naeem 1w

The gap between 'Python-embedded workflows' and 'dedicated spec language' is the gap between engineer-only maintenance and cross-functional maintainability. AgentSPEX makes workflows readable by non-engineers. The 7 benchmarks and user study show it works. The ready-to-use agents for deep research and scientific research are the proof. The teams that adopt this will have agents that are easier to debug and modify.

Paul Iusztin

Senior AI Engineer • Founder @ Decoding AI • Author @ LLM Engineer’s Handbook ~ I ship AI products and teach you about the process.

On adoption, I think teams will only switch if the spec layer proves easier to debug and evolve than code; otherwise Python inertia wins.

George Juraj Salapa 1w

One thing that strikes me is that models are trained on .py, .tsx, etc., i.e. common languages. Isn't this going completely against that? (E.g. TOON vs. JSON, a misstep imo.)


More Relevant Posts

OptiRefine

34 followers
1w
Exciting news on the horizon! Introducing OptiScan, OptiRefine's Python AST-powered code analysis engine. This tool focuses on deterministic static analysis that interprets your code in the same manner as a compiler, without any AI or guesswork.

Here’s what OptiScan offers:
- **Big-O Complexity Analysis**: It parses your Python source into an Abstract Syntax Tree (AST) to assess time and space complexity, identifying nested loops, O(n²) patterns, and exponential bottlenecks before they reach production.
- **Automated Refactoring Engine**: Beyond flagging issues, it rewrites them. It includes HashSet pattern injection, N+1 ORM query resolution via .select_related(), and async httpx rewrites, along with Counter-based frequency optimizations, all generated programmatically.
- **Cyclomatic Complexity Scoring**: Each function in your codebase receives a score based on decision-point density: LOW (1–5), MEDIUM (6–10), HIGH (11+). This allows you to pinpoint functions that may become unmaintainable ahead of your next code review.
- **Dead Code Detection**: A two-pass AST scan uncovers functions and variables that are defined but never referenced, resulting in cleaner binaries and reduced cognitive load.
- **Memory & Resource Auditing**: It identifies unclosed file handles, unbounded list accumulation in loops, and unnecessary generator expression materialization, catching patterns that lead to silent memory growth in long-running services.
- **DevSecOps Static Analysis**: Hardcoded secrets, unsafe eval()/exec() calls, and risky module imports are flagged automatically before reaching a PR review.
- **In-Browser PyTest Generation**: OptiScan generates PyTest scaffolding for every detected function and executes them directly in your browser via WebAssembly (Pyodide), requiring zero setup and no local environment.

OptiScan is built on libcst, a concrete syntax tree library that ensures full, loss
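The cyclomatic scoring bullet above can be approximated in a few lines. This is a rough sketch using the standard-library `ast` module, not OptiScan's implementation (which the post says is built on libcst); the `DECISION_NODES` set, the `band` and `cyclomatic_scores` names, and the counting rules are assumptions. Only the LOW/MEDIUM/HIGH thresholds come from the post.

```python
import ast

# Node types counted as decision points (a common simplification, assumed here).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.IfExp, ast.comprehension)


def band(score: int) -> str:
    """Map a score to the bands quoted in the post: 1-5, 6-10, 11+."""
    if score <= 5:
        return "LOW"
    if score <= 10:
        return "MEDIUM"
    return "HIGH"


def cyclomatic_scores(source: str) -> dict[str, tuple[int, str]]:
    """Return {function_name: (score, band)} for every function in the source.

    Nested functions are also counted toward their enclosing function.
    """
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # one path through a function with no branches
            for child in ast.walk(node):
                if isinstance(child, DECISION_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two extra short-circuit paths
                    score += len(child.values) - 1
            scores[node.name] = (score, band(score))
    return scores


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(items):
    total = 0
    for item in items:
        if item > 0 and item % 2 == 0:
            total += item
        elif item < 0:
            total -= item
    return total
'''

print(cyclomatic_scores(SAMPLE))
```

Here `branchy` scores 5: one base path, plus the `for`, the `if`, the `and`, and the `elif`.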
Jimi Vaubien
2w
A Python interpreter written in Rust. Under 1 microsecond startup. Pydantic just shipped Monty, and it's exactly the sandbox AI agents need.

The problem
When an LLM generates Python code, you need to run it somewhere safe. Current options: spin up a Docker container (slow), use a VM (heavy), or just run it and pray (please don't). Monty is a minimal Python interpreter built in Rust, designed specifically for executing LLM-generated code.

What makes it cool
👉🏽 0.06ms startup (microsecond-scale, not second-scale)
👉🏽 No filesystem access unless you explicitly grant it
👉🏽 No network calls without authorization
👉🏽 Preset resource limits (execution time, memory, stack depth)
👉🏽 Runs in WebAssembly
👉🏽 ~4.5MB download size

How it works
The interpreter pauses when it hits an external function call. Your host code decides whether to execute it and passes the result back. The LLM writes Python that calls your tools as regular functions, instead of going through the usual tool-call dance. This is already powering "code mode" in Pydantic AI, where the model writes Python calling tools as functions rather than making sequential tool calls.

Getting started
👉🏽 uv add pydantic-monty
Supports a solid subset of Python: asyncio, re, datetime, json, dataclasses. No class definitions yet, but enough for most agent tasks.

What's your current sandbox for running AI-generated code?
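The pause-and-resume pattern described under "How it works" can be illustrated with plain Python generators. This sketch does not use Monty or its API; `guest_program`, `ALLOWED`, and `run_guest` are hypothetical names, and a generator stands in for the sandboxed code purely to show the host deciding on each external call.

```python
from typing import Any, Callable, Generator

# Hypothetical stand-in for sandboxed code: each yield is a request for an
# external call, written as (function_name, args). This is not Monty's API.
def guest_program() -> Generator[tuple[str, tuple], Any, str]:
    temp = yield ("get_temperature", ("Berlin",))
    secret = yield ("read_file", ("/etc/passwd",))
    return f"temp={temp}, file={secret}"


# The host's allow-list: only these external functions may actually run.
ALLOWED: dict[str, Callable[..., Any]] = {
    "get_temperature": lambda city: 21,
}


def run_guest(program: Callable[[], Generator]) -> tuple[Any, list]:
    """Drive the guest, pausing at every external call so the host can decide."""
    gen = program()
    log = []
    try:
        request = next(gen)  # run until the first external call
        while True:
            name, args = request
            if name in ALLOWED:
                result = ALLOWED[name](*args)
                log.append((name, "allowed"))
            else:
                result = None  # denied calls get no data back
                log.append((name, "denied"))
            request = gen.send(result)  # resume the guest with the result
    except StopIteration as stop:
        return stop.value, log


value, log = run_guest(guest_program)
print(value)
print(log)
```

The guest never touches the filesystem itself; it can only ask, and the host answers or refuses.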
12 Comments
Bernhard Müller
3w
I switched from n8n to Python + Claude Code mid-project. Best call I made all quarter. Here's the honest comparison.

n8n is not the automation tool you think it is. It's perfect for 3-step workflows. It becomes a debugging nightmare past that. I've built workflows in both; here's the breakdown.

n8n wins when:
→ The workflow is small (under 5 nodes)
→ Speed to first result matters more than everything else
→ The person building it isn't a developer

But complexity changes the math fast. A 20-node workflow breaks. You open the visual editor to find the problem. Half your afternoon is gone.

And the AI token cost while building medium to large flows? Every tweak, every node adjustment burns more than you'd expect. It compounds quietly.

That's where OpenClaw (or Claude Code) + Python changes everything. For medium to large workflows:
→ Debugging is just reading code, with no visual maze
→ Building is faster, with less back-and-forth with AI
→ Token usage drops significantly

The visual layer feels like a feature when you start. It becomes friction when the workflow grows. Code doesn't have that problem.

My rule now:
→ Quick, simple automations → n8n
→ Everything from medium up → Python + Claude Code

(And I am NOT a Python developer! I can just understand the generated code, but that is not the point. I only have to specify what I want and, if anything breaks, say what broke and how it is supposed to behave. With n8n, on the other hand, debugging is a nightmare! Try it out!)

The tool you prototype with isn't always the one you should scale with. Follow me for more honest takes on AI tooling. What's your experience been? Drop your thoughts below.

1 Comment
Yuvrajsinh Jhala
1w
Build your first AI agent with Python and Claude

Learn to build an AI agent with Python and Claude that uses tools, makes decisions, and executes multi-step tasks autonomously. Read the full post 👇 https://lnkd.in/ghMddf5c #GenerativeAI #AI #WebDevelopment #PHP #Python #Developer #LLM

Build your first AI agent with Python and Claude imyuvii.com
Shivam Chaturvedi
3w
You don’t need to memorize Python. You need a way to recall it when it actually matters. That’s where a solid Python cheat sheet becomes powerful. Because in real work, nobody remembers everything. What matters is how quickly you can think → apply → deliver.

Here’s a simple Python cheat framework I rely on:
• Data Types → Don’t just use them; choose them wisely (list vs set vs dict matters)
• Strings → Slicing + formatting = everyday productivity boost
• Loops → Clean iteration with for, while, and enumerate()
• Functions → Write once, reuse everywhere
• Comprehensions → Less code, better clarity
• Exceptions → Code should fail gracefully, not silently
• Built-in Functions → len(), sum(), min(), max() save more time than you think

The real shift most people miss: It’s not about syntax. It’s about understanding how data flows through your code. That’s the difference between someone who learned Python and someone who can actually use it under pressure.

Follow Shivam Chaturvedi for more content on QA, automation, and practical tech learning

1 Comment
Michal Gomez-Cornejo
3w
A lot of AI work around documents is already happening in Python. That’s a big reason Nutrient’s new Python SDK matters. If you’re building an app where a model needs to interact with documents, the model is only part of the equation. OCR, data extraction, conversion, and structured output are what make those workflows usable in production. This release gives teams a stronger foundation for building document-heavy AI applications in Python. #Python #AI #DocumentProcessing #OCR #DataExtraction

Nutrient Python SDK: Production-grade document processing for Python nutrient.io
Kelly Gold
3w Edited
Wrote code in Sublime, just regular old Python autocomplete and syntax highlighting. Asked Claude to challenge ME to write some code (JS world ~10 years, not much Python since college). When I didn't know something, I looked at documentation. I'll tell ya what, my solution was 💩 But it worked. I had fun. I felt achievement. I was reminded that in Python, calling dict() on a list of 2-item lists pops them out as nice key-value pairs. Re-learned that line.split() works on any amount of whitespace in strings, nothing like that in JS. List comprehensions, too, after I submitted my dooky solution. Practicing string manipulation and data structures feels important.... How else do we learn to put better things in and get better things out? Maybe with future models it doesn't matter, and the AI is just better than you at everything from database design to dev ops. For now, I think there's still a reason to hone your craft, and a reason they have a bunch of PhDs building these models.
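The two behaviours mentioned above are easy to check in a few lines. This is only a small illustration; the sample values are made up.

```python
# dict() accepts any iterable of 2-item sequences and turns them into key-value pairs.
pairs = [["name", "Ada"], ["lang", "Python"]]
as_dict = dict(pairs)

# str.split() with no argument splits on any run of whitespace
# (spaces, tabs, newlines) and drops leading/trailing whitespace.
line = "  alpha   beta\tgamma \n delta "
words = line.split()

# With an explicit separator, consecutive separators produce empty strings.
spaced = "a  b".split(" ")

print(as_dict)
print(words)
print(spaced)
```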
Benjamin Bennett Alexander
4w
⛔ If You're New To Python, Please Stop This! 💨

Do NOT modify a list while iterating over it. When you modify a list while iterating over it, the iterator gets confused. It doesn't know that the list has changed under its feet. For example, the code below is broken:

    items = [1, 2, 2, 3, 4]
    for item in items:
        if item == 2:
            items.remove(item)
    print(items)  # Output: [1, 2, 3, 4]

Here we use the remove() method to remove 2s from the list. But if you look at the output, a 2 is still there. When you remove an item from a list, the remaining items shift left. But the loop keeps moving forward. So here’s what really happens: the first 2 is found and removed. The second 2 shifts into its position. The loop advances to the next index, and that 2 gets skipped.

One way to fix it is to iterate over a copy:

    for item in items[:]:
        if item == 2:
            items.remove(item)
    print(items)  # Output: [1, 3, 4]

When you iterate over a shallow copy, the loop's positions are not affected by removals from the original list. Even better, you can use a list comprehension:

    items = [x for x in items if x != 2]

👑 Never, ever modify a list (or any collection) while iterating over it directly. The iterator doesn't handle structural changes gracefully. It will skip elements, process the same element twice, or raise a RuntimeError (in some cases, like dictionaries). It's a bad practice.
18 Comments
Ankit daksh
1mo Edited
How Python + Ollama (LLMs like LLaMA) Work Together

Python makes it super easy to use powerful AI models through Ollama. Here’s a simple breakdown 👇

1. Load the Model: Using Ollama, you can run LLMs like LLaMA locally on your system.
2. Send Input (Prompt): With Python, you send a question or instruction to the model.
3. Processing: The model interprets your input based on patterns learned from its training data (text, code).
4. Generate Output: It returns a response, which could be text, code, ideas, or answers.

Why use Python + Ollama?
✔ Easy to integrate
✔ Run AI locally (no API cost)
✔ Fast prototyping
✔ Full control over your data

Example Use Cases:
• Chatbots
• Code generation
• Content writing
• Automation tools
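The "send input, get output" steps can be sketched with only the standard library against Ollama's local HTTP endpoint. This assumes an Ollama server is running on its default port (11434) and that a model has already been pulled; the model name `llama3` and the helper names are placeholders, not something the post specifies.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build the POST request; 'llama3' is a placeholder for any pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage, once the server is up and the model is available:
#   print(generate("Explain list comprehensions in one sentence."))
```

With `"stream": False` the server replies with a single JSON object, which keeps the client code short; streaming would require reading the response line by line.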
Chris H.
1w
I heard a tip to use Rust instead of Python whenever you are coding with AI, because of the speed and, more importantly, the validation. The code won't compile if there are errors, unlike in Python, where you have to do the validation for the AI and go back and forth with prompts to fix it. I'm finding it way faster to generate code. Even though I don't know Rust that well, it will be a great learning experience. Right now I'm using Claude Code, but I might switch back to OpenCode's models again to see if that works. https://lnkd.in/gaDkaHXu

Rust: The Unlikely Engine Of The Vibe Coding Era social-www.forbes.com
