LLM Batch Processing with Code Interpreter
As my previous articles demonstrate, Code Interpreter is capable of much more than any other AI model I know of. Its awesome LLM (let's call it GPT4.5), combined with code interpretation (let's call it CI) via file upload, storage, and sandboxed code execution, makes it an LLM agent that can iteratively (1) generate GPT4.5 output, including code, (2) use CI to store data and produce output from code execution, and (3) reflect on the result to gain information for the next iteration, either to perform the current subtask better or to move on to the next subtask.
However, GPT4.5 has a token limit (i.e. context length) of 8192 tokens, which limits the number of iterations the LLM agent can perform in one go. Of course, GPT4.5's token limit only applies to GPT4.5's own output, so as long as GPT4.5 generates code that in turn iterates over the uploaded and stored data without printing too much output, CI can run many iterations without reaching GPT4.5's token limit. But sometimes you want to iterate over your data and use the LLM in each iteration. This kind of LLM batch processing is very useful, e.g. for evaluations, such as running HumanEval on Code Interpreter itself, or for data generation, such as creating a medical tiny story for each ICD-10 code. In these cases, GPT4.5 itself (as opposed to the code that CI executes) needs to iterate over the data.
For the HumanEval benchmark evaluation in my previous post, I could easily fit 5 iterations in one go, i.e. within one GPT4.5 output. You can further increase the number of iterations GPT4.5 can do in one go by reducing GPT4.5's output, e.g. by using "notalk;justgo". But in my experiments, GPT4.5 thinks less step by step and yields much worse results if you limit its verbosity. Thus, you get the best results if you make GPT4.5 do a single iteration per output and automate the chat.
Lacking access to https://www.multion.ai, I came up with the following JavaScript code, to be executed in the browser, to automate the chat, with the help of the browser developer tools and, of course, Code Interpreter:
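The sketch below shows the shape of such an automation script, not my exact snippet. The DOM selectors, the iteration count, and the prompt builder `nextPrompt` are assumptions for illustration; inspect the live page with the developer tools and adapt them to your task.

```javascript
const TIME_INTERVAL = 60 * 1000;   // ms between messages; tune to your task and usage cap
const TOTAL_ITERATIONS = 164;      // hypothetical: number of data items to process

// Build the instruction for iteration `i` (hypothetical prompt; adapt to your task).
function nextPrompt(i) {
  return `Process line ${i} of the uploaded file and append the result to results.csv.`;
}

// Type the text into the chat box and click send.
// The selectors are assumptions; inspect the page and adjust them.
function sendMessage(text) {
  const box = document.querySelector('textarea');
  box.value = text;
  box.dispatchEvent(new Event('input', { bubbles: true })); // make the UI register the change
  document.querySelector('button[type="submit"]').click();
}

// Start the loop only when running inside a browser page.
if (typeof document !== 'undefined') {
  let i = 1;
  const timer = setInterval(() => {
    if (i > TOTAL_ITERATIONS) { clearInterval(timer); return; }
    sendMessage(nextPrompt(i));
    i += 1;
  }, TIME_INTERVAL);
}
```

Paste this into the browser console on the chat page; `setInterval` then sends one iteration's prompt per tick until all items are processed.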
If you set TIME_INTERVAL high enough, you avoid creating too many messages, which would exceed your usage cap and lead to the following error:
You've reached the current usage cap for GPT-4. You can continue with the default model now, or try again after 10:59 PM. Learn more
But after you have idled for about 30 minutes, CI drops all of its storage. In my experiments with a relatively hard task that takes about 2 minutes per iteration (after a few more minutes, CI would time out), I set TIME_INTERVAL to around 1 minute and did not run into the usage cap (which is supposed to be 50 messages every 3 hours).
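As a back-of-envelope check on these numbers (assuming the cap really is 50 messages per 3 hours), the minimum average spacing between messages works out as follows; note that the ~2 minutes CI spends working per iteration add to the effective spacing, which is why a TIME_INTERVAL well below this minimum can still keep you near or under the cap:

```javascript
// Minimum average spacing between messages implied by the usage cap.
const CAP_MESSAGES = 50;                           // messages allowed...
const CAP_WINDOW_MS = 3 * 60 * 60 * 1000;          // ...per 3-hour window (10,800,000 ms)
const minSpacingMs = CAP_WINDOW_MS / CAP_MESSAGES; // 216000 ms = 3.6 minutes per message
```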
Even if you have set TIME_INTERVAL high enough, it is advisable to sporadically tell GPT4.5 to offer its data as a download, and to download that backup. If multiple files are involved, you can tell GPT4.5 to create a zip file. If you happen to exceed your usage cap in spite of a high TIME_INTERVAL, you can simply re-upload the data you backed up and resume with the first iteration not included in your backup, e.g.:
I lost the last couple of outputs, please continue from line 95. I uploaded `backup5.zip` with the relevant data (the original upload and derived data up to line 94).