How I Optimized My Python E-commerce Scripts Using 7 ChatGPT Prompts
As someone who analyzes e-commerce datasets (orders, customers, products, sales performance), I constantly deal with cleaning transactions, aggregating KPIs, and generating reports. Over time, my Python scripts became sluggish when handling thousands of rows: calculating revenue per category, filtering orders by conditions, and running customer retention metrics.
That’s when I turned to ChatGPT, crafting prompts that directly pinpointed bottlenecks in my scripts. Here’s how 7 prompts transformed my workflow.
Sample Dataset: E-commerce Orders
import pandas as pd
import numpy as np
# Simulated E-commerce dataset
np.random.seed(42)
categories = ["Electronics", "Clothing", "Books", "Groceries"]
customers = [f"CUST{i:03d}" for i in range(1, 21)]
orders_df = pd.DataFrame({
    "OrderID": range(1, 51),
    "CustomerID": np.random.choice(customers, 50),
    "Category": np.random.choice(categories, 50),
    "Quantity": np.random.randint(1, 5, 50),
    "UnitPrice": np.round(np.random.uniform(5.0, 500.0, 50), 2),
    "Delivered": np.random.choice([True, False], 50)
})
orders_df["Total"] = orders_df["Quantity"] * orders_df["UnitPrice"]
print(orders_df.head())
Prompt 1: Identify Bottlenecks in Chained Operations
Prompt I used: “ChatGPT, I have a pipeline that filters, groups, and computes KPIs. Can you identify which steps are slow and suggest an optimized version?”
Before (slow chaining):
# Average revenue per customer across delivered orders
avg_revenue = (
    orders_df[orders_df["Delivered"]]
    .groupby("CustomerID")["Total"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_revenue.head())
Optimized:
delivered = orders_df.loc[orders_df["Delivered"], ["CustomerID", "Total"]]
avg_revenue = delivered.groupby("CustomerID")["Total"].mean().sort_values(ascending=False)
print(avg_revenue.head())
👉 Filtering once and keeping only the needed columns before grouping = fewer computations.
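To sanity-check the difference, here is a quick timeit comparison (a rough sketch; on this 50-row toy frame the gap is noise, but the column-pruning pattern pays off once rows reach the millions):
import timeit
# Time 100 runs of each variant on the same frame
t_chained = timeit.timeit(
    lambda: orders_df[orders_df["Delivered"]].groupby("CustomerID")["Total"].mean(),
    number=100,
)
t_pruned = timeit.timeit(
    lambda: orders_df.loc[orders_df["Delivered"], ["CustomerID", "Total"]]
    .groupby("CustomerID")["Total"].mean(),
    number=100,
)
print(f"chained: {t_chained:.4f}s, pruned: {t_pruned:.4f}s")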
Prompt 2: Memory Efficiency
Prompt: “ChatGPT, how can I optimize memory usage for my sales dataset?”
orders_df["Category"] = orders_df["Category"].astype("category")
orders_df["Delivered"] = orders_df["Delivered"].astype("bool")
orders_df["Quantity"] = orders_df["Quantity"].astype("int8")
orders_df["UnitPrice"] = orders_df["UnitPrice"].astype("float32")
orders_df["Total"] = orders_df["Total"].astype("float32")
print(orders_df.info())
👉 Downcasting numeric values + categorical columns saved memory, allowing bigger datasets.
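A quick way to verify the savings (a sketch that rebuilds the original dtypes just for comparison; the absolute numbers are tiny at 50 rows, but the ratio holds as the data grows):
# Rebuild a copy with the default dtypes to compare footprints
baseline = orders_df.astype({
    "Category": "object",
    "Quantity": "int64",
    "UnitPrice": "float64",
    "Total": "float64",
})
print("Before downcasting:", baseline.memory_usage(deep=True).sum(), "bytes")
print("After downcasting: ", orders_df.memory_usage(deep=True).sum(), "bytes")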
Prompt 3: Parallel Processing
Prompt: “Which transformations in my order dataset can I parallelize?”
from concurrent.futures import ThreadPoolExecutor
def discount_price(price):
    return price * 0.9  # simulate a 10% discount calculation
prices = orders_df["UnitPrice"].tolist()
with ThreadPoolExecutor() as executor:
    discounted = list(executor.map(discount_price, prices))
print("First 5 discounted prices:", discounted[:5])
👉 Threads help when each call is expensive or I/O-bound (think API calls or per-customer enrichment); for cheap arithmetic like this discount, Python's GIL limits the gain and vectorization is the better tool.
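For comparison, here is the vectorized one-liner that covers this particular case (one NumPy-backed multiply on the whole column instead of dispatching 50 Python calls):
# Vectorized equivalent of discount_price applied to every row
discounted_vec = orders_df["UnitPrice"] * 0.9
print("First 5 discounted prices (vectorized):", discounted_vec.head().round(2).tolist())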
Prompt 4: Efficient Grouping
Prompt: “How can I compute multiple KPIs (avg spend, max order, delivery rate) efficiently by category?”
category_summary = orders_df.groupby("Category").agg(
    AvgSpend=("Total", "mean"),
    MaxOrder=("Total", "max"),
    DeliveryRate=("Delivered", "mean")
)
print(category_summary)
👉 A single groupby().agg() with named aggregations is cleaner and faster than looping over categories.
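For contrast, here is a sketch of the per-category loop this replaces (shown only to illustrate the cost; it rescans the whole frame once per category):
for cat in orders_df["Category"].unique():
    subset = orders_df[orders_df["Category"] == cat]  # full-frame scan each pass
    print(cat, subset["Total"].mean(), subset["Total"].max(), subset["Delivered"].mean())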
Prompt 5: Vectorized Calculations
Prompt: “How do I compute revenue per item efficiently without loops?”
orders_df["RevenuePerItem"] = orders_df["Total"] / orders_df["Quantity"]
print(orders_df[["OrderID", "Category", "RevenuePerItem"]].head())
👉 Pandas vectorization beats row-by-row calculations.
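For a sense of what that one line replaces, here is the row-by-row equivalent (shown only for contrast; iterrows() builds a Series per row and is typically orders of magnitude slower at scale):
# Row-by-row version: avoid this pattern on large frames
revenue_loop = [row["Total"] / row["Quantity"] for _, row in orders_df.iterrows()]
print("First 5 (loop):", [round(v, 2) for v in revenue_loop[:5]])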
Prompt 6: Efficient Filtering and Sorting
Prompt: “What’s the fastest way to filter high-value electronics orders and sort them?”
filtered_sorted = (
    orders_df.query("Category == 'Electronics' and Total > 200")
    .sort_values("Total", ascending=False)
)
print(filtered_sorted.head())
👉 .query() is cleaner than chained boolean conditions, and on large frames it can be faster because pandas can evaluate the expression with numexpr.
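The boolean-mask equivalent, in case you prefer plain indexing (same result; .query() mostly wins on readability, while a mask avoids the expression-parsing overhead on small frames):
mask = (orders_df["Category"] == "Electronics") & (orders_df["Total"] > 200)
filtered_sorted_mask = orders_df[mask].sort_values("Total", ascending=False)
print(filtered_sorted_mask.head())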
Prompt 7: NumPy for Distance Calculations
Prompt: “How can I efficiently compute pairwise distance between Quantity and Total?”
quantities = orders_df["Quantity"].values
totals = orders_df["Total"].values
coords = np.vstack([quantities, totals]).T
dist_matrix = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
print("Distance matrix shape:", dist_matrix.shape)
👉 NumPy vectorization for distances is way faster than nested Python loops.
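If SciPy happens to be installed, cdist computes the same matrix without materializing the full (n, n, 2) broadcast intermediate, which matters once n grows:
from scipy.spatial.distance import cdist  # assumes SciPy is available
dist_matrix_scipy = cdist(coords, coords)  # Euclidean metric by default
print("Matches broadcast version:", np.allclose(dist_matrix, dist_matrix_scipy))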
Closing Thoughts
By using 7 ChatGPT prompts, I restructured my workflow from a mess of slow chained operations into a clean, optimized, and scalable pipeline.
The lessons weren’t just about speed — they were about thinking systematically: filtering once, vectorizing instead of looping, and leveraging memory + parallel processing smartly.
Now, my sales analysis scripts run faster, handle bigger datasets, and are easier to maintain.
Source: Inspired by Jaume Boguñá’s article How I Optimized My Python Scripts Using 7 ChatGPT Prompts
Using ChatGPT wisely is a good idea 💡