How I Optimized My Python E-commerce Scripts Using 7 ChatGPT Prompts

As someone who analyzes e-commerce datasets (orders, customers, products, sales performance), I constantly deal with cleaning transactions, aggregating KPIs, and generating reports. Over time, my Python scripts became sluggish at thousands of rows: calculating revenue per category, filtering orders by conditions, and running customer retention metrics.

That’s when I turned to ChatGPT, crafting prompts that pinpointed the bottlenecks in my scripts. Here’s how 7 prompts transformed my workflow.


Sample Dataset: E-commerce Orders

import pandas as pd
import numpy as np

# Simulated E-commerce dataset
np.random.seed(42)
categories = ["Electronics", "Clothing", "Books", "Groceries"]
customers = [f"CUST{i:03d}" for i in range(1, 21)]

orders_df = pd.DataFrame({
    "OrderID": range(1, 51),
    "CustomerID": np.random.choice(customers, 50),
    "Category": np.random.choice(categories, 50),
    "Quantity": np.random.randint(1, 5, 50),
    "UnitPrice": np.round(np.random.uniform(5.0, 500.0, 50), 2),
    "Delivered": np.random.choice([True, False], 50)
})

orders_df["Total"] = orders_df["Quantity"] * orders_df["UnitPrice"]
print(orders_df.head())
        

Prompt 1: Identify Bottlenecks in Chained Operations

Prompt I used: “ChatGPT, I have a pipeline that filters, groups, and computes KPIs. Can you identify which steps are slow and suggest an optimized version?”

Before (slow chaining):

# Average revenue per customer who got orders delivered
avg_revenue = orders_df[orders_df["Delivered"]]\
    .groupby("CustomerID")["Total"].mean()\
    .sort_values(ascending=False)

print(avg_revenue.head())
        

Optimized:

delivered = orders_df.loc[orders_df["Delivered"], ["CustomerID", "Total"]]
avg_revenue = delivered.groupby("CustomerID")["Total"].mean().sort_values(ascending=False)
print(avg_revenue.head())
        

👉 Selecting just the columns the groupby needs before grouping means each subsequent step moves less data.
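To sanity-check the rewrite, here is a small self-contained sketch (synthetic data; column names assumed to match orders_df above) confirming the filter-first, columns-first version produces exactly the same result:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for orders_df, large enough that the difference matters
rng = np.random.default_rng(0)
n = 50_000
df = pd.DataFrame({
    "CustomerID": rng.choice([f"CUST{i:03d}" for i in range(1, 21)], n),
    "Delivered": rng.choice([True, False], n),
    "Total": rng.uniform(5.0, 2000.0, n),
})

# Chained version: filters the full frame, carrying every column along
chained = df[df["Delivered"]].groupby("CustomerID")["Total"].mean()

# Filter-first version: keep only the two columns the groupby needs
delivered = df.loc[df["Delivered"], ["CustomerID", "Total"]]
filtered = delivered.groupby("CustomerID")["Total"].mean()

print(filtered.equals(chained))  # identical results, less data moved
```

At the article’s 50 rows either version is instant; the column pruning pays off once frames have many columns or millions of rows.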


Prompt 2: Memory Efficiency

Prompt: “ChatGPT, how can I optimize memory usage for my sales dataset?”

orders_df["Category"] = orders_df["Category"].astype("category")
orders_df["Delivered"] = orders_df["Delivered"].astype("bool")
orders_df["Quantity"] = orders_df["Quantity"].astype("int8")
orders_df["UnitPrice"] = orders_df["UnitPrice"].astype("float32")
orders_df["Total"] = orders_df["Total"].astype("float32")

print(orders_df.info())
        

👉 Downcasting numeric values + categorical columns saved memory, allowing bigger datasets.
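If you want to see the savings rather than take them on faith, DataFrame.memory_usage(deep=True) measures them directly. A minimal sketch on synthetic data (column names assumed to match the article’s orders_df):

```python
import numpy as np
import pandas as pd

# Synthetic frame with the default (wide) dtypes
df = pd.DataFrame({
    "Category": np.random.choice(
        ["Electronics", "Clothing", "Books", "Groceries"], 10_000
    ),
    "Quantity": np.random.randint(1, 5, 10_000),
    "UnitPrice": np.round(np.random.uniform(5.0, 500.0, 10_000), 2),
})

before = df.memory_usage(deep=True).sum()

# Same downcasts as in the article
df["Category"] = df["Category"].astype("category")
df["Quantity"] = df["Quantity"].astype("int8")
df["UnitPrice"] = df["UnitPrice"].astype("float32")

after = df.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```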


Prompt 3: Parallel Processing

Prompt: “Which transformations in my order dataset can I parallelize?”

from concurrent.futures import ThreadPoolExecutor

def discount_price(price):
    return price * 0.9  # simulate 10% discount calculation

prices = orders_df["UnitPrice"].tolist()

with ThreadPoolExecutor() as executor:
    discounted = list(executor.map(discount_price, prices))

print("First 5 discounted prices:", discounted[:5])
        

👉 Thread pools pay off when the per-item function is genuinely expensive or I/O-bound, as with enrichment calls over customer records.
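One caveat: a flat 10% discount is cheap arithmetic, and CPython’s GIL means a thread pool mostly adds overhead there; a single vectorized expression is usually faster. A hedged comparison on synthetic prices:

```python
import numpy as np
import pandas as pd
from concurrent.futures import ThreadPoolExecutor

prices = pd.Series(np.random.uniform(5.0, 500.0, 1_000))

def discount_price(price):
    return price * 0.9  # simulate 10% discount calculation

# Thread-based version: worthwhile mainly for I/O-bound or expensive per-item work
with ThreadPoolExecutor() as executor:
    threaded = list(executor.map(discount_price, prices.tolist()))

# Vectorized version: one C-level operation, no per-item Python overhead
vectorized = prices * 0.9

print(np.allclose(threaded, vectorized))  # True
```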


Prompt 4: Efficient Grouping

Prompt: “How can I compute multiple KPIs (avg spend, max order, delivery rate) efficiently by category?”

category_summary = orders_df.groupby("Category").agg(
    AvgSpend=("Total", "mean"),
    MaxOrder=("Total", "max"),
    DeliveryRate=("Delivered", "mean")
)
print(category_summary)
        

👉 A single groupby .agg() pass is cleaner and faster than computing each KPI in its own loop.
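For contrast, here is a sketch of the loop-per-KPI style that the single .agg() call replaces, on toy data with the article’s column names, checking that the numbers agree:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Books", "Books", "Clothing"],
    "Total": [10.0, 30.0, 50.0],
    "Delivered": [True, False, True],
})

# Loop-based style: one dict entry built per category, KPI by KPI
loop_summary = {}
for cat, group in df.groupby("Category"):
    loop_summary[cat] = {
        "AvgSpend": group["Total"].mean(),
        "MaxOrder": group["Total"].max(),
        "DeliveryRate": group["Delivered"].mean(),
    }

# Single named-aggregation pass, as in the article
agg_summary = df.groupby("Category").agg(
    AvgSpend=("Total", "mean"),
    MaxOrder=("Total", "max"),
    DeliveryRate=("Delivered", "mean"),
)

print(agg_summary.loc["Books", "AvgSpend"] == loop_summary["Books"]["AvgSpend"])
```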


Prompt 5: Vectorized Calculations

Prompt: “How do I compute revenue per item efficiently without loops?”

orders_df["RevenuePerItem"] = orders_df["Total"] / orders_df["Quantity"]
print(orders_df[["OrderID", "Category", "RevenuePerItem"]].head())
        

👉 Pandas vectorization beats row-by-row calculations.
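A minimal check that the vectorized division matches the row-by-row loop it replaces (toy data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "Total": [100.0, 250.0, 40.0],
    "Quantity": [2, 5, 4],
})

# Row-by-row version the one-liner replaces: slow Python-level iteration
looped = [row["Total"] / row["Quantity"] for _, row in df.iterrows()]

# Vectorized division: a single call into C-level NumPy code
df["RevenuePerItem"] = df["Total"] / df["Quantity"]

print(np.allclose(df["RevenuePerItem"], looped))  # True
```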


Prompt 6: Efficient Filtering and Sorting

Prompt: “What’s the fastest way to filter high-value electronics orders and sort them?”

filtered_sorted = orders_df.query("Category == 'Electronics' and Total > 200")\
    .sort_values("Total", ascending=False)

print(filtered_sorted.head())
        

👉 .query() is cleaner than chained boolean conditions, and on large frames it can be faster via the numexpr engine.
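For reference, .query() is equivalent to the boolean-mask filter you could write by hand; a toy-data sketch showing the two are interchangeable:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Books", "Electronics"],
    "Total": [350.0, 250.0, 150.0],
})

# .query() version, as in the article
via_query = df.query("Category == 'Electronics' and Total > 200")

# Equivalent boolean-mask version; .query() mainly wins on readability
mask = (df["Category"] == "Electronics") & (df["Total"] > 200)
via_mask = df[mask]

print(via_query.equals(via_mask))  # True
```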


Prompt 7: NumPy for Distance Calculations

Prompt: “How can I efficiently compute pairwise distances between orders using their Quantity and Total values?”

quantities = orders_df["Quantity"].values
totals = orders_df["Total"].values
coords = np.vstack([quantities, totals]).T

dist_matrix = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
print("Distance matrix shape:", dist_matrix.shape)
        

👉 NumPy vectorization for distances is way faster than nested Python loops.
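Broadcasting tricks are easy to get subtly wrong, so here is a tiny sketch checking the one-liner against an explicit nested-loop reference:

```python
import numpy as np

# Three toy points in (Quantity, Total) space
coords = np.array([[1.0, 2.0], [4.0, 6.0], [0.0, 0.0]])

# Broadcast: (n, 1, 2) - (1, n, 2) -> (n, n, 2) pairwise differences
dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))

# Reference nested-loop implementation
n = len(coords)
ref = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ref[i, j] = np.sqrt(((coords[i] - coords[j]) ** 2).sum())

print(np.allclose(dist, ref))  # True
print(dist[0, 1])  # 5.0 (the classic 3-4-5 triangle)
```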


Closing Thoughts

By using 7 ChatGPT prompts, I restructured my workflow from a mess of slow chained operations into a clean, optimized, and scalable pipeline.

The lessons weren’t just about speed — they were about thinking systematically: filtering once, vectorizing instead of looping, and leveraging memory + parallel processing smartly.

Now, my sales analysis scripts run faster, handle bigger datasets, and are easier to maintain.


Source: Inspired by Jaume Boguñá’s article How I Optimized My Python Scripts Using 7 ChatGPT Prompts
