📊 Day 20: Creating Cumulative Frequency Tables in Python
🔍 What is Cumulative Frequency?
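Cumulative frequency is the running total of frequencies in a dataset. Instead of showing only how often each value (or range of values) appears, it adds each frequency to the sum of all the frequencies before it. This makes it easy to read off how many observations fall at or below a given point and to work out percentile ranks.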
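To make the "running total" idea concrete, here is a minimal sketch using only the standard library. The four frequencies are made-up illustrative values, not data from this post.

```python
from itertools import accumulate

# Hypothetical frequencies for four classes (illustrative values only)
frequencies = [3, 5, 8, 4]

# Each cumulative value is the sum of all frequencies up to and including that class
cumulative = list(accumulate(frequencies))

print(cumulative)  # [3, 8, 16, 20]
```

The last cumulative value always equals the total number of observations (here, 20).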
🤔 Why is Cumulative Frequency Important?
✔ Helps in percentile analysis (e.g., "What percentage of students scored below 80?")
✔ Useful for identifying trends (e.g., "How many customers spend below $100?")
✔ Makes data visualization easier (often plotted as an ogive, a cumulative frequency curve)
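As a rough illustration of the percentile-style question above ("What percentage of students scored below 80?"), the sketch below counts the scores under a threshold. The scores and the threshold are hypothetical values chosen for the example.

```python
# Hypothetical exam scores (made up for illustration)
scores = [55, 62, 68, 71, 74, 79, 80, 85, 91, 96]

threshold = 80

# Count the scores strictly below the threshold
count_below = sum(1 for s in scores if s < threshold)

# Convert the count into a percentage of all students
percent_below = 100 * count_below / len(scores)

print(f"{percent_below:.0f}% of students scored below {threshold}")
```

Note that a score of exactly 80 is not counted here. Whether the boundary value is included is a choice you should state whenever you report such a figure.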
📌 Where is it Used?
🔹 Education – Finding students below a certain grade level 🎓
🔹 Finance – Tracking cumulative revenue 💰
🔹 E-commerce – Analyzing spending patterns 🛒
🔹 Healthcare – Tracking patient recovery progress 📈
📖 Scenario: Analyzing Customer Purchases in a Store 🏪
Imagine you run a supermarket, and you want to analyze how much customers typically spend in a single visit.
🔍 Question: "What percentage of customers spend below $100?"
Let's create a Cumulative Frequency Table for customer spending.
import pandas as pd
import numpy as np
# Sample data: Spending amounts of 20 customers
np.random.seed(42)
spending = np.random.randint(10, 200, 20)  # Random whole-dollar amounts from $10 to $199 (upper bound is exclusive)
# Convert to DataFrame
df = pd.DataFrame(spending, columns=['Spending'])
# Create frequency table (pd.cut bins are right-inclusive: (0, 50], (50, 100], ...)
bins = [0, 50, 100, 150, 200]
labels = ["$0-$50", "$50-$100", "$100-$150", "$150-$200"]
df['Spending Range'] = pd.cut(df['Spending'], bins=bins, labels=labels)
freq_table = df['Spending Range'].value_counts().sort_index()
# Calculate cumulative frequency
cumulative_freq = freq_table.cumsum()
# Create cumulative frequency table
cumulative_table = pd.DataFrame({'Frequency': freq_table, 'Cumulative Frequency': cumulative_freq})
# Display table
print(cumulative_table)
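The scenario asks for a percentage, so it can help to add a cumulative percentage column next to the counts. The sketch below is one way to do that. It uses a small hard-coded list of hypothetical spending amounts (not the random sample above) so that the result is easy to check by hand, and the column name "Cumulative %" is just an illustrative choice.

```python
import pandas as pd

# Hypothetical spending amounts (hard-coded so the result is reproducible)
spending = pd.Series([12, 35, 48, 60, 75, 99, 110, 125, 160, 190])

bins = [0, 50, 100, 150, 200]
labels = ["$0-$50", "$50-$100", "$100-$150", "$150-$200"]

# Bin the amounts; pd.cut intervals are right-inclusive by default, e.g. (50, 100]
ranges = pd.cut(spending, bins=bins, labels=labels)

# Count per range in bin order, then accumulate
freq = ranges.value_counts().sort_index()
table = pd.DataFrame({
    "Frequency": freq,
    "Cumulative Frequency": freq.cumsum(),
})

# Cumulative percentage = cumulative count / total count * 100
table["Cumulative %"] = 100 * table["Cumulative Frequency"] / freq.sum()

print(table)
```

With this sample, the "$50-$100" row shows a cumulative percentage of 60, meaning 6 of the 10 hypothetical customers spent $100 or less.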
💡 Insights:
✔ 7 out of 20 customers (35%) spent below $100.
✔ 80% of customers spent less than $150.
✔ This helps target discounts & promotions at the spending ranges where most customers fall!
📈 Visualizing Cumulative Frequency (Ogive Plot)
To make it more insightful, let's plot a cumulative frequency graph.
import matplotlib.pyplot as plt
import seaborn as sns
# Plot cumulative frequency
plt.figure(figsize=(8,5))
sns.lineplot(x=cumulative_freq.index, y=cumulative_freq.values, marker='o', linestyle='-', color='b')
plt.xlabel('Spending Range')
plt.ylabel('Cumulative Frequency')
plt.title('Cumulative Frequency Distribution of Customer Spending')
plt.grid(True)
plt.show()
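A classic ogive is drawn against the upper boundary of each class rather than against the class label, and it starts at zero. The sketch below only computes those points, using hypothetical hard-coded spending amounts and NumPy's np.histogram. Treat it as one possible approach, not the method used for the plot above.

```python
import numpy as np

# Hypothetical spending amounts (hard-coded for reproducibility)
spending = np.array([12, 35, 48, 60, 75, 99, 110, 125, 160, 190])
bin_edges = np.array([0, 50, 100, 150, 200])

# Frequency per class, then the running total.
# Note: np.histogram bins are left-inclusive (except the last), unlike pd.cut's default.
counts, _ = np.histogram(spending, bins=bin_edges)
cumulative = np.cumsum(counts)

# Ogive points: start at (lowest edge, 0), then one point per upper class boundary
x = bin_edges
y = np.concatenate(([0], cumulative))

for edge, total in zip(x, y):
    print(f"up to ${edge}: {total} customers")
```

The x and y arrays can then be passed to a line plot, for example plt.plot(x, y, marker='o'), to draw the curve on a numeric spending axis.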
🎯 What Can We Learn from This?
💡 Fun Fact
Did You Know? Cumulative frequency is the basis of percentile ranks: your percentile is the share of scores that fall at or below yours, which is how exam results get ranked against other students! 🎓📊
🚀 Key Takeaways
✔ Cumulative frequency helps in trend analysis and decision-making.
✔ It is widely used in business, finance, education, and healthcare.
✔ Python makes it easy to generate and visualize cumulative frequency tables.
Happy learning!