LEGO Data Analysis with Pandas and Matplotlib

🧱 Day 74 of #100DaysOfCode: Building with LEGO Data & Pandas!

Today's project was a blast! 🚀 I dove into a rich LEGO dataset spanning 1949 to 2021 and put my pandas skills to work on real exploratory data analysis. Here's what I built and discovered today:

🎨 Colors: Used .nunique() to find that the dataset contains 135 unique LEGO colors, then split them into transparent vs. opaque with .value_counts() and boolean filtering.

📅 History: Traced the earliest sets in the data back to 1949, just a few years after WWII ended, when LEGO released just 5 sets across 2 themes. By 2019? 840 sets in a single year. That's 168 times as many (840 / 5).

📈 Complexity over time: Built a Matplotlib scatter plot showing average parts per set by year. The upward trend is undeniable: modern sets are dramatically more complex than those early brick sets from the late 40s and 50s.

🌟 Themes deep dive: Used .merge() to perform the pandas equivalent of a SQL inner join between the sets and themes DataFrames, then built a bar chart showing the top 10 themes by number of sets. Star Wars leads the pack with 750+ sets. The Force is strong with LEGO. 🌌

🛠️ Skills practiced today:
• Boolean filtering & .nunique() / .value_counts()
• .groupby() with .count() and .mean()
• DataFrame .merge() with left_on / right_on (foreign key joins!)
• Matplotlib line charts, scatter plots, and bar charts
• Dual-axis charts with .twinx()

One thing that hit me today: data analysis isn't just about the code, it's about the story the data tells. The numbers behind LEGO's growth are a fascinating piece of business and cultural history hiding inside a CSV file.

26 days to go. Let's keep building. 🧱

#Python #Pandas #DataAnalysis #100DaysOfCode #DataScience #Matplotlib #LEGOData #LearningInPublic
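For anyone who wants to try the pandas steps from the post, here is a minimal sketch. It does not use the real LEGO CSVs: the three tiny DataFrames and their column names (is_trans, theme_id, num_parts and so on) are assumptions invented for illustration, so the numbers will not match the post.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the LEGO tables; the column names
# are assumptions, not necessarily those of the real dataset.
colors = pd.DataFrame({
    "name": ["Black", "Blue", "Trans-Red", "Trans-Clear"],
    "is_trans": ["f", "f", "t", "t"],
})
sets = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1", "e-1"],
    "year": [1949, 1949, 2019, 2019, 2019],
    "theme_id": [1, 2, 3, 3, 3],
    "num_parts": [10, 20, 300, 500, 100],
})
themes = pd.DataFrame({"id": [1, 2, 3], "name": ["Town", "Basic", "Star Wars"]})

# Colors: count distinct names, then split transparent vs. opaque.
unique_colors = colors["name"].nunique()
transparency_counts = colors["is_trans"].value_counts()
transparent_only = colors[colors["is_trans"] == "t"]  # boolean filtering

# History and complexity: sets released per year and average parts per set.
sets_per_year = sets.groupby("year")["set_num"].count()
avg_parts_per_year = sets.groupby("year")["num_parts"].mean()

# Themes: count sets per theme, then inner-join on the foreign key.
sets_per_theme = sets.groupby("theme_id").size().reset_index(name="set_count")
merged = sets_per_theme.merge(themes, left_on="theme_id", right_on="id", how="inner")
top_themes = merged.sort_values("set_count", ascending=False)

print(top_themes[["name", "set_count"]])
```

Using left_on / right_on is only needed because the key has a different name in each table (theme_id vs. id in this sketch); when both tables share a column name, on= is enough.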


More Relevant Posts

SULAGNA ROUTRAY
4w
📈 Learning Matplotlib for Data Visualization? Here’s how I stopped treating it like “just plotting” and started actually understanding it.

🔹 1. Plotting Basics
Everything starts with: plt.plot(x, y)
👉 You’re turning numbers into visual patterns.

🔹 2. Scatter Plots
plt.scatter(x, y)
👉 This is where ML intuition builds: spotting relationships, trends, clusters.

🔹 3. Histograms
plt.hist(data)
👉 Helps you understand distribution, something every ML model depends on.

🔹 4. Labels & Titles
Always add: plt.xlabel() plt.ylabel() plt.title()
👉 If your plot isn’t readable, it’s useless.

🔹 5. Subplots
plt.subplot()
👉 Compare multiple graphs side by side, which is critical for analysis.

🔹 6. Customization
Colors, markers, styles: not just aesthetics, but clarity.

💡 What clicked for me: Matplotlib isn’t just about plotting graphs. It’s about seeing your data before modeling it.

#DataScience #Python #Matplotlib #MachineLearning #DataVisualization
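The calls listed above can be combined into one small figure. This is a sketch under assumptions: the data is made up, and it uses plt.subplots() with Axes methods, which is the object-oriented counterpart of the plt.subplot() / plt.xlabel() calls in the post rather than the exact calls themselves.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# One row, three panels side by side.
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].plot(x, y)                 # line plot: numbers as a visual pattern
axes[0].set_title("Line")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")

axes[1].scatter(x, y)              # scatter plot: relationship between x and y
axes[1].set_title("Scatter")
axes[1].set_xlabel("x")
axes[1].set_ylabel("y")

axes[2].hist(samples, bins=5)      # histogram: distribution of one variable
axes[2].set_title("Histogram")
axes[2].set_xlabel("value")
axes[2].set_ylabel("count")

fig.tight_layout()                 # keep axis labels from overlapping
```

In an interactive session, plt.show() displays the figure.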
Hammad Farooq
2w
🚀 Businesses are drowning in data but struggling to make decisions. I recently worked on a Data Analysis project using Python, Excel, and Power BI where I transformed raw data into clear, actionable insights. 📊 Built interactive dashboards 📈 Identified key trends & patterns ⚡ Turned complex data into simple business decisions 🎥 Here’s a quick demo of the dashboard in action. If you're looking to turn your data into powerful insights, let’s connect. 🔗 Portfolio: https://lnkd.in/d_tbGgTM #DataAnalytics #PowerBI #Python #Excel #BusinessIntelligence #Dashboard #DataScience #AI #MachineLearning

Sudarshan Pimparwar
3w
🚀 Day 75 - Customize plots in Matplotlib

Today, I explored how to customize plots in Matplotlib 🎨📊, taking visualizations from basic to professional level!

🔍 What I learned today:
✨ Customizing Plots: understanding how to control Figure and Axes properties to improve clarity and presentation.

📍 Key Concepts Covered:
• 🔹 Markers: highlighting data points for better visibility
• 🔹 Adding Labels: making plots more informative (titles, axis labels)
• 🔹 Configuring Grid: improving readability with structured grids
• 🔹 Creating Subplots: displaying multiple visualizations in one figure
• 🔹 Styling Plots: enhancing aesthetics with colors and themes
• 🔹 Resizing Plots: adjusting figure size for better layout and presentation
• 🔹 Transparency (Alpha): controlling opacity to manage overlapping visuals

💡 Key Takeaway: A good visualization is not just about data, it's about how effectively you present it. Customization helps in telling a clearer and more impactful story with data.

📈 Slowly moving from just plotting graphs to designing meaningful visual insights!

#Day75 #DataScienceJourney #Matplotlib #DataVisualization #Python #LearningInPublic #Analytics #Visualization
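Several of the concepts listed above (markers, labels, grid, figure size, alpha) fit in a few lines. The sketch below is one possible way to apply them; the data, labels, and style choices are assumptions picked for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data, invented for illustration only.
days = [1, 2, 3, 4, 5]
visits = [12, 18, 15, 22, 27]

# Resizing: figsize is (width, height) in inches.
fig, ax = plt.subplots(figsize=(8, 4))

# Markers, line style, color and transparency on a single line.
ax.plot(days, visits, marker="o", linestyle="--", color="tab:blue",
        alpha=0.7, label="visits")

# Labels and title make the plot self-explanatory.
ax.set_title("Daily visits")
ax.set_xlabel("day")
ax.set_ylabel("visits")

# A light dotted grid helps reading values without dominating the plot.
ax.grid(True, linestyle=":", alpha=0.5)
ax.legend()
```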
ActuaryWhoCodes

1w
🧠 Pandas vs Excel: Side-by-Side Comparison for Actuaries

Most actuarial work starts in Excel. But as data grows, the way we handle it needs to evolve. Here’s the same task (combining files and summarising claims) done both ways.

🔹 In Excel
👉 open multiple files
👉 copy-paste into a master sheet
👉 clean column names manually
👉 build pivot tables
👉 refresh and reformat each time
Works well for small datasets. Becomes slow and error-prone at scale.

🔹 In Python (Pandas)
<\>
import glob
import pandas as pd

# Collect every workbook in the data folder and stack them into one DataFrame
files = glob.glob('data/*.xlsx')
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)

# Normalise column names: trim whitespace, lower-case, spaces to underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

# Total, average and number of claims per product line
summary = df.groupby('product_line')['claim_amount'].agg(['sum','mean','count'])
<\>

✅ Excel is excellent for exploration.
✅ Pandas is better for repeatable, scalable processes.
✅ The shift isn’t about replacing Excel. It’s about using the right tool as complexity grows.

👉 Where do you currently rely more: Excel or Python?

#ActuaryWhoCodes #PythonForActuaries #Pandas #Excel #Automation #DataAnalytics
Yogesh Prajapati
4w Edited
It started with a simple question: “Can raw data actually tell a business story?”

Excited to share my first Data Analytics project! I picked a dataset with 113,000+ rows and started exploring. At first, it was just numbers: rows, columns, and spreadsheets. But as I dug deeper using Python (Pandas, NumPy) and built visualizations with Matplotlib & Seaborn, patterns began to emerge.

I discovered that:
• The United States wasn’t just another market; it was driving the majority of revenue
• The 35–64 age group turned out to be the most valuable customer segment
• Accessories were the most in-demand products
• Some transactions were actually loss-making 📉, revealing hidden inefficiencies

That’s when it clicked for me 👇
Data isn’t just analysis. It’s decision-making.

This project taught me how to move from:
➡️ “What is happening?”
➡️ to “Why is it happening?”
➡️ to “What should be done next?”

And that shift changed how I look at data completely.

I’ve shared some of my visualizations in this post and would genuinely love your feedback!

GitHub link: https://lnkd.in/ghY2au8p

#DataAnalytics #Python #EDA #DataScience #LearningJourney #Projects #Analytics #StorytellingWithData

Anuj Saini
1w
Your charts look like 2010. Default Matplotlib blue bars, no titles, axis labels cut off. I've seen it in 80% of data analyst portfolios.

And here's the thing: your code can be perfect, your analysis can be brilliant, but if your visualizations look amateur, you lose the room.

Free notebook that fixes all of it:
→ Every core chart type (line, bar, scatter, histogram, box plot, heatmap) with when to use each
→ Subplots: the layout grammar most people never learn
→ Annotations, arrows, text: how to highlight the ONE thing your chart is saying
→ Colormaps: why "viridis" beats "rainbow" (and why "coolwarm" for diverging data)
→ Styling: titles, labels, ticks, grids, legends
→ Saving publication-ready figures (DPI, bbox_inches, formats)

Before and after comparisons in every section. Your next chart won't embarrass you.

https://lnkd.in/gn9cfdr8

Day 4/7.

#DataVisualization #Matplotlib #Python #DataAnalyst #DataScience #Charts #DataStorytelling #FreeResources
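As a rough illustration of two items from that list (a "viridis" colormap and a publication-ready export), here is a minimal sketch. It is not taken from the linked notebook: the data is a placeholder grid, and the figure is written to an in-memory buffer only so the example leaves no file behind.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder 3x4 grid of values, invented for illustration only.
data = np.arange(12).reshape(3, 4)

fig, ax = plt.subplots(figsize=(5, 3))
image = ax.imshow(data, cmap="viridis")      # perceptually uniform colormap
fig.colorbar(image, ax=ax, label="value")
ax.set_title("Heatmap with viridis")

# dpi sets the resolution; bbox_inches="tight" trims extra whitespace and
# helps keep labels from being cut off at the edges.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300, bbox_inches="tight")
png_bytes = buffer.getvalue()
```

To write a file instead, pass a path such as "chart.png" (a hypothetical name) as the first argument to fig.savefig().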
Rishi GABA
2w
🚀 Unlocking the Power of Data Visualization with Matplotlib & Seaborn

Most data is ignored… because it’s not presented well. Over the past few weeks, I’ve been exploring how to turn raw data into meaningful insights using Python, working extensively with Matplotlib and Seaborn.

Here’s what I built 👇
📈 Line Plots: to track trends over time
📊 Styled Charts: adding labels, legends & grids for clarity
📦 Bar Charts: comparing categories effectively
🥧 Pie Charts: understanding proportions at a glance
📉 Histograms: exploring data distribution
🔍 Scatter Plots: identifying relationships
🎯 Seaborn Visuals: adding depth with categories & styles
🔥 Heatmaps: uncovering correlations in data

💡 What I learned:
✔ Visualization is not just plotting, it’s storytelling
✔ Small styling tweaks can completely change insights
✔ Combining Matplotlib + Seaborn is incredibly powerful

📂 I’ve attached a file containing:
▪️ All the code snippets I used
▪️ Multiple variations of each visualization
▪️ Ready-to-run examples for practice

👉 If you're learning Data Science or working on projects, this might be useful for you!

💬 Which visualization do you use the most in your workflow? Let’s discuss 👇

#DataScience #Python #DataVisualization #Matplotlib #Seaborn #Analytics #MachineLearning #LearnInPublic
Positron

4w
If your data science workflow involves an IDE, a separate notebook editor, data visualization tool, and a PDF reader on the side, then that's not a workflow, that's an Easter egg hunt.

Positron is a new free IDE from Posit, built specifically for people who work with data. It has:
🐍 Native Python + R support: together at last (like peeps and existential dread)
📊 Live Variable Explorer: watch your data come to life in real time
🔍 Data Explorer: filter and inspect any dataframe without writing a line of code
📓 Jupyter Notebook editor with data science panes built right in
🤖 AI assistant that shows its work: transparent, inspectable, reproducible
🚀 Live app previews for Shiny, Streamlit, and Dash
📄 Built-in PDF viewer so you never have to leave your workflow

It's not trying to be everything to everyone (like a jelly bean with every flavor). It's the one place where data science work actually fits.

What's on your data science wishlist this spring? Drop it below. We're listening.

Welcome to our page! 👋 We're happy to have you.
Sanjai S
1w
Excel is amazing. But when your dataset hits 1 million rows and your laptop sounds like it’s preparing for takeoff? It’s time to upgrade. 🛫

For years in transactional analysis, I thought mastering data meant mastering complex spreadsheet formulas. Then I started using Python’s Pandas library, and it completely changed how I work.

Think of Pandas as a spreadsheet on steroids. It replaces manual clicking and scrolling with a reproducible, programmatic pipeline.

Here is the simple translation guide from Spreadsheets to Pandas 👇
🔹 VLOOKUP? Just use .merge(). You can join multiple tables in one line of code.
🔹 Pivot Tables? That’s .groupby(). Instantly aggregate your data by any category.
🔹 Hunting for blank cells? .isnull().sum() tells you exactly what's missing in seconds.
🔹 Deleting messy data? .dropna() cleans it up instantly.

It’s not just about handling larger datasets without crashing. It’s about building a repeatable process. You write the cleaning script once, and the next time you get a messy dataset, your pipeline does the work for you.

If you are transitioning into a data role, don't let the code intimidate you. Pandas isn't changing what you do with data. It’s just giving you a faster, stronger engine to do it. 🏎️

♻️ Repost if you remember your first time using Pandas!
💬 What is your most-used Pandas function? Let me know below 👇

#DataAnalytics #Python #Pandas #DataScience #DataAnalyst #LearningInPublic
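The translation guide above can be sketched on two tiny tables. Everything here is hypothetical: the orders and customers DataFrames, their column names, and the values are invented to show the four calls, not taken from the post.

```python
import pandas as pd

# Hypothetical example tables; names and values are illustrative assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [100.0, None, 50.0, 75.0],   # one blank cell
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "South"],
})

# "Hunting for blank cells": number of missing values per column.
missing_per_column = orders.isnull().sum()

# "Deleting messy data": drop every row that has a missing value.
clean_orders = orders.dropna()

# "VLOOKUP": bring each order's region in from the customers table.
joined = clean_orders.merge(customers, on="customer_id", how="left")

# "Pivot table": total order amount per region.
total_by_region = joined.groupby("region")["amount"].sum()

print(missing_per_column)
print(total_by_region)
```

Note that .dropna() with no arguments removes a row if any column is missing; the subset= parameter restricts the check to specific columns when that is too aggressive.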
shafayet hossain
2w
Day 4: Data Visualization (Turning Data into Insights)

Raw data alone doesn’t tell a story. Visualization is what makes it understandable.

Why does visualization matter? Humans understand visuals faster than numbers. A simple chart can reveal patterns that raw data cannot.

Common types of plots:
* Line chart → trends over time
* Bar chart → comparison between categories
* Histogram → data distribution
* Scatter plot → relationships between variables

Simple example (Matplotlib):

import matplotlib.pyplot as plt
data = [10, 20, 30, 40]
plt.plot(data)
plt.show()

With just a few lines of code, you can turn numbers into meaningful insights.

Where visualization is used:
* Business reports
* Data analysis
* Machine learning insights
* Decision making

Key insight: Good analysis is not just about finding insights. It’s about presenting them clearly.

#DataScience #DataVisualization #Python #Matplotlib #Analytics
