LEGO Data Analysis with Pandas and Matplotlib

🧱 Day 74 of #100DaysOfCode: Building with LEGO Data & Pandas!

Today's project was a blast! 🚀 I dove into a rich LEGO dataset spanning 1949 to 2021 and put my pandas skills to work on some real exploratory data analysis. Here's what I built and discovered today:

🎨 Colors: Used .nunique() to find that the dataset contains 135 unique LEGO colors, then split them into transparent vs. opaque with .value_counts() and boolean filtering.

📅 History: Traced LEGO's origins back to 1949, just a few years after WWII ended, when they released only 5 sets across 2 themes. By 2019? 840 sets in a single year. That's a 168x increase in annual releases.

📈 Complexity over time: Built a Matplotlib scatter plot of average parts per set by year. The upward trend is clear: modern sets are far more complex than the early brick sets of the late 1940s and 1950s.

🌟 Themes deep dive: Used .merge() to perform the pandas equivalent of a SQL inner join between the sets and themes DataFrames, then built a bar chart of the top 10 themes by number of sets. Star Wars leads the pack with 750+ sets. The Force is strong with LEGO. 🌌

🛠️ Skills practiced today:
  • Boolean filtering with .nunique() and .value_counts()
  • .groupby() with .count() and .mean()
  • DataFrame .merge() with left_on / right_on (foreign key joins!)
  • Matplotlib line charts, scatter plots, and bar charts
  • Dual-axis charts with .twinx()

One thing that hit me today: data analysis isn't just about the code, it's about the story the data tells. The numbers behind LEGO's growth are a fascinating piece of business and cultural history hiding inside a CSV file.

26 days to go. Let's keep building. 🧱

#Python #Pandas #DataAnalysis #100DaysOfCode #DataScience #Matplotlib #LEGOData #LearningInPublic
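For anyone who wants to try the same pandas moves, here is a minimal sketch of the counting, grouping, and joining steps described above. It runs on tiny made-up DataFrames rather than the real CSV files, and the column names (name, is_trans, set_num, year, theme_id, num_parts, id) are my assumptions about the dataset layout, so adjust them to match your own files.

```python
import pandas as pd

# Toy stand-ins for the real CSV files. All values and column names here
# are illustrative assumptions, not the actual LEGO dataset.
colors = pd.DataFrame({
    "name": ["Black", "Blue", "Trans-Red", "Trans-Clear", "Blue"],
    "is_trans": ["f", "f", "t", "t", "f"],
})
sets = pd.DataFrame({
    "set_num": ["a-1", "b-1", "c-1", "d-1", "e-1"],
    "year": [1949, 1949, 2019, 2019, 2019],
    "theme_id": [1, 2, 3, 3, 3],
    "num_parts": [10, 20, 300, 500, 700],
})
themes = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Basic", "Town", "Star Wars"],
})

# Colors: count distinct names, then split transparent vs. opaque.
unique_colors = colors["name"].nunique()
trans_counts = colors["is_trans"].value_counts()
opaque_colors = colors[colors["is_trans"] == "f"]  # boolean filtering

# History and complexity: sets released per year and average parts per set.
sets_per_year = sets.groupby("year")["set_num"].count()
avg_parts = sets.groupby("year")["num_parts"].mean()

# Themes: count sets per theme, then join to the theme names.
# merge() defaults to an inner join, like SQL's INNER JOIN.
theme_counts = (
    sets.groupby("theme_id")["set_num"]
    .count()
    .reset_index(name="set_count")
)
merged = theme_counts.merge(themes, left_on="theme_id", right_on="id")
top_themes = merged.sort_values("set_count", ascending=False).head(10)

print(unique_colors)
print(sets_per_year)
print(top_themes[["name", "set_count"]])
```

With the real data you would swap the toy DataFrames for pd.read_csv() calls and pass the grouped results to Matplotlib for the charts.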
