Pandas GroupBy: Grouping Data by Categories and Applying Operations

🚀 Day 22 of My AI & Machine Learning Journey

Today I learned about one of the most powerful concepts in Pandas: GroupBy.

💡 GroupBy splits data into groups based on categories and then applies operations like sum, mean, count, etc. to each group.

🔹 What is GroupBy?
It groups rows based on the values of a categorical column.
Example:
    movies.groupby('Genre')
👉 Creates one group per genre, like Action, Drama, Comedy

🔹 Basic Aggregations
    movies.groupby('Genre')['Gross'].sum()
    movies.groupby('Genre')['IMDB_Rating'].mean()
    movies.groupby('Genre')['No_of_Votes'].sum()

🔹 Real-World Examples
• Top 3 genres by total earnings
    movies.groupby('Genre')['Gross'].sum().sort_values(ascending=False).head(3)
• Genre with the highest average rating
    movies.groupby('Genre')['IMDB_Rating'].mean().sort_values(ascending=False).head(1)
• Most popular director (by total votes)
    movies.groupby('Director')['No_of_Votes'].sum().sort_values(ascending=False).head(1)

🔹 Important GroupBy Methods
• size() → number of rows in each group
• first() → first entry of each group
• last() → last entry of each group
• nth(n) → the row at position n of each group
• get_group() → fetch one specific group as a DataFrame
• describe() → statistical summary per group
• sample() → random rows from each group
• nunique() → count of unique values per group

🔹 Aggregation using agg() (Very Important 🔥)
Apply different functions to different columns in a single call.
Example:
    movies.groupby('Genre').agg({
        'Runtime': 'mean',
        'IMDB_Rating': 'mean',
        'No_of_Votes': 'sum',
        'Gross': 'sum'
    })

💡 Biggest Takeaway: GroupBy lets you analyze data category by category, which is very useful in real-world problems.

Digging deeper into data analysis 🚀

#MachineLearning #Python #Pandas #DataScience #GroupBy #DataAnalysis #LearningJourney
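👉 Want to try it yourself? Here is a minimal sketch that runs the ideas above end to end. The tiny `movies` DataFrame is a made-up stand-in (the real dataset is not part of this post), so the values and director names are assumptions for illustration only; the column names follow the examples in the post.

```python
import pandas as pd

# Hypothetical stand-in for the `movies` dataset; all values are invented.
movies = pd.DataFrame({
    "Genre": ["Action", "Action", "Drama", "Drama", "Comedy"],
    "Director": ["A", "B", "A", "C", "B"],
    "Runtime": [120, 140, 100, 110, 90],
    "IMDB_Rating": [8.0, 7.0, 9.0, 8.5, 7.5],
    "No_of_Votes": [1000, 500, 2000, 1500, 300],
    "Gross": [300.0, 200.0, 150.0, 100.0, 50.0],
})

# Build the GroupBy object once and reuse it.
by_genre = movies.groupby("Genre")

# Total earnings per genre, highest first.
total_gross = by_genre["Gross"].sum().sort_values(ascending=False)

# Genre with the highest average rating (idxmax returns the group label).
top_rated_genre = by_genre["IMDB_Rating"].mean().idxmax()

# Director with the most votes overall.
top_director = movies.groupby("Director")["No_of_Votes"].sum().idxmax()

# Number of rows in each group, and the rows of one specific group.
group_sizes = by_genre.size()
action_rows = by_genre.get_group("Action")

# Different aggregation per column in a single agg() call.
summary = by_genre.agg({
    "Runtime": "mean",
    "IMDB_Rating": "mean",
    "No_of_Votes": "sum",
    "Gross": "sum",
})

print(total_gross)
print(top_rated_genre, top_director)
print(summary)
```

Keeping `by_genre` in a variable avoids regrouping for every question, and `idxmax()` is a shorter alternative to `sort_values(ascending=False).head(1)` when only the winning label is needed.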
