𝐒𝐩𝐨𝐫𝐭𝐬 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬 𝐃𝐚𝐲 44: 50 𝐃𝐚𝐲𝐬 𝐨𝐟 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬 𝐰𝐢𝐭𝐡 𝐏𝐲𝐭𝐡𝐨𝐧

Today’s analysis focused on inspecting a sports dataset, evaluating and optimizing memory usage by converting object columns to categorical types, renaming and querying specific fields, and exporting a cleaned subset for further use — highlighting how data type management directly impacts performance and efficiency.

𝐎𝐬𝐭𝐢𝐧𝐚𝐭𝐨 𝐑𝐢𝐠𝐨𝐫𝐞

#Python #NumPy #DataAnalysis #DataScience #MachineLearning #ArtificialIntelligence #DataAnalytics #LearnInPublic #GitHub #Data #TechCommunity #DailyPractice #Consistency #DataDriven #50_days_of_data_analysis_with_python #SQL #Learning #ostinatorigore
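A minimal sketch of that workflow in pandas — the dataset, column names, and threshold here are hypothetical stand-ins, since the original post does not show its data:

```python
import pandas as pd

# Hypothetical stand-in for the sports dataset described above
df = pd.DataFrame({
    "team": ["Lions", "Tigers", "Lions", "Bears"] * 250,
    "points": [24, 17, 31, 10] * 250,
})

before = df["team"].memory_usage(deep=True)

# Low-cardinality object columns compress well as categoricals
df["team"] = df["team"].astype("category")
after = df["team"].memory_usage(deep=True)

# Rename a field, query on it, then export a cleaned subset
df = df.rename(columns={"points": "score"})
high_scores = df.query("score > 20")
high_scores.to_csv("high_scores.csv", index=False)
```

With only a handful of distinct team names repeated across 1,000 rows, the categorical column stores one small integer code per row instead of a full Python string, which is where the memory savings come from.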
More Relevant Posts
-
Messy column names are a common problem when working with real datasets. Extra spaces, inconsistent capitalization, and formatting issues can easily break your workflow. Instead of fixing them manually, you can clean them in one line using Pandas.

df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

This line will:
• Remove leading and trailing spaces
• Convert column names to lowercase
• Replace spaces with underscores

Example:
"User Name" → user_name
" Total Sales " → total_sales

Small improvements like this make your data pipelines cleaner and easier to maintain.

#Python #DataScience #MachineLearning #DataAnalytics
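A minimal before/after of that one-liner (toy column names matching the post's examples):

```python
import pandas as pd

df = pd.DataFrame({"User Name": ["ana"], " Total Sales ": [100]})
print(list(df.columns))  # ['User Name', ' Total Sales ']

# Strip whitespace, lowercase, and replace spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
print(list(df.columns))  # ['user_name', 'total_sales']
```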
-
𝗧𝗵𝗲 𝟭𝟬/𝟵𝟬 𝗥𝘂𝗹𝗲 Spending the first 𝟭𝟬% of time planning can save 𝟵𝟬% of the effort later. I notice this every time I work with 𝗱𝗮𝘁𝗮. Before writing 𝗣𝘆𝘁𝗵𝗼𝗻 code, building 𝗦𝗤𝗟 queries, or creating 𝗣𝗼𝘄𝗲𝗿 𝗕𝗜 dashboards, the real work is asking the 𝗿𝗶𝗴𝗵𝘁 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀 and understanding the 𝗱𝗮𝘁𝗮. 𝗚𝗼𝗼𝗱 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀 doesn’t start with tools. It starts with 𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴. #DataAnalytics #DataScience #Python #SQL #PowerBI #ProblemSolving
-
📊 One simple chart helped me understand something interesting in Data Science today. While doing Exploratory Data Analysis (EDA) on the Tips dataset, I noticed something clear. 💡 When the total bill increases, the tip usually increases too. I visualized it using a scatter plot, and the relationship became obvious. That’s the power of data visualization — it turns raw numbers into patterns we can easily understand. Sometimes a simple chart explains more than a table full of numbers. 🤔 What visualization do you use the most during EDA? #DataScience #EDA #Python #DataVisualization #LearningInPublic
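A sketch of that EDA step — the numbers below are a small made-up sample shaped like the Tips dataset, not the real data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample resembling the Tips dataset
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

fig, ax = plt.subplots()
ax.scatter(tips["total_bill"], tips["tip"])
ax.set_xlabel("Total bill ($)")
ax.set_ylabel("Tip ($)")
ax.set_title("Tip vs. total bill")
fig.savefig("tips_scatter.png")

# The correlation coefficient quantifies the visual trend
corr = tips["total_bill"].corr(tips["tip"])
```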
-
📊 Day 19 — 60 Days Data Analytics Challenge

Today I learned about Crosstab in Pandas, which helps summarize data by showing the relationship between two categorical variables.

🔍 What I practiced today:
• Creating cross-tabulations using pd.crosstab()
• Understanding category-wise data distribution
• Using margins=True to include total values
• Improving table readability with row and column labels

This feature is very helpful during Exploratory Data Analysis (EDA) because it allows us to quickly compare categories and identify patterns in the dataset.

#DataAnalytics #Python #Pandas #60DaysChallenge #LearningJourney
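The points above can be sketched with a toy example (the `day`/`smoker` columns are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun", "Sat"],
    "smoker": ["Yes", "No", "No", "No", "Yes"],
})

# Counts of each (day, smoker) combination; margins=True adds "All" totals
table = pd.crosstab(df["day"], df["smoker"], margins=True)
print(table)
```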
-
📊 Day 9 — 60 Days Data Analytics Challenge | Outlier Detection & Data Distribution

Today I explored how data analysts identify outliers and understand data distribution using visualization techniques.

🔎 What I Practiced:
• Visualizing distribution with histograms
• Detecting outliers using boxplots
• Comparing mean vs median to analyze data behavior
• Understanding the impact of extreme values on analysis

📈 This practice helped me see how important it is to validate data before drawing conclusions.

💡 Key Learning: Accurate insights begin with understanding data distribution.

#60DaysDataAnalyticsChallenge #EDA #DataAnalytics #Python #LearningInPublic
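The ideas above can be made concrete with the IQR rule — the same logic a boxplot uses to flag outliers — on a tiny invented sample:

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is an extreme value

# IQR rule: points beyond 1.5 * IQR from the quartiles are outliers
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

# The mean is pulled up by the extreme value; the median is robust
mean, median = data.mean(), np.median(data)
```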
-
✈️ Leveling up my Data Analysis skills with Python!

I’ve spent some quality time recently deepening my knowledge of data manipulation and visualization. It’s amazing how a few lines of code can turn raw data into meaningful insights. During this session, I focused on:

✅ Data Manipulation with Pandas: loading datasets using pd.read_csv(), performing head/tail inspections, and dynamically updating DataFrames with new records and columns.

✅ Data Visualization with Matplotlib:
🫨 Line Plots: mastered the basics of plt.plot(), including how to define x and y axes.
🫨 Styling & Customization: explored adding visual markers (marker='o') and changing line colors to make the data more readable.
🫨 Plot Labeling: learned the importance of clear communication by adding a title, x-axis label, and y-axis label to charts.
🫨 Categorical Charts: moved beyond line plots to create bar charts using plt.bar() to compare discrete categories.

A huge thank you to my mentor Praveen Kalimuthu and the Tech Data Community for the guidance and for making these complex concepts so much easier to grasp! Looking forward to applying these tools to even bigger datasets.

#DataAnalysis #Python #Pandas #Matplotlib #DataVisualization #LearningJourney #DataScience #Coding
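The plotting steps above can be sketched in a few lines (the data values and labels are made up for illustration; note the marker is the letter 'o', not the digit '0'):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Line plot with markers, a custom color, and full labeling
x = [1, 2, 3, 4]
y = [10, 20, 15, 25]
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", color="green")
ax.set_title("Sales over time")
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales")
fig.savefig("line.png")

# Bar chart for discrete categories
categories = ["A", "B", "C"]
values = [5, 7, 3]
fig2, ax2 = plt.subplots()
ax2.bar(categories, values)
fig2.savefig("bar.png")
```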
-
📢💡 𝐃𝐚𝐲 𝟗 – 𝐫𝐞𝐩𝐚𝐫𝐭𝐢𝐭𝐢𝐨𝐧() 𝐯𝐬 𝐜𝐨𝐚𝐥𝐞𝐬𝐜𝐞()

🤔 𝐒𝐜𝐞𝐧𝐚𝐫𝐢𝐨
✔️ Demonstrate the impact of partition control on a small dataset (an interview favorite).

📍 𝐈𝐧𝐩𝐮𝐭 𝐝𝐚𝐭𝐚:
data = list(range(1, 21))  # 20 numbers

📤 𝐄𝐱𝐩𝐞𝐜𝐭𝐞𝐝 𝐨𝐮𝐭𝐩𝐮𝐭
👉 Original partitions: 4
👉 After repartition(2): 2
👉 After coalesce(8): 2 (coalesce can only reduce the partition count; asking for more than the current number is a no-op)

🧠 𝐄𝐱𝐩𝐥𝐚𝐧𝐚𝐭𝐢𝐨𝐧
✔️ repartition(n) = shuffle + resize (even distribution; can increase or decrease)
✔️ coalesce(n) = merge partitions (reduce only, no shuffle)
✔️ Interview tip: use coalesce to reduce partitions and repartition to increase them.

#python #Spark #Pyspark #Dataengineering #Bigdata #learnmore #pythonmcq #programmingwithpython #mcq #spark #Practicewithme
-
Starting my journey into Pandas for data analysis. In my first lesson, I worked hands-on with a real dataset and explored:

• Reading CSV files with Pandas
• Understanding the DataFrame structure
• Exploring columns and inspecting data
• Getting familiar with a real-world survey dataset

I documented the process and shared both the code and detailed notes:
📓 Notebook (code): https://lnkd.in/eZzEX394
📝 Notes (explanations): https://lnkd.in/efvh2ApQ

I’ll continue this series and share each step as I progress.

#Python #Pandas #DataAnalytics #DataScienceJourney #LearningInPublic
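Those first-lesson steps look roughly like this — the tiny in-memory CSV below is a hypothetical stand-in for the survey file, since the real dataset isn't shown here:

```python
import io
import pandas as pd

# Hypothetical mini-CSV standing in for a real survey file
csv_text = """respondent,country,years_code
1,US,5
2,DE,3
3,IN,8
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())             # first rows
print(df.shape)              # (rows, columns)
print(df.columns.tolist())   # column names
print(df.dtypes)             # inferred column types
```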
-
𝗣𝘆𝘁𝗵𝗼𝗻 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗣𝗮𝘁𝘁𝗲𝗿𝗻𝘀 🐍 | 𝗡𝘂𝗺𝗣𝘆 – 𝗖𝗼𝗻𝗰𝗮𝘁𝗲𝗻𝗮𝘁𝗲 🔗 | 📅 𝗗𝗮𝘆 𝟱𝟲 🚀

Today’s task:
✅ Take 2 matrices.
✅ Convert them into NumPy arrays.
✅ Join them into a single matrix.

The task is simple only if you understand array concatenation.

Core idea from the code:

np.concatenate((arr1, arr2), axis=0)

This joins arrays row-wise. Meaning:
Matrix A (N × P)
Matrix B (M × P)
After concatenation → Result ((N+M) × P)

Example concept:

A =
1 2 3
4 5 6

B =
7 8 9

Result =
1 2 3
4 5 6
7 8 9

💡 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆:
Concatenate = merge arrays along an axis.

Strong candidates understand:
• Array dimensions
• Axis operations (rows vs columns)
• How NumPy handles structured data

Because in data processing, combining datasets is a common task. Better structure. Better analysis.

#Python #NumPy #InterviewPrep #HackerRank #DataAnalytics #DataStructures #DailyCoding #Consistency
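The example concept above, executed directly:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([[7, 8, 9]])             # shape (1, 3)

# axis=0 joins row-wise; the column counts must match
result = np.concatenate((a, b), axis=0)
print(result.shape)  # (3, 3)
```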
-
Data science using geoplotlib #machinelearning #datascience #datavisualization #pythonlibrary #geoplotlib Geoplotlib is a Python library specifically designed for visualizing geographical data. It allows users to create a wide range of visualizations such as density maps, choropleths, symbol maps, and interactive visualizations that provide insights into spatial datasets. Built on Pyglet, it offers a simple API for plotting data on maps, enabling analysts and developers to explore geospatial patterns effectively. https://lnkd.in/g-KJJQn5
I'm using similar functionality in my university class. We're focusing on data engineering. LinkedIn tends to have a bunch of culture enforcement (when do humans not have that?), so it's nice to see just a regular post about what you're learning.