Runners And Income Data Analysis — Day 35: 50 Days of Data Analysis with Python

Today's work focused on cleaning and analyzing a combined runners and income dataset using pandas and NumPy.

✔️ Inspected dataset structure, shape, and missing values
✔️ Handled NaNs by dropping empty rows and imputing remaining values
✔️ Used describe() to summarize data and extract key statistics
✔️ Calculated total miles run using NumPy operations
✔️ Filtered individuals based on income thresholds
✔️ Created and exported a clean subset of the data for reuse

This session reinforced the importance of data inspection, basic preprocessing, and targeted filtering before moving into deeper analysis.

Ostinato Rigore

#Python #NumPy #DataAnalysis #DataScience #MachineLearning #ArtificialIntelligence #DataAnalytics #LearnInPublic #GitHub #Data #TechCommunity #DailyPractice #Consistency #DataDriven #50_days_of_data_analysis_with_python #SQL #Learning #ostinatorigore
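The steps above can be sketched as a short pandas/NumPy pipeline. This is a minimal illustration, not the actual project code: the column names (`name`, `miles_run`, `income`) and all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the combined runners/income dataset;
# column names and values are assumptions for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dan", None],
    "miles_run": [12.0, np.nan, 8.5, 20.0, np.nan],
    "income": [52000, 61000, np.nan, 75000, np.nan],
})

# Inspect structure, shape, and missing values
print(df.shape)
print(df.isna().sum())

# Drop rows that are entirely empty, then impute remaining NaNs
df = df.dropna(how="all")
df["miles_run"] = df["miles_run"].fillna(df["miles_run"].mean())
df["income"] = df["income"].fillna(df["income"].median())

# Summary statistics, plus a NumPy total
summary = df.describe()
total_miles = np.sum(df["miles_run"].to_numpy())

# Filter by an income threshold and export the clean subset
high_earners = df[df["income"] > 55000]
high_earners.to_csv("clean_subset.csv", index=False)
```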
Top 10 Pandas Functions Every Data Analyst Should Know

If you're working with Python for data analysis, mastering a few core Pandas functions can dramatically improve your productivity. Here are 10 essential functions used in most real-world data projects:

• pd.read_csv() – Load datasets quickly
• df.head() – Preview the first rows
• df.info() – Understand structure & data types
• df.describe() – Generate summary statistics
• df.sort_values() – Sort data efficiently
• df.groupby() – Aggregate and analyze groups
• df.pivot_table() – Create powerful data summaries
• pd.concat() – Combine multiple datasets
• df.isnull() / df.fillna() – Handle missing data
• df.apply() – Apply custom logic to your data

These functions form the foundation of practical data analysis with Python. Which Pandas function do you use the most in your workflow?

#Python #DataScience #Pandas #DBT #DreamBigTechnologies #AI #LearnPython
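A few of the list's functions in action, on a toy sales dataset (the frame and column names here are invented for illustration):

```python
import pandas as pd

# Two toy "quarterly" frames to exercise concat/sort/groupby/apply
q1 = pd.DataFrame({"region": ["East", "West", "East"], "sales": [100, 200, 150]})
q2 = pd.DataFrame({"region": ["West", "East"], "sales": [250, 120]})

# pd.concat(): stack the two quarters into one frame
sales = pd.concat([q1, q2], ignore_index=True)

# df.sort_values() / df.groupby(): order rows, then aggregate per region
sales = sales.sort_values("sales")
totals = sales.groupby("region")["sales"].sum()

# df.apply(): custom logic applied element-wise
sales["taxed"] = sales["sales"].apply(lambda s: round(s * 1.2, 2))
```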
🚀 Day 2 | 15-Day Pandas Challenge
📊 Find the Shape of a DataFrame (Rows & Columns)

Understanding the structure of your dataset is the first step in data analysis. In this challenge, we are given a DataFrame called players:

Column Name | Type
player_id   | int
name        | object
age         | int
position    | object

🎯 Task: Write a solution to calculate and return: [number of rows, number of columns]

💡 Why This Matters: Knowing the number of rows and columns helps you:
• Understand dataset size
• Validate data loading
• Prepare for data cleaning & transformation
• Analyze data efficiently

🧠 Hint: In Pandas, the .shape attribute gives you both values instantly!

🔥 Key Skills: Python | Pandas | DataFrame Shape | Data Exploration | Data Analysis

#Python #Pandas #DataScience #MachineLearning #DataAnalysis #CodingChallenge #LearnToCode #ProgrammersLife #TechCommunity #Developer #AI #Analytics #DataEngineer #100DaysOfCode #CareerGrowth #Upskill #LinkedInLearning #15DaysOfPandas
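One possible solution using the hinted `.shape` attribute (the sample `players` rows are made up to match the challenge schema):

```python
import pandas as pd

# Sample frame matching the challenge schema; the row values are invented.
players = pd.DataFrame({
    "player_id": [1, 2, 3],
    "name": ["Leo", "Kai", "Mia"],
    "age": [24, 29, 21],
    "position": ["FW", "GK", "MF"],
})

def get_dataframe_size(df: pd.DataFrame) -> list:
    # .shape returns a (rows, columns) tuple; the task asks for a list
    rows, cols = df.shape
    return [rows, cols]

result = get_dataframe_size(players)  # [3, 4]
```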
One thing I’ve realized while working with data: SQL and Pandas are not competitors. They’re partners.

When I first learned SQL, I focused on writing queries that worked. Later, when I started using Python Pandas, I had a small realization… the logic is the same. Filtering rows. Grouping data. Joining tables. Aggregating results. The syntax changes — the thinking doesn’t.

That’s when it clicked for me: strong data professionals don’t just memorize commands. They understand concepts. If you truly understand how data is structured, filtered, grouped, and joined — switching between SQL and Pandas becomes much easier.

Tools evolve. Concepts stay.

#SQL #Python #Pandas #DataAnalytics #DataScience #DataEngineering #TechCareers
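The same filter/group/join logic side by side — each SQL query appears as a comment above its pandas equivalent (tables and columns are invented for the example):

```python
import pandas as pd

# Toy tables standing in for database relations
orders = pd.DataFrame({"user_id": [1, 1, 2], "amount": [30, 50, 20]})
users = pd.DataFrame({"user_id": [1, 2], "name": ["Ava", "Raj"]})

# SQL: SELECT * FROM orders WHERE amount > 25;
big = orders[orders["amount"] > 25]

# SQL: SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
totals = orders.groupby("user_id", as_index=False)["amount"].sum()

# SQL: SELECT u.name, o.amount FROM users u JOIN orders o ON u.user_id = o.user_id;
joined = users.merge(orders, on="user_id")
```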
Today, I learned how NumPy arrays are faster and more efficient than regular Python lists, and how they help in performing mathematical and statistical operations easily. Understanding concepts like arrays, indexing, slicing, and basic operations is strengthening my foundation in data analysis. Small steps every day toward becoming a better Data Analyst! 🚀📊 #NumPy #PythonForDataAnalytics
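A small sketch of the ideas mentioned above — vectorized operations replacing explicit loops, plus built-in stats and list-style slicing:

```python
import numpy as np

nums = list(range(1_000_000))
arr = np.arange(1_000_000)

# List version: an interpreted Python loop
squares_list = [x * x for x in nums[:5]]

# NumPy version: one vectorized call over the whole array at C speed
squares_arr = arr ** 2

mean = arr.mean()     # statistical helpers are built in
sliced = arr[10:15]   # indexing/slicing works like lists, but returns a view
```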
🚀 Recently wrapped up NumPy and Pandas! 📊 In this screenshot, I loaded a CSV with country, religion, city, capital, and growth data. I used NumPy for mean and stats calculations, then Pandas to filter the data – querying specific countries/regions/months, cleaning it, and extracting insights like the top growth cities. These libraries are building a strong foundation for data analytics. Next up: applying them to real projects! What's your favorite Pandas function? 🔥

#Pandas #NumPy #DataAnalytics #Python #DataScience #LearningInPublic #BCA #CodingJourney
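The workflow described — NumPy for the stats, Pandas for the filtering — might look something like this (the frame, column names, and values are all made up; they are not the actual CSV):

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the loaded CSV; columns are assumptions.
df = pd.DataFrame({
    "city": ["Pune", "Oslo", "Lima", "Kyoto"],
    "country": ["India", "Norway", "Peru", "Japan"],
    "growth": [4.2, 1.1, 2.8, 0.9],
})

avg_growth = np.mean(df["growth"].to_numpy())    # NumPy mean calculation
india = df[df["country"] == "India"]             # Pandas country filter
top = df.nlargest(2, "growth")["city"].tolist()  # top growth cities
```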
📊 Day 11/90 — Exploratory Data Analysis (EDA): Start Finding Insights

Now that we know how to clean data, it’s time to understand what the data is telling us. Exploratory Data Analysis (EDA) helps analysts discover patterns, trends, and anomalies.

✅ Today’s Focus:
• Understanding dataset structure
• Checking summary statistics
• Finding patterns & outliers
• Asking questions from data

🎯 Why this matters: EDA helps you move from raw numbers to meaningful insights that support decision-making.

📌 Practice Tip: Try this in Python:

print(df.shape)
print(df.info())
print(df.describe())

Look at the output and ask:
👉 What patterns do I see?
👉 Are there unusual values?

Great analysts don’t just see data — they understand the story behind it.

💬 Comment “DAY 11” if you’re learning with me.

#DataAnalytics #EDA #DataAnalystJourney #Python #LearningInPublic #90DaysChallenge
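Going one step past `describe()`, outliers can be flagged programmatically. This sketch uses the common IQR rule — one of several possible conventions, chosen here for illustration, on an invented `revenue` column:

```python
import pandas as pd

# Tiny example frame; in practice df is your loaded dataset
df = pd.DataFrame({"revenue": [100, 102, 98, 101, 500]})

stats = df["revenue"].describe()  # summary statistics in one call

# Flag outliers with the 1.5×IQR rule (an assumption; pick what fits your data)
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["revenue"] < q1 - 1.5 * iqr) | (df["revenue"] > q3 + 1.5 * iqr)]
```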
Making Head()s and Tail()s of Your Data 🐼📊

Ever feel overwhelmed when first looking at a massive dataset? You don’t need to load the whole thing to get a feel for it. That’s where two of my favorite functions in the pandas library come in!

df.head(): Quickly shows you the first 5 rows of your DataFrame by default, providing an initial glimpse into the structure and data types.

df.tail(): Conversely, displays the last 5 rows, which is super helpful for checking out recently added data or final entries.

It’s a simple yet powerful trick every data professional uses to start their data exploration and analysis journey on the right foot.

#DataScience #Python #Pandas #DataAnalytics #DataManipulation #SQL #MachineLearning #LearningJourney
Abhishek Kumar · Harsh Chalisgaonkar · SkillCircle™
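Both functions in two lines, including the optional `n` argument (the 100-row frame is invented for the demo):

```python
import pandas as pd

# A 100-row toy frame to peek at
df = pd.DataFrame({"day": range(1, 101), "value": range(101, 201)})

first = df.head()   # first 5 rows by default
last = df.tail(3)   # pass n to change how many rows come back
```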
Day 20 of 150: Data Visualization with Matplotlib

Today’s focus shifted from data collection to data storytelling. Raw data is powerful, but visualizing patterns is what makes that data actionable in a professional environment.

Technical Focus:
• Matplotlib Fundamentals: Implementing the pyplot module to transform structured datasets into visual representations.
• Graphing Logic: Creating line graphs and bar charts to identify trends, specifically focusing on axis labeling, legends, and title formatting.
• Data Integration: Bridging previous projects by visualizing data stored in CSV and JSON formats to track changes over time.
• Customization: Experimenting with figure sizes, colors, and markers to improve the readability and professional quality of the output.

Visualizing data is the final bridge between backend processing and meaningful insights. 130 days to go.

#Python #DataVisualization #DataScience #Matplotlib #150DaysOfCode #DataAnalytics
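A minimal sketch of the line-and-bar setup described above — axis labels, legend, title, and marker/color customization. The monthly data is invented; the `Agg` backend is used so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display window needed
import matplotlib.pyplot as plt

# Invented sample data for the demo
months = ["Jan", "Feb", "Mar", "Apr"]
visits = [120, 150, 130, 180]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line graph: trend over time, with axis labels, legend, and title
ax1.plot(months, visits, marker="o", color="tab:blue", label="visits")
ax1.set_xlabel("Month")
ax1.set_ylabel("Visits")
ax1.set_title("Monthly Visits (line)")
ax1.legend()

# Bar chart: the same data as discrete comparisons
ax2.bar(months, visits, color="tab:orange")
ax2.set_title("Monthly Visits (bar)")

fig.savefig("visits.png")  # export the figure
```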
Project (Part 01): Data Cleaning – Time Series Analysis with Python
View the full code on GitHub: [https://lnkd.in/gWNFX4Cn]

80% of forecasting work is just cleaning data. My latest project proves why!

I just finished a deep dive into 5 years of hourly electricity demand data (2019–2023). Before I could even think about a SARIMA forecasting model, I had to fix a “broken” dataset. 😶

The Messy Reality:
❌ Missing 200 random hours
❌ 100 duplicate dates
❌ 30 extreme outliers (4× spikes)
❌ 150 NaN values
❌ Shuffled rows (not time-ordered)

🧐 What I Did (The Pipeline):
1️⃣ Standardized timestamps & sorted chronologically.
2️⃣ Removed duplicates to prevent double-counting demand.
3️⃣ Reconstructed the full hourly range using .asfreq('h').
4️⃣ Filled the 150 NaN gaps using linear interpolation.
5️⃣ Validated frequency with pd.infer_freq() to ensure a continuous timeline.

The Result: A “model-ready” dataset that correctly captures daily and yearly seasonality despite an upward long-term trend. This project proved that for SARIMA, data preparation is where the forecast is actually won or lost.

#DataScience #Python #TimeSeries #CleanData #EnergyAnalytics #SARIMA
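The five pipeline steps can be sketched on a small synthetic series — this is an illustration of the named operations (`sort_index`, deduplication, `.asfreq('h')`, `interpolate`, `pd.infer_freq`), not the project's actual code; the data is fabricated:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the messy hourly demand series
idx = pd.date_range("2023-01-01", periods=48, freq="h")
s = pd.Series(np.arange(48.0), index=idx)
s = s.drop(s.index[10:13])              # simulate missing hours
s = pd.concat([s, s.iloc[[5]]])         # simulate a duplicate timestamp
s = s.sample(frac=1, random_state=0)    # simulate shuffled rows

# 1–2: sort chronologically, then drop duplicate timestamps
s = s.sort_index()
s = s[~s.index.duplicated(keep="first")]

# 3: rebuild the full hourly range (missing hours become NaN)
s = s.asfreq("h")

# 4: fill the gaps by linear interpolation
s = s.interpolate(method="linear")

# 5: validate that the timeline is continuous again
assert pd.infer_freq(s.index).lower() == "h"
```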
Day 09: Beyond the Surface — Mastering Precision Data Selection in Pandas 🐼🎯

Data is only as useful as your ability to find what you need within it. Today, I moved deep into Pandas indexing, transitioning from simple attribute selection to advanced positional and label-based filtering on Kaggle.

Key Technical Takeaways:
- The Power of loc vs. iloc: I mastered the distinction between position-based selection (iloc) and label-based selection (loc). A key “gotcha” I learned: while iloc follows standard Python slicing (excluding the end), loc is inclusive.
- Logical Slicing: Moving beyond rows and columns, I implemented conditional selection. I can now filter massive datasets using boolean logic.
- Dynamic Indexing: I explored how to manipulate the DataFrame index using set_index(), transforming a simple numerical count into meaningful, searchable labels like project titles.
- Built-in Selectors: I added isin() and notnull() to my arsenal, allowing for clean, efficient filtering of specific categories and missing values.

The ability to “query” data directly in Python is a massive productivity boost!

#DataScience #Pandas #Python #Kaggle #DataAnalytics #TechSkills
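Each takeaway in one short demo — including the loc-is-inclusive "gotcha" (the frame and its columns are invented for the example):

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["Alpha", "Beta", "Gamma"],
    "stars": [120, 45, 300],
    "lang": ["py", None, "rs"],
})

# iloc: position-based, end-exclusive like normal Python slicing
first_two = df.iloc[0:2]                # rows 0 and 1

# loc: label-based and *inclusive* of the end label
df2 = df.set_index("title")             # meaningful, searchable labels
alpha_to_beta = df2.loc["Alpha":"Beta"] # includes "Beta"

# Boolean logic, isin(), and notnull() for conditional selection
popular = df[df["stars"] > 100]
chosen = df[df["lang"].isin(["py", "rs"])]
has_lang = df[df["lang"].notnull()]
```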