How to Handle Missing Values in Python with Pandas

6mo

⚡ Handling Missing Values in Python

Here's a simple breakdown of the main methods pandas offers.

1️⃣ Identify Missing Values
df.isnull()  # True/False for every cell
df.isnull().sum()  # Number of missing values per column
You can also check the percentage of missing data per column:
(df.isnull().sum() / len(df)) * 100

2️⃣ Remove Missing Values
If the missing values are few or not significant:
df = df.dropna()  # Drops rows that contain missing values
df = df.dropna(axis=1)  # Drops columns that contain missing values
dropna() returns a new DataFrame, so assign the result. Use this when deleting data doesn't affect the dataset's overall quality.

3️⃣ Fill Missing Values
When you can't afford to drop data, fill the missing values instead. Assign the result back to the column; calling fillna(..., inplace=True) on a selected column triggers a warning in recent pandas versions and may leave the original DataFrame unchanged.
🔹 Constant value
df['Name'] = df['Name'].fillna('Unknown')
🔹 Mean / Median (for numerical columns)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].median())
🔹 Forward or Backward Fill (for time series)
df = df.ffill()  # Forward fill: carry the last known value forward
df = df.bfill()  # Backward fill: pull the next known value backward
(The older spelling df.fillna(method='ffill') is deprecated in pandas 2.1 and later.)

4️⃣ Imputation with scikit-learn
For large datasets, or when the same fill rule must be reused on new data:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Salary']] = imputer.fit_transform(df[['Age', 'Salary']])
Other strategies: 'median', 'most_frequent', and 'constant'. Note that SimpleImputer fills each column with a single statistic; it is not a predictive model. When data is missing in patterns, scikit-learn's KNNImputer or IterativeImputer estimate each gap from the other columns.

🔹 Best Practices
Use mean/median for numerical data (median when the column has outliers).
Use mode or "Unknown" for categorical data.
As a rough rule of thumb, consider dropping columns if more than 40–50% of the data is missing.
Always analyze the pattern of missingness before deciding.

#Python #DataCleaning #Pandas #DataAnalytics
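The identify and fill steps from the post can be tried end to end on a small frame. This is a minimal sketch, not the author's code: the toy rows are invented for illustration, and only the column names Name, Age, and Salary come from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
df = pd.DataFrame({
    "Name": ["Asha", None, "Ravi", "Meera"],
    "Age": [25.0, np.nan, 31.0, 40.0],
    "Salary": [50000.0, 62000.0, np.nan, 58000.0],
})

# Step 1: percentage of missing cells per column (mean of True/False flags).
missing_pct = df.isnull().mean() * 100

# Step 3: fill on a copy, assigning each result back to its column.
filled = df.copy()
filled["Name"] = filled["Name"].fillna("Unknown")
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
filled["Salary"] = filled["Salary"].fillna(filled["Salary"].median())

print(missing_pct)
print(filled)
```

Working on a copy keeps the raw frame available, which makes it easy to compare summary statistics before and after filling.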


Vansh Choubey 6mo

Great insights, Priyanka! Your breakdown of handling missing values in Python is clear and practical. This will definitely help many in their data analysis journey. Thanks for sharing!


Tanmaya Kumar Pani 6mo

You’ve explained this so beautifully! Perfect mix of clarity and depth 👌


More Relevant Posts

Abhishek Kumar
6mo
Python Tip of the Day: Mastering filter(), map(), and sorted()

When working with Python, these three built-in functions can make your data processing cleaner and more readable. Let's break them down 👇

↘️ map() - Transform Data
Applies a function to every element in an iterable. In Python 3 it returns a lazy iterator, so wrap it in list() to see the values.
Example:
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda x: x**2, numbers))
print(squares)
Output = [1, 4, 9, 16, 25]
✅ Use when you want to modify or compute new values from existing data.

↘️ filter() - Extract What You Need
Keeps the elements for which a condition (a function that returns True or False) holds. Like map(), it returns a lazy iterator.
Example:
numbers = [1, 2, 3, 4, 5]
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)
Output = [2, 4]
✅ Use when you need to keep only specific elements that match a condition.

↘️ sorted() - Arrange Your Data
Returns a new sorted list from any iterable (ascending by default). You can customize the order with the key parameter.
Example:
data = [("apple", 3), ("banana", 1), ("cherry", 2)]
sorted_data = sorted(data, key=lambda x: x[1])
print(sorted_data)
Output = [('banana', 1), ('cherry', 2), ('apple', 3)]
✅ Use when you need to organize your data in a specific order.

💡 In short:
map() → Transform
filter() → Select
sorted() → Organize

Mastering these three can make your Python code not just functional but elegant.

#Python #CodingTips #DataScience #DataEngineering #Learning
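For comparison, the same three operations can be written with comprehensions and operator.itemgetter, and map() and filter() can be chained. This is a small sketch reusing the post's sample data; the variable names even_squares and by_count_desc are made up for illustration.

```python
from operator import itemgetter

numbers = [1, 2, 3, 4, 5]
data = [("apple", 3), ("banana", 1), ("cherry", 2)]

# Comprehension equivalents of the map() and filter() examples.
squares = [x ** 2 for x in numbers]
evens = [x for x in numbers if x % 2 == 0]

# Chaining: filter first, then map over what is left.
even_squares = list(map(lambda x: x ** 2, filter(lambda x: x % 2 == 0, numbers)))

# itemgetter(1) does the same job as lambda x: x[1]; reverse=True sorts descending.
by_count_desc = sorted(data, key=itemgetter(1), reverse=True)

print(squares, evens, even_squares, by_count_desc)
```

Which form reads better is mostly a matter of taste; comprehensions tend to win when the function would otherwise be a throwaway lambda.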
Amr Salah Abd ElGhany
5mo
Writing a for-loop in Python to process a list of data? You might be adding hours to your script's runtime without even knowing it.

I see this all the time: analysts use loops for data transformations that could be done in a fraction of the time. The bottleneck isn't your computer's speed, it's how you're talking to it.

The secret to faster data processing in Python is vectorization. Instead of processing each element one by one in a Python loop, a vectorized operation hands the whole array to optimized, pre-compiled C code in a single call.

Let's take a common task: calculating the square of every number in a Series.

The Slow Way (Loop):
import pandas as pd
data = pd.Series(range(1, 1000001))
squared_list = []
for num in data:
    squared_list.append(num ** 2)

The Fast Way (Vectorized):
import pandas as pd
data = pd.Series(range(1, 1000001))
squared = data ** 2

The vectorized approach isn't just cleaner, it's dramatically faster. For a million rows, the loop might take ~150ms, while the vectorized operation can finish in ~2ms (exact timings depend on the machine and library versions). That's a 98.7% reduction in processing time!

This principle applies across pandas and NumPy:
Use df['column'].str.upper() instead of looping with .upper()
Treat df['column'].apply(function) as a tidier loop, not as vectorization: it still calls the Python function once per element, so prefer a built-in vectorized method when one exists
Use NumPy's universal functions (np.log, np.sqrt) on arrays

Adopting a vectorized mindset is a game-changer for efficiency. Have you ever refactored a slow loop into a vectorized operation? What was the performance boost like? Share your story below!

#Python #DataAnalysis #Pandas #CodingTips #DataScience
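The timing claim is easy to check locally. The sketch below times both versions with timeit on a smaller Series (100,000 rows, an arbitrary size chosen so it runs quickly); the helper names with_loop and vectorized are made up for illustration, and the numbers it prints will differ from the figures quoted in the post.

```python
import timeit

import pandas as pd

data = pd.Series(range(1, 100_001))

def with_loop(series):
    # Element-by-element squaring in a Python loop.
    out = []
    for num in series:
        out.append(num ** 2)
    return out

def vectorized(series):
    # One call; the arithmetic runs inside pandas/NumPy.
    return series ** 2

loop_time = timeit.timeit(lambda: with_loop(data), number=5)
vec_time = timeit.timeit(lambda: vectorized(data), number=5)

# Both approaches must agree before the timings mean anything.
same = with_loop(data) == vectorized(data).tolist()

print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s  same result: {same}")
```

Checking that both versions return the same values first is worth the extra line: a fast answer that differs from the slow one is not an optimization.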
Jaume Boguñá
6mo
Lambda functions aren't just for one-liners. They can make your Python data workflows cleaner and more concise.

Here are 5 Python lambda tricks every data scientist should master:
1 → Writing concise one-off functions instead of full def blocks
2 → Using lambdas with map(), filter(), sort() for clean transformations
3 → Capturing variables in closures for pipeline convenience
4 → Combining lambdas with pandas and NumPy for inline operations
5 → Choosing when not to use lambdas (for readability & debugging)

Read it here: https://lnkd.in/djGG3rfW

5 Python Lambda Tricks Every Data Scientist Should Master python.plainenglish.io
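Point 3 (capturing variables in closures) has a well-known pitfall that is worth seeing once. The sketch below is not taken from the linked article; it only illustrates Python's late binding of closure variables and the usual default-argument workaround, with made-up names late and bound.

```python
# Pitfall: each lambda looks up `factor` when it is called, not when it is
# created, so all three see the final value of the loop variable.
late = [lambda x: x * factor for factor in (1, 2, 3)]
late_results = [f(10) for f in late]

# Workaround: a default argument is evaluated at definition time,
# so each lambda keeps its own value.
bound = [lambda x, factor=factor: x * factor for factor in (1, 2, 3)]
bound_results = [f(10) for f in bound]

print(late_results)   # every entry uses factor == 3
print(bound_results)  # each entry uses its own factor
```

functools.partial is another common way to pin the value, and it avoids adding an extra parameter to the lambda's signature.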
Shri Ram Mishra
6mo
Stop manually coloring cells in Google Sheets! 🎨

If you're still spending time clicking Format -> Conditional Formatting on your reports, there's a better way. By leveraging Python, you can transform your data reporting from a manual chore into a fully automated workflow.

Why use Python for Google Sheets formatting?
✅ Scalability: Format 100 columns as easily as you format 1.
✅ Consistency: Eliminate "oops" moments. Get the same formatting on every report.
✅ Advanced Rules: Implement complex logic that is tedious to build by hand in the UI.

Curious how it works in practice? Here's the simple, 3-step logic:

Step 1: Connect to Your Sheet
Use the gspread library to securely authenticate and open your target Google Sheet, all from your Python script.

Step 2: Define Your Rule in Code
This is the magic. You create a "rule" object that specifies three things:
The Range: Which cells do you want to format? (e.g., 'C2:C100')
The Condition: What logic should trigger the format? This is written as a standard Google Sheets formula (e.g., =C2 < B2 to check if the value went down).
The Format: What should the cell look like? (e.g., a red background).

Step 3: Apply the Rule
Your script sends these instructions to the Google Sheets API, and your sheet is formatted within moments.

Here's a simplified code example of a rule that colors a cell red if its value is less than the cell to its left (worksheet is the gspread Worksheet opened in Step 1):

Python code
    # Import the necessary formatting tools
    from gspread_formatting import *

    # 1. Define the rule
    rule = ConditionalFormatRule(
        # The range to apply formatting to, converted from A1 notation
        ranges=[GridRange.from_a1_range('C2:C100', worksheet)],
        booleanRule=BooleanRule(
            condition=BooleanCondition('CUSTOM_FORMULA', ['=C2 < B2']),
            format=CellFormat(backgroundColor=Color(1, 0, 0))  # Red background
        )
    )

    # 2. Add the rule to your worksheet and save
    rules = get_conditional_format_rules(worksheet)
    rules.append(rule)
    rules.save()

By placing this logic inside a loop, you can apply similar rules across an entire report in seconds. A little bit of code saves hours in the long run.
It's a true "set it and forget it" solution. What's the most tedious task you've automated in your workflow? Share below! 👇

#Python #GoogleSheets #DataAutomation #Automation #DataAnalytics #Reporting #BusinessIntelligence #Gspread
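The "inside a loop" idea can be sketched without touching the Sheets API by first generating the range and formula strings for each column. The helper below, build_rule_specs, is hypothetical (it is not part of gspread or gspread_formatting) and only handles single-letter columns; each pair it returns would then be passed to the rule constructor shown in the post.

```python
from string import ascii_uppercase

def build_rule_specs(first_col, last_col, first_row=2, last_row=100):
    """Return one (a1_range, formula) pair per column.

    Hypothetical helper: each column is compared with the column on its left.
    Only single-letter columns (B..Z) are supported in this sketch.
    """
    start = ascii_uppercase.index(first_col)
    end = ascii_uppercase.index(last_col)
    if start < 1:
        raise ValueError("the first column needs a column to its left")
    specs = []
    for i in range(start, end + 1):
        col, left = ascii_uppercase[i], ascii_uppercase[i - 1]
        a1_range = f"{col}{first_row}:{col}{last_row}"
        formula = f"={col}{first_row} < {left}{first_row}"
        specs.append((a1_range, formula))
    return specs

specs = build_rule_specs("C", "E")
for a1_range, formula in specs:
    print(a1_range, formula)
```

Building the strings separately from the API calls keeps the looping logic testable offline, before any rule is written to a real sheet.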
Pravin Gurung
6mo
Handling Missing Data in Python: Made Simple! 🐍

Ever opened a dataset and seen NaN or blank cells everywhere? 😩 Don't worry, missing values (or nulls) are super common in data analysis. But the good news? Python makes handling them really easy! 💪

Here are some quick and simple ways 👇

🔹 1️⃣ Check for missing values
df.isnull().sum()
👉 Helps you see how many null values each column has.

🔹 2️⃣ Remove missing values
df = df.dropna()
👉 Use this if you're okay losing those rows.

🔹 3️⃣ Fill missing values
df['column_name'] = df['column_name'].fillna(value)
👉 Replace nulls with mean, median, mode, or even 0!
Example:
df['Age'] = df['Age'].fillna(df['Age'].mean())
(Assigning the result back is safer than inplace=True on a single column, which recent pandas versions warn about.)

🔹 4️⃣ Forward or backward fill
df = df.ffill()
👉 Fills each gap with the previous row's value; df.bfill() uses the next row's value instead. The older df.fillna(method='ffill') spelling is deprecated in recent pandas versions.

💡 Pro tip: Never just drop or fill without understanding why the data is missing. Sometimes, missing info can tell its own story!

📊 Data cleaning = foundation of good analysis. Because if your data is messy, your insights will be too! 😉

#Python #DataAnalysis #DataCleaning #Pandas #LearningData #DataScience #MachineLearning #CareerInData #UAEJobs #DataWithProvin
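One way to act on the pro tip is to check whether missingness is concentrated in one group before choosing a fill strategy. This is a minimal sketch on invented data; the Department column and its values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Age is missing mostly for one department.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT", "IT", "IT"],
    "Age": [np.nan, np.nan, 29.0, 35.0, 41.0, 38.0],
})

# Share of missing Age values within each department.
missing_rate = df["Age"].isnull().groupby(df["Department"]).mean()

print(missing_rate)
```

If the rates differ sharply between groups, a single global mean is likely to distort the affected group, and a per-group fill or a closer look at how the data was collected may be the better next step.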
Python Indore (PyIndore)
5mo Edited
🧠 Just tried out a really cool Python library, toon_format, and it's a hidden gem for anyone working with LLMs or large data payloads.

It's a compact, human-readable serialization format that reduces context size by 30–60% vs JSON, while staying super easy to read and use.

What makes it awesome:
• YAML-like indentation
• CSV-style tabular arrays
• Minimal syntax, array validation
• Python 3.8+ and battle-tested
• Fully compatible with the official TOON spec

⚙️ Install it:
pip install toon_format (or uv add toon_format)

Quick example 👇
from toon_format import encode, decode

encode({"name": "Alice", "age": 30})
# name: Alice
# age: 30

encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# [2,]{id,name}:
# 1,Alice
# 2,Bob

We have been using it to trim LLM context payloads: super efficient and still human-friendly.

🚀 If you deal with JSON or token limits, give toon_format a try! I've shared the repository link in the first comment.

#Python #OpenSource #LLM #Serialization #AI #Developers #MachineLearning #GenAI
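To see why a CSV-style layout is smaller than JSON for uniform records, here is a standard-library sketch. It is not the TOON format and does not use toon_format; the tabular helper is a hand-rolled illustration with no quoting or escaping, and it compares character counts rather than LLM tokens.

```python
import json

def tabular(rows):
    """Hand-rolled illustration: field names once, then one line per row.

    Assumes every row has the same keys and no value contains a comma.
    """
    fields = list(rows[0])
    lines = [",".join(fields)]
    lines += [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join(lines)

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

as_json = json.dumps(rows)
as_table = tabular(rows)

print(len(as_json), "characters as JSON")
print(len(as_table), "characters as a table")
print(as_table)
```

The saving comes from writing each key once instead of once per record, so it grows with the number of rows and disappears for deeply nested or irregular data.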

Izairton Vasconcelos
6mo Edited
🧱 QUICK TIP #2 – "Numbered Shelf: How the ARRAY Keeps Everything in Place (Python)"

1️⃣ Structure Name
Array (indexed collection)

2️⃣ Goal
Store many elements side by side and access each one by index in constant time.

3️⃣ Everyday Analogy
Think of a numbered shelf (0, 1, 2, 3…). You know exactly where each item is and grab it by position, no searching required.

4️⃣ Common Use Cases
Math & stats operations
Signals, images, numeric series
Data buffers & batch processing
Compact memory layouts

5️⃣ Technical Advantage
Random access is O(1) by index: read/write is blazing fast when you know the position.

6️⃣ Technical Drawback
Low-level arrays are often fixed-size, and inserting in the middle is costly because every later element has to shift.

7️⃣ Python Example (lists used like arrays)
# Arrays in Python – quick demo ⚡
# Author: Izairton Oliveira de Vasconcelos
import time
import numpy as np

print("=== ARRAYS IN PYTHON ===")

# Basic list demo
prices = [9.90, 12.50, 7.80, 15.00]
prices[2] = 8.10           # overwrite by index
prices.insert(1, 10.00)    # insert shifts later items right
first_item = prices[0]     # read by index
print("Prices:", prices)

# NumPy fast math
a = np.array([1, 2, 3, 4])
print("a * 10 ->", a * 10)
print("mean ->", a.mean(), "std ->", a.std())

# Middle insertion cost
arr = list(range(100000))
t0 = time.perf_counter()
arr.insert(len(arr)//2, -1)
print(f"Insertion in middle took {(time.perf_counter()-t0)*1000:.3f} ms")
print(prices, first_item)

8️⃣ Efficient Numeric Example (NumPy)
import numpy as np
a = np.array([1, 2, 3, 4])
b = a * 10           # vectorized op
mean_val = a.mean()  # fast stats
print(b, mean_val)   # [10 20 30 40] 2.5

9️⃣ When to Use
When you need fast index-based access and batch numeric operations with contiguous memory.

🔟 ✨ The Aha Moment (the trick in a nutshell)
"An array is your program's numbered shelf: every item has a fixed address, so you grab exactly what you need instantly."
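The advantage and drawback in points 5 and 6 can be observed with the standard library alone. The sketch below uses the built-in array module for a typed, contiguous shelf and timeit to compare appending at the end with inserting at the front; the helper names and the size n are arbitrary choices for illustration, and timings will vary by machine.

```python
import timeit
from array import array

# A typed array stores raw C doubles side by side.
shelf = array("d", [9.90, 12.50, 7.80, 15.00])
shelf[2] = 8.10     # O(1) write by index
third = shelf[2]    # O(1) read by index

def append_many(n):
    items = []
    for i in range(n):
        items.append(i)      # adds at the end, nothing shifts
    return items

def insert_front_many(n):
    items = []
    for i in range(n):
        items.insert(0, i)   # every existing item shifts right
    return items

n = 20_000
append_time = timeit.timeit(lambda: append_many(n), number=3)
insert_time = timeit.timeit(lambda: insert_front_many(n), number=3)

print(f"append: {append_time:.4f}s  insert at front: {insert_time:.4f}s")
```

When frequent insertions at the front are needed, collections.deque is usually a better fit than a list.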
Shuvendu Parida
6mo
🚀 Just built my own Python data type using OOP & magic methods!

We all know Python gives us built-in types like int, float, and list... But what if we could design our own that behaves just like them? 🤯

That's exactly what I did with PyMatrixEngine 🧠

I built a custom Matrix data type that supports operations such as:
➕ Addition (A + B)
➖ Subtraction (A - B)
✖️ Multiplication (A * B)
🔁 Transpose & Determinant

All powered by Python's magic methods (__add__, __mul__, __str__, and friends) 🪄

And here's the cool part: if you input something that doesn't form a valid matrix, this datatype automatically checks it and raises a clean, readable error. No more silent shape mismatches or confusing bugs ✅

You can simply drop the file in your project and start using it:
from matrix import Matrix

A = Matrix([[1,2],[3,4]])
B = Matrix([[5,6],[7,8]])
print(A + B)
print(A * B)
print(A.determinant())

It's a fun deep-dive into Object-Oriented Programming (OOP) and Python's hidden superpowers: magic methods ✨

🧩 GitHub Repo → https://lnkd.in/gkrheMQS

Would love to hear: what's the coolest custom data type you've ever built in Python?

#Python #OOP #MagicMethods #Coding #Matrix #Learning #PythonProjects #Developers #PythonTips

GitHub - ShuvenduParida/PyMatrixEngine github.com
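For readers who want to see how magic methods and shape validation fit together, here is a minimal sketch. It is not the PyMatrixEngine code, whose internals are not shown in the post; the class below is a hypothetical stand-in that covers only construction checks, +, *, ==, and str().

```python
class Matrix:
    """Hypothetical minimal matrix type; not the PyMatrixEngine implementation."""

    def __init__(self, rows):
        rows = [list(r) for r in rows]
        if not rows or not rows[0]:
            raise ValueError("a matrix needs at least one row and one column")
        width = len(rows[0])
        if any(len(r) != width for r in rows):
            raise ValueError("all rows must have the same length")
        self.rows = rows
        self.shape = (len(rows), width)

    def __add__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        if self.shape != other.shape:
            raise ValueError(f"shape mismatch: {self.shape} vs {other.shape}")
        return Matrix([[a + b for a, b in zip(ra, rb)]
                       for ra, rb in zip(self.rows, other.rows)])

    def __mul__(self, other):
        if not isinstance(other, Matrix):
            return NotImplemented
        if self.shape[1] != other.shape[0]:
            raise ValueError(f"cannot multiply {self.shape} by {other.shape}")
        cols = list(zip(*other.rows))  # columns of the right-hand matrix
        return Matrix([[sum(a * b for a, b in zip(row, col)) for col in cols]
                       for row in self.rows])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __str__(self):
        return "\n".join(" ".join(str(v) for v in row) for row in self.rows)


A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
print(A + B)
print(A * B)
```

Returning NotImplemented for unsupported operand types, rather than raising, lets Python try the other operand's reflected method before reporting a TypeError.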

Alejandro Paúl Aldas
5mo
#python

Problem Description
Data collected from various sources frequently suffers from inconsistencies, such as:
Inconsistent Capitalization: "New York", "new york", "NEW YORK".
Leading/Trailing Whitespace: " Texas ", "California\t".
These issues make filtering, grouping, and analysis unreliable, because values that mean the same thing are treated as different.

def clean_data(data_list):
    """
    Cleans and standardizes a list of strings by:
    1. Removing leading/trailing whitespace.
    2. Converting the string to Title Case (e.g., 'new york' -> 'New York').

    Args:
        data_list (list): A list of strings to be cleaned.

    Returns:
        list: A new list with the standardized strings.
    """
    cleaned_list = []
    for item in data_list:
        # 1. Strip whitespace and 2. Convert to Title Case
        cleaned_item = item.strip().title()
        cleaned_list.append(cleaned_item)
    return cleaned_list

# --- Example Usage ---
# Real-world inconsistent data
raw_locations = [
    " texas ",
    "new york ",
    " california\n",
    "NEW YORK",
    "TEXAS"
]

# Apply the solution
standardized_locations = clean_data(raw_locations)

print(f"Original Data: {raw_locations}")
print(f"Standardized Data: {standardized_locations}")
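A natural follow-up is to confirm that the cleaned values really group together. The sketch below is a variation on the post's function, not a replacement for it: the hypothetical normalize helper also collapses repeated spaces inside a value, and collections.Counter shows the resulting groups.

```python
from collections import Counter

def normalize(value):
    # split() with no arguments drops leading, trailing, and repeated
    # whitespace; join() puts single spaces back between the words.
    return " ".join(value.split()).title()

raw_locations = [
    " texas ",
    "new york ",
    " california\n",
    "NEW YORK",
    "TEXAS",
    "new   york",  # extra internal spaces, added for this sketch
]

counts = Counter(normalize(v) for v in raw_locations)
print(counts)
```

One caveat with str.title(): it treats apostrophes as word boundaries, so "they're" becomes "They'Re". For names that contain apostrophes, a custom capitalization rule may be needed.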
Rafsan Ahmed Al
6mo
Daily Progress: Python + SQL

Today, I practiced some simple but powerful Python basics:
🔹 input() function
🔹 Data types
🔹 String conversion and len()
🔹 Conditional statements (if-else)
🔹 String replacement using .replace()

Python Practice:
Age = 323
x = len(str(Age))   # number of digits
if x < 2:
    print("Yes, it is")
else:
    print(x, "- No, it is not")

phone_Num = "123-475-886"
print(phone_Num.replace("-", "/"))

And to make sure I don't forget SQL while learning Python, I practiced one query today as well:

WITH CTE AS (
    SELECT
        s.Category,
        SUM(s.Price * s.Quantity) AS Total_Revenue,
        SUM(s.Quantity * p.Cost) AS Total_Cost
    FROM Sales AS s
    INNER JOIN Products AS p
        ON s.Product_Name = p.Product_Name
    GROUP BY s.Category
)
SELECT
    Category,
    Total_Revenue,
    Total_Cost,
    (Total_Revenue - Total_Cost) AS Total_Profit
FROM CTE
ORDER BY Category, Total_Profit DESC;

Lesson learned: Small consistent practice builds long-term mastery.

#Python #SQL #DataAnalytics #LearningJourney #DataScience #AI #CodingEveryday #Consistency
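The two halves of this practice session can be combined: Python's built-in sqlite3 module can run the same CTE query against an in-memory database. This is a sketch under assumptions; the table layouts and sample rows below are invented to match the column names used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, shaped to fit the query from the post.
conn.executescript("""
CREATE TABLE Products (Product_Name TEXT PRIMARY KEY, Cost REAL);
CREATE TABLE Sales (Product_Name TEXT, Category TEXT, Price REAL, Quantity INTEGER);
INSERT INTO Products VALUES ('Pen', 1.0), ('Notebook', 2.0), ('Mouse', 10.0);
INSERT INTO Sales VALUES
    ('Pen', 'Stationery', 2.0, 10),
    ('Notebook', 'Stationery', 5.0, 4),
    ('Mouse', 'Electronics', 25.0, 2);
""")

query = """
WITH CTE AS (
    SELECT s.Category,
           SUM(s.Price * s.Quantity) AS Total_Revenue,
           SUM(s.Quantity * p.Cost) AS Total_Cost
    FROM Sales AS s
    INNER JOIN Products AS p ON s.Product_Name = p.Product_Name
    GROUP BY s.Category
)
SELECT Category, Total_Revenue, Total_Cost,
       (Total_Revenue - Total_Cost) AS Total_Profit
FROM CTE
ORDER BY Category, Total_Profit DESC;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Because GROUP BY leaves one row per category, the second sort key (Total_Profit DESC) never has a tie to break here; sorting by Total_Profit DESC alone would rank categories by profit instead.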
