Name: Tab 3: Data Preprocessing with Train/Validation/Test Split and SMOTE | Muskan K M posted on the topic | LinkedIn
Uploaded: 2026-03-18T17:52:46.902Z
Duration: 1 min 13 s
Channel: Muskan K M

Muskan K M

1mo

Tab 3 is live, and this one gets into the real groundwork of any ML pipeline! 🧹
After exploring the data in Tabs 1 & 2, Tab 3 handles end-to-end Data Preprocessing:
• Train / Validation / Test split with a dynamic slider
• Stratified splitting with a fallback for small class sizes
• One-hot encoding for categorical features
• Standard scaling for numerical features
• Class balance check, with optional SMOTE for imbalanced datasets
Clean data in, better models out. 🚀 More tabs coming soon!
#DataScience #MachineLearning #DataPreprocessing #SMOTE #Streamlit #Python #FeatureEngineering #BuildingInPublic #DataAnalytics #OpenToWork
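A minimal sketch of the preprocessing steps listed above, assuming scikit-learn and pandas. The app's own code is not shown in the post, so the helper names (`safe_split`, `train_val_test_split`, `build_preprocessor`), the toy data, and the 15% / 15% default sizes are illustrative assumptions. The "fallback" here simply tries a stratified split and, if scikit-learn rejects it (for example because a class has fewer than 2 members), retries without stratification. SMOTE comes from the separate imbalanced-learn package and is applied to the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def safe_split(X, y, test_size, random_state=42):
    """Stratified split; falls back to a plain random split if stratification fails."""
    try:
        return train_test_split(X, y, test_size=test_size, stratify=y, random_state=random_state)
    except ValueError:
        # e.g. a class with only 1 member cannot be stratified
        return train_test_split(X, y, test_size=test_size, random_state=random_state)


def train_val_test_split(X, y, val_size=0.15, test_size=0.15, random_state=42):
    # Sizes are fractions of the full dataset (as a slider would set them).
    # Step 1: hold out the test set.
    X_rest, X_test, y_rest, y_test = safe_split(X, y, test_size, random_state)
    # Step 2: split the remainder; rescale so val_size is relative to the full dataset.
    rel_val = val_size / (1 - test_size)
    X_train, X_val, y_train, y_val = safe_split(X_rest, y_rest, rel_val, random_state)
    return X_train, X_val, X_test, y_train, y_val, y_test


def build_preprocessor(X):
    """One-hot encode non-numeric columns and standardize numeric ones."""
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]
    return ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
        ("num", StandardScaler(), num_cols),
    ])


# Hypothetical toy data: one categorical and one numeric feature, imbalanced target.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "city": rng.choice(["A", "B", "C"], size=100),
    "income": rng.normal(50, 10, size=100),
})
y = pd.Series([0] * 80 + [1] * 20)  # 80% class 0, 20% class 1

X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(X, y)

pre = build_preprocessor(X_train)
X_train_t = pre.fit_transform(X_train)  # fit on the training split only
X_val_t = pre.transform(X_val)          # reuse the training statistics
X_test_t = pre.transform(X_test)

# Class balance check on the training split.
print(y_train.value_counts(normalize=True))

# Optional SMOTE (imbalanced-learn), applied to the training split only.
try:
    from imblearn.over_sampling import SMOTE
    X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train_t, y_train)
except ImportError:
    X_train_res, y_train_res = X_train_t, y_train  # package not installed: keep data as-is
print(len(y_train), "->", len(y_train_res))
```

Fitting the `ColumnTransformer` on the training split and only calling `transform` on validation and test keeps their statistics out of the scaler. When the features include one-hot columns, imbalanced-learn's `SMOTENC` (applied before encoding) may be a better fit than plain SMOTE, since interpolating between one-hot vectors produces fractional category values.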


More Relevant Posts

Manohar Anapuram
1mo
I recently worked on a small machine learning project where I tried predicting housing prices using Decision Tree Regression. I used the California Housing dataset and went through the full process: cleaning the data, exploring patterns, building the model, and evaluating how well it performs. It was interesting to see how different factors like income and location influence house prices, and how decision trees handle these relationships. This project gave me a better understanding of how regression models work in practice and of the importance of avoiding overfitting while tuning the model.
🔗 Link: https://lnkd.in/gzwVU_dn
#MachineLearning #DataScience #Python #LearningJourney
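A short sketch of the overfitting point, assuming scikit-learn. To stay self-contained it uses synthetic data from `make_regression` rather than the real dataset (California Housing can be loaded with `sklearn.datasets.fetch_california_housing()`, which downloads it on first use). The helper `compare_depths` and the depth values compared are illustrative choices, not the author's settings.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_depths(X, y, depths=(2, 4, 8, None), random_state=0):
    """Fit one tree per max_depth and return {depth: (train MSE, test MSE)}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    results = {}
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=random_state)
        model.fit(X_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(X_train))
        test_mse = mean_squared_error(y_test, model.predict(X_test))
        results[depth] = (train_mse, test_mse)
    return results


# Synthetic stand-in for a housing dataset: 8 numeric features plus noise.
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
results = compare_depths(X, y)
for depth, (tr, te) in results.items():
    print(f"max_depth={depth}: train MSE={tr:.1f}, test MSE={te:.1f}")
```

An unrestricted tree (`max_depth=None`) typically memorizes the training set, driving training error to zero while test error stays higher; limiting `max_depth` or raising `min_samples_leaf` trades some training accuracy for better generalization.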
Siddhesh Kurade
3w
Days 68-69 of the #three90challenge 📊
Today I explored NumPy operations, specifically indexing and slicing arrays. After learning the NumPy basics, this step made it easier to access and manipulate data efficiently.
What I practiced today:
• Accessing elements using indexing
• Extracting subsets of data using slicing
• Working with multi-dimensional arrays
• Performing operations on selected data
The idea: instead of looping through data manually, I can directly select and operate on specific parts of an array. Example:
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])  # Output: [20 30 40]
This makes data manipulation faster and more intuitive.
From handling data → to controlling it efficiently 🚀
GeeksforGeeks #three90challenge #commitwithgfg #Python #NumPy #DataAnalytics #LearningInPublic #Consistency #Upskilling
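Extending the one-dimensional example to the other practice items (negative indices, steps, 2D indexing, and operating on a selection). A small sketch; the array `grid` and the threshold 25 are made up for illustration.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])      # [20 30 40]: start index included, stop index excluded
print(arr[-1])       # 50: negative indices count from the end
print(arr[::2])      # [10 30 50]: a step of 2 takes every second element

# A 2D array is indexed as [row, column]; each axis takes its own index or slice.
grid = np.arange(1, 10).reshape(3, 3)   # rows: [1 2 3], [4 5 6], [7 8 9]
print(grid[1, 2])    # 6: row 1, column 2
print(grid[:, 0])    # [1 4 7]: the whole first column
print(grid[:2, 1:])  # top-right 2x2 block: [[2 3] [5 6]]

# Select with a boolean mask, then operate on the selection without a loop.
mask = arr > 25
big = arr[mask]             # boolean indexing returns a copy: [30 40 50]
arr[mask] = arr[mask] * 2   # assigning through the mask updates arr in place
print(big, arr)
```

A subtlety worth knowing: basic slices such as `arr[1:4]` are views that share memory with `arr`, whereas boolean indexing returns a copy, which is why `big` keeps its old values after `arr` is updated.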

Oluwapelumi Foluso
3w
Today I focused on working with NumPy arrays, building a solid foundation for data manipulation and analysis. Here's what I practiced:
🔹 Created a 1D array with values from 1 to 15
🔹 Built a 2D array (3×4) filled with ones
🔹 Generated a 3×3 identity matrix
🔹 Explored key array properties like shape, data type (dtype), and number of dimensions (ndim)
🔹 Converted a regular Python list into a NumPy array
This session helped me better understand how data is structured and handled in numerical computing. Getting comfortable with arrays is a crucial step toward more advanced data analysis and machine learning tasks. Looking forward to building on this momentum 💡
#AI #MachineLearning #Python #NumPy #DataAnalysis #M4ACE
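Each of the five exercises maps onto a standard NumPy constructor or attribute. A minimal sketch; the variable names are illustrative.

```python
import numpy as np

one_to_15 = np.arange(1, 16)     # stop is exclusive, so 16 yields 1..15
ones_3x4 = np.ones((3, 4))       # shape passed as a tuple; default dtype is float64
identity_3 = np.eye(3)           # 1.0 on the diagonal, 0.0 elsewhere
from_list = np.array([2, 4, 6])  # plain Python list -> ndarray

arrays = [
    ("one_to_15", one_to_15),
    ("ones_3x4", ones_3x4),
    ("identity_3", identity_3),
    ("from_list", from_list),
]
for name, a in arrays:
    # shape: size along each axis; dtype: element type; ndim: number of axes
    print(f"{name}: shape={a.shape}, dtype={a.dtype}, ndim={a.ndim}")
```

The integer dtype reported for `one_to_15` and `from_list` depends on the platform and NumPy version (commonly `int64`), so it may differ between machines.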
Jyoshna Chaya
1mo
Day 21/30 – Advanced Visualization with Seaborn
Today I moved beyond basic charts and explored how to make data actually speak. Seaborn makes it easier to understand patterns, relationships, and distributions without writing complex code.
Instead of just plotting data, I focused on:
- Understanding correlations using heatmaps
- Visualizing distributions with histograms and KDE plots
- Comparing categories using boxplots and violin plots
What I realized: a good visualization is not about making charts look fancy. It's about making insights obvious.
Two simple examples:
- A heatmap can quickly show which variables are strongly related, instead of checking a table of numbers manually
- A boxplot can instantly reveal outliers that you might miss in raw data
Still learning, but getting better at choosing the right chart instead of just any chart.
#Day21 #DataAnalytics #Seaborn #Python #LearningJourney
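Both examples can also be checked numerically, which helps when reading the plots. A sketch assuming pandas, with made-up data: `df.corr()` computes the Pearson correlation matrix that a heatmap such as `sns.heatmap(corr, annot=True)` would color, and `iqr_outliers` (a hypothetical helper) applies the 1.5 × IQR rule that Matplotlib and Seaborn boxplots use by default to decide which points to draw as outliers.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR] (the default boxplot rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - whis * iqr) | (s > q3 + whis * iqr)]


# Hypothetical data: two strongly related columns and one column with an outlier.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 74, 79, 83],
    "commute_min": [30, 32, 31, 29, 33, 30, 31, 120],
})

corr = df.corr()  # what a correlation heatmap visualizes
print(corr.round(2))

print(iqr_outliers(df["commute_min"]))  # the point a boxplot would draw beyond the whisker
```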
Nuno Bispo
1mo
Don't flatten what naturally has structure. It's tempting to model everything in a single class. Easy to write, easy to read, at least until your data grows. This is where most codebases start, with just one model. But with model composition, each model has a single responsibility. And Pydantic handles nested validation automatically. Structure your models the way your domain is actually structured. The code gets cleaner, the errors get clearer, and reuse becomes obvious. This and other real-world modelling patterns are covered in Practical Pydantic: 👉 https://lnkd.in/eGiB7ZxU Model your domain. Not just your data. #Python #Pydantic #Data #Models #Patterns
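A minimal sketch of model composition, assuming Pydantic is installed (it avoids version-specific methods, so it should run on v1 or v2). The `Customer` and `Address` models are hypothetical examples, not taken from the book.

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class Customer(BaseModel):
    name: str
    email: str
    address: Address  # nested model: the sub-dict is validated as an Address


customer = Customer(
    name="Ada",
    email="ada@example.com",
    address={"street": "1 Main St", "city": "Springfield", "zip_code": "12345"},
)
print(type(customer.address).__name__)  # the dict was converted to an Address instance

err_loc = None
try:
    # zip_code is missing inside the nested address
    Customer(name="Bob", email="bob@example.com",
             address={"street": "2 Side St", "city": "Shelbyville"})
except ValidationError as exc:
    err_loc = exc.errors()[0]["loc"]  # full path to the failing nested field
print(err_loc)
```

Because `address` is typed as `Address`, each model keeps a single responsibility and the error location points straight at the nested field instead of at one large flat class.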
Dnyaneshwari Jakore
2w
🚀 𝗗𝗮𝘆 𝟯 : 𝗧𝗼𝗱𝗮𝘆 𝗜 𝗲𝘅𝗽𝗹𝗼𝗿𝗲𝗱 𝘀𝗼𝗺𝗲 𝗯𝗮𝘀𝗶𝗰 𝗯𝘂𝘁 𝘃𝗲𝗿𝘆 𝗶𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝘁 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 𝗶𝗻 𝗣𝗮𝗻𝗱𝗮𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 📊
🔍 1. head(): shows the first 5 rows of the dataset (by default)
df.head()
🔍 2. tail(): shows the last 5 rows (by default)
df.tail()
📏 3. shape: returns the number of rows and columns as a tuple (an attribute, so no parentheses)
df.shape
ℹ️ 4. info(): prints a summary of the dataset (column data types, non-null counts, memory usage)
df.info()
📊 5. describe(): gives a statistical summary of the numeric columns (count, mean, std, min, quartiles, max)
df.describe()
📌 6. columns: shows all column names
df.columns
💡 Key Learning: Understanding your dataset is the first step before doing any analysis.
#Day3 #Pandas #Python #DataAnalytics #LearningJourney #DataExploration
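A runnable sketch of the six calls on a small, made-up DataFrame, assuming pandas. One missing value is included so that `info()` and `describe()` have something to report.

```python
import pandas as pd

# Hypothetical sample data; None in a numeric column becomes NaN.
df = pd.DataFrame({
    "product": ["pen", "book", "bag", "lamp", "mug", "desk", "chair"],
    "price": [1.5, 12.0, 25.0, 18.5, 6.0, 120.0, 45.0],
    "stock": [100, 40, None, 15, 60, 5, 12],
})

print(df.head())          # first 5 rows by default; df.head(3) for the first 3
print(df.tail(2))         # last 2 rows
print(df.shape)           # (rows, columns) tuple; an attribute, not a method
df.info()                 # prints dtypes and non-null counts itself (returns None)
print(df.describe())      # count, mean, std, min, quartiles, max of numeric columns
print(df.columns.tolist())
```

Note that `describe()` reports a count of 6 for `stock`, because it counts only non-missing values.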

Chibuike Dominion
1mo
DSA Tip: Graphs
If you think data is always stored in a straight line… think again. Real-world systems are connected, not linear.
Use Graphs. They represent data as nodes (points) and edges (connections). No strict order. No single path. Just relationships between data.
Used in:
- Social networks
- Maps & navigation
- Recommendation systems
Insight: The real power of data isn't in the elements, it's in how they are connected.
Quick Challenge: How would you represent your friend network as a graph? Drop your answer, I'll review the best ones.
FOLLOW FOR MORE DSA TIPS & INSIGHTS
#DSA #Graphs #Python #CodingTips #LearnToCode
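One common answer to the challenge, as a sketch: store the friend network as an adjacency list (a dict mapping each person to a set of friends) and use breadth-first search to count degrees of separation. The names and the helpers `build_graph` and `degrees_of_separation` are hypothetical.

```python
from collections import deque


def build_graph(edges):
    """Adjacency list for an undirected graph: person -> set of friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)  # friendship goes both ways
    return graph


def degrees_of_separation(graph, start, target):
    """Breadth-first search: hops on the shortest path, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for friend in graph.get(node, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


friends = [("Ana", "Ben"), ("Ben", "Cara"), ("Cara", "Dev"), ("Ana", "Eli"), ("Fay", "Gus")]
g = build_graph(friends)
print(sorted(g["Ben"]))                      # ['Ana', 'Cara']
print(degrees_of_separation(g, "Eli", "Dev"))  # 4: Eli-Ana-Ben-Cara-Dev
```

An adjacency list uses memory proportional to the number of friendships, which suits sparse social networks better than an adjacency matrix; BFS finds the fewest hops because it explores people in order of distance.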
Huzaifa Gul
1mo
𝐂𝐫𝐚𝐜𝐤𝐞𝐝 𝐭𝐡𝐞 𝐂𝐨𝐝𝐞 𝐨𝐧 𝐇𝐨𝐮𝐬𝐞 𝐏𝐫𝐢𝐜𝐞 𝐏𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐨𝐧!
I just wrapped up a deep dive into Predictive Modeling using the classic California Housing Dataset. Beyond just fitting a model, I focused on clean data visualization and on correcting skewed distributions to improve the results.
𝐊𝐞𝐲 𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
𝐀𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦: Linear Regression
𝐕𝐢𝐬𝐮𝐚𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧: Modernized EDA using Seaborn histplot & SciPy probplot
𝐓𝐞𝐜𝐡 𝐒𝐭𝐚𝐜𝐤: Python, Scikit-learn, Pandas, NumPy
𝐕𝐞𝐫𝐬𝐢𝐨𝐧 𝐂𝐨𝐧𝐭𝐫𝐨𝐥: Managed via a clean, professional GitHub workflow.
Check out the full implementation and clean repository in the first comment below!
#MachineLearning #DataScience #AIEngineering #Python #GitHub #LinearRegression #HousePricePrediction
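The post does not say how the skew was handled; a log transform is one common option. A sketch under that assumption, using NumPy and scikit-learn with synthetic right-skewed data instead of the real dataset (which can be loaded with `sklearn.datasets.fetch_california_housing()`). `skewness` is a hand-written helper, and the target is deliberately built to be linear in `log1p(income)`, so the comparison illustrates the idea rather than reproducing the author's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def skewness(x):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


rng = np.random.default_rng(42)
income = rng.lognormal(mean=1.0, sigma=0.6, size=1000)          # right-skewed feature
price = 2.0 * np.log1p(income) + rng.normal(0, 0.1, size=1000)  # hypothetical target

print("skew before:", round(skewness(income), 2))
print("skew after log1p:", round(skewness(np.log1p(income)), 2))

# Compare a linear fit on the raw feature with one on the transformed feature.
X_raw = income.reshape(-1, 1)
X_log = np.log1p(income).reshape(-1, 1)
r2_raw = LinearRegression().fit(X_raw, price).score(X_raw, price)
r2_log = LinearRegression().fit(X_log, price).score(X_log, price)
print(f"R^2 raw={r2_raw:.3f}, log1p={r2_log:.3f}")
```

`log1p` (log of 1 + x) is used rather than `log` so that zero values stay finite; a probability plot such as `scipy.stats.probplot` can then confirm whether the transformed values look closer to normal.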

Namit Chaturvedi
3w
🗓 7 April 2026
LeetCode Problem #128 – Longest Consecutive Sequence
Solved the problem of finding the longest consecutive sequence in an unsorted array. Key insight: use a set for average O(1) lookups, and only start counting from numbers that begin a sequence (that is, when n - 1 is not in the set).
Takeaways:
- Using the right data structure reduces time complexity from O(n²) to O(n).
- Avoid redundant work while scanning arrays.
- Handle edge cases like empty or single-element arrays efficiently.
This problem reinforces how a smart approach beats brute force every time!
#LeetCode #Algorithms #Python #DataStructures #ProblemSolving #Coding #TechLearning
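The set-based approach described above, sketched in Python (the function name is illustrative). Each number is walked by the inner `while` loop only as part of the single sequence it belongs to, so the total work is O(n) on average.

```python
def longest_consecutive(nums):
    """Length of the longest run of consecutive integers in an unsorted list."""
    num_set = set(nums)  # average O(1) membership checks, duplicates removed
    best = 0
    for n in num_set:
        if n - 1 not in num_set:  # n is the start of a sequence
            length = 1
            while n + length in num_set:
                length += 1
            best = max(best, length)
    return best


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # 4: the run 1, 2, 3, 4
print(longest_consecutive([]))                      # 0: empty input
```

Skipping numbers whose predecessor is present is what prevents the O(n²) behavior of counting upward from every element.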
Analytics Insight®

91,346 followers
1mo
𝐓𝐨𝐩 𝐒𝐞𝐚𝐛𝐨𝐫𝐧 𝐏𝐥𝐨𝐭𝐬 𝐄𝐯𝐞𝐫𝐲 𝐃𝐚𝐭𝐚 𝐀𝐧𝐚𝐥𝐲𝐬𝐭 𝐌𝐮𝐬𝐭 𝐊𝐧𝐨𝐰 𝐢𝐧 𝟐𝟎𝟐𝟔 Data analysts rely heavily on visualizations to understand patterns hidden inside datasets. Python’s Seaborn library simplifies statistical visualization and helps analysts create clear, attractive charts with minimal code. This guide explains the most important Seaborn plots every data analyst should know in 2026. From scatter plots to heatmaps, these visualizations help uncover trends, correlations, and patterns quickly. #DataAnalytics #PythonVisualization #SeabornPlots #DataScience #PythonProgramming #analyticsinsight #analyticsinsightmagazine Read More 👇 https://zurl.co/mvmNa
