ML Coding Habit: Why X and y?

I had a funny moment while coding today. 😁 Everything was going great. The data was ready. Then I typed the usual code:

X = df.drop("price", axis=1)
y = df["price"]

And my brain just stopped: "Wait... why do we always use X and y? Who made this a rule?" 🤨

I looked it up, and it comes straight from math notation 📐:
Capital X = the feature matrix (2-D, many columns).
Lowercase y = the target vector (1-D, one column).
In linear algebra, matrices get capital letters and vectors get lowercase ones. 🤯

Do I have to use them? No. I could use descriptive names like features and price. But am I going to do that? Nope! Tomorrow I will use X and y again. It is just a habit now! 🌚

It is funny how in ML the biggest questions come from the smallest things. 😅

Be honest: do you use descriptive names, or do you also just use X and y? 👇

#MachineLearning #Python #DataScience #CodingLife #SimpleCode
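For anyone curious, here is a minimal sketch of the convention next to the descriptive alternative; the toy DataFrame is my own invention, not from the post:

```python
import pandas as pd

# Hypothetical toy data standing in for any dataset with a "price" target.
df = pd.DataFrame({"area": [50, 80, 120], "rooms": [2, 3, 4], "price": [100, 160, 240]})

# The convention: capital X for the 2-D feature matrix, lowercase y for the 1-D target vector.
X = df.drop("price", axis=1)   # DataFrame, many columns
y = df["price"]                # Series, one column

# The descriptive alternative behaves identically.
features = df.drop("price", axis=1)
target = df["price"]
```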
More Relevant Posts
-
🚀 From Confusion → Clarity: My Approach to “Sort the People” (LeetCode)

Today I solved the Sort the People problem, and instead of jumping straight into sorting tricks, I focused on building clarity step by step 👇

🔍 My Thought Process:
1. First, I paired each name with its corresponding height in a list (like a mini mapping).
2. Then, I sorted this list by height.
3. Since the problem required descending order, I simply reversed the sorted list.
4. Finally, I extracted only the names, now in the correct order.

💡 Key Learning: Sometimes the simplest approach is the best one. Instead of overcomplicating with advanced data structures, breaking the problem into smaller transformations made it super manageable.

🧠 What this improved for me:
* Understanding how to use lambda for sorting
* Confidence in handling paired data (name + value problems)
* Thinking in steps rather than jumping to optimization

⚡ Code Strategy in One Line: Pair → Sort → Reverse → Extract

Consistency > Speed. One problem at a time. 💪
📈 If you're also grinding DSA, keep going — progress compounds!

#DSA #LeetCode #CodingJourney #Python #ProblemSolving #Consistency #TechGrowth #100DaysOfCode #WomenInTech #FutureEngineer
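A minimal sketch of the Pair → Sort → Reverse → Extract strategy described above, assuming the standard LeetCode input of parallel name and height lists; the function name and sample data are mine:

```python
def sort_people(names: list[str], heights: list[int]) -> list[str]:
    paired = list(zip(heights, names))   # pair each height with its name
    paired.sort(key=lambda p: p[0])      # sort by height (lambda picks the height)
    paired.reverse()                     # problem wants descending order
    return [name for _, name in paired]  # extract only the names

print(sort_people(["Mary", "John", "Emma"], [180, 165, 170]))
# ['Mary', 'Emma', 'John']
```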
-
Turning messy data into meaningful insights is an art—and the right tools make all the difference. 📊✨

From confusing default plots to clean, decision-ready visuals, mastering Python and Seaborn can completely transform how you communicate data in the boardroom. And understanding concepts like the Cross Join (Cartesian product) isn’t just theory—it’s the foundation of smarter analytics.

Stop guessing. Start visualizing. Start influencing decisions. 🚀

#DataAnalytics #Python #Seaborn #DataVisualization #BusinessIntelligence #AnalyticsJourney #DataScience #SQL #LearningEveryday #CareerGrowth #TechSkills #DataDriven #LinkedInLearning
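Since the post highlights the Cross Join (Cartesian product), here is a minimal pandas sketch of one; the product and region frames are hypothetical:

```python
import pandas as pd

products = pd.DataFrame({"product": ["A", "B"]})
regions = pd.DataFrame({"region": ["North", "South"]})

# Cross join: every row of one frame paired with every row of the other (2 x 2 = 4 rows).
combos = products.merge(regions, how="cross")
print(combos)
```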
-
Mistakes are part of the process.

Day 7 – #100DaysOfCode
⏰ Time Spent: 2 hours

⚒️ What I Did:
* Yesterday I learned one way to read scatter plots; today I practiced it.
* Modified my function to make it reusable.
* Plotted relationships between complaints and aggregated features.

I observed only these two trends:
* log(x) vs y looks linear → logarithmic trend [ y = a · log(x) + b ]
* log(x) vs log(y) looks linear → power law [ y = k · xᵃ ]

But then I realized something important… I was plotting a sum on the x-axis, which naturally inflates the values and created misleading patterns. So I switched to the mean, but the trends disappeared. That suggests no real relationship, though I'll experiment with a few other transformations before concluding anything.

🚪 Links:
* Repo: https://lnkd.in/g7zsMygp

🧠 Learning: A bad feature choice can create fake patterns.

📌 Closing: I should work on these things when I am not tired (mornings / after a nap).

#DataScience #DataAnalytics #Python #CodingJourney
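For readers who want to try the same diagnostic, here is a minimal sketch of the two checks; the arrays are made-up stand-ins for the post's complaint counts and aggregated features:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 4, 8, 16, 32], dtype=float)   # hypothetical aggregated feature
y = np.array([1.1, 2.3, 3.9, 8.2, 15.7, 33.0])    # hypothetical complaint counts

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# If this panel looks linear, the trend is logarithmic: y = a * log(x) + b
axes[0].scatter(np.log(x), y)
axes[0].set(xlabel="log(x)", ylabel="y", title="semi-log check")

# If this panel looks linear, the trend is a power law: y = k * x**a
axes[1].scatter(np.log(x), np.log(y))
axes[1].set(xlabel="log(x)", ylabel="log(y)", title="log-log check")

plt.tight_layout()
plt.show()
```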
-
One habit I’ve started building when working with data: before writing any logic, I always run:

df.head()
df.info()
df.describe()

It sounds obvious. But early on, I skipped this step. I would immediately start writing transformations, and later realize things like:
* columns were strings instead of numbers
* values had unexpected formats
* missing data existed where I didn’t expect it

Now I try to slow down and understand the data first. It saves a surprising amount of time later.

💡 Data engineering lesson I’m learning: understanding the data is often more important than writing the code.

#DataEngineering #Python #Pandas
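A tiny sketch of that habit wrapped in a helper; the function name and file path are my own, not from the post:

```python
import pandas as pd

def first_look(df: pd.DataFrame) -> None:
    print(df.head())      # eyeball a few rows: formats, obvious oddities
    df.info()             # dtypes and non-null counts: strings hiding as numbers?
    print(df.describe())  # basic stats: suspicious mins, maxes, gaps

first_look(pd.read_csv("data.csv"))  # hypothetical file
```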
-
🔷 A simple train-test split is not always enough. I learned this the hard way when my model looked great on paper and struggled on real data.

📌 Here is what nobody tells you about splitting data properly.

The basic split gives you two sets: training and testing. That works for simple projects. But what if you need to tune your model? You test different settings, pick the best one, and evaluate on the test set. The problem is that you have now indirectly used the test set to make decisions. It is no longer a fair judge.

This is where a three-way split becomes important.

🔹 X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
🔹 X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

Now you have three sets:
* Training set. The model learns here. 70 percent of your data.
* Validation set. You tune and compare models here. 15 percent.
* Test set. You evaluate the final model here. Once. Never again. 15 percent.

The test set is sacred. You look at it exactly one time, at the very end.

One more thing that most people miss: always stratify your split when your target column is imbalanced.

🔹 train_test_split(X, y, stratify=y, test_size=0.2)

stratify=y makes sure both sets have the same proportion of each class. Without it you might end up with a training set that barely sees the minority class and a model that has no idea it exists.

The split is not a formality. It is a decision that shapes every result that follows. Get it right before you touch anything else.

❓ What split ratio do you use for your projects, and why?

#DataScience #MachineLearning #Python
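A runnable version of the two calls above, with the stratify tip folded in. The toy imbalanced dataset is an assumption, not from the post; note that stratify is passed to both calls so all three sets keep the class ratio:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset standing in for your real X and y.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# 70% train, 30% held out.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Split the held-out 30% in half: 15% validation, 15% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42
)
```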
-
Ever opened a dataset and thought… “why is this so messy?” 😅 Same here.

While working with Pandas, I realized data cleaning isn’t complicated — it’s just a few powerful steps repeated smartly 👇

🧹 Missing values? → isna() to find them, fillna() or dropna() to handle them
🔁 Duplicate rows? → drop_duplicates() and move on
🔧 Wrong data types breaking your logic? → astype() fixes it in seconds
🧼 Messy text (extra spaces, weird formats)? → str.strip() and str.lower() clean it instantly
📊 Before trusting data? → info() and value_counts() give a quick reality check

Good analysis starts with clean data. That simple shift has already changed how I look at datasets. Still learning, but this is one of the most useful lessons so far.

#DataAnalytics #Python #Pandas #DataCleaning #LearningJourney
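Strung together, those steps look roughly like this minimal sketch; the column names ("price", "city") are hypothetical:

```python
import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    print(df.isna().sum())                                   # find missing values
    df["price"] = df["price"].fillna(df["price"].median())   # fill a numeric gap
    df = df.dropna(subset=["city"]).drop_duplicates()        # drop unusable rows, then dupes
    df["price"] = df["price"].astype(float)                  # fix a wrong dtype
    df["city"] = df["city"].str.strip().str.lower()          # clean messy text
    df.info()                                                # quick reality check
    print(df["city"].value_counts())
    return df
```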
-
Data Cleaning is only half the battle. Are you engineering your features?

In Step 2 of the machine learning pipeline, many beginners stop at data cleaning. While removing NaNs and dropping irrelevant rows is essential, the real magic happens during Feature Engineering.

While working on my recent price prediction project, I realized that raw data rarely tells the full story. To build a high-performing model, you have to create features that capture the "why" behind the numbers.

I focused on three key areas for this preprocessing script:
📈 Moving Averages: capturing trends over time.
📉 Volatility: accounting for market fluctuations and risk.
🕒 Lag Features: giving the model a "memory" of previous price points.

Clean data gets you a working model. Engineered features get you a winning model.

Check out the snippet of my preprocessing logic below! 👇

#MachineLearning #DataScience #Python #FeatureEngineering #PredictiveAnalytics
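The snippet referenced above did not survive extraction, so here is a minimal sketch of the three feature families instead, assuming a time-indexed price Series; the names and windows are mine, not the author's:

```python
import pandas as pd

def add_time_features(prices: pd.Series) -> pd.DataFrame:
    feats = pd.DataFrame({"price": prices})
    feats["ma_7"] = prices.rolling(window=7).mean()   # moving average: the trend
    feats["vol_7"] = prices.rolling(window=7).std()   # rolling std: volatility / risk
    for lag in (1, 2, 3):
        feats[f"lag_{lag}"] = prices.shift(lag)       # lag features: a "memory" of past prices
    return feats.dropna()                             # drop rows without a full history
```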
-
Data View v1 is live. No hype — just a clean build.

Built with Streamlit, Python, Pandas, NumPy, Seaborn, and Matplotlib, this app cuts through the noise and gets straight to the point: understanding your data without wasting time.

What it handles right now:
• Upload your dataset
• Quick data overview
• Basic cleaning
• Statistical insights
• Correlation analysis
• Visuals — bar, histogram, pie

It’s not flashy. It’s functional. And it works. But this is just the opening move.

Now your move 👇
• What’s one feature you’d add next?
• What would make you actually use this daily?
• What’s missing?

Be direct. I’m listening.

I’ll be shipping a sharper version every Monday — better features, tighter experience, smarter analysis. No excuses, just iterations. Because good products aren’t guessed — they’re built, tested, and refined.

Live demo → https://lnkd.in/gXda-aZs

#BuildInPublic #DataScience #Streamlit #Python #KeepBuilding
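This is not the author's code (see the live demo for that); purely as a sketch, the core loop of such an app can be this small in Streamlit:

```python
import pandas as pd
import streamlit as st

st.title("Data View (sketch)")

uploaded = st.file_uploader("Upload a CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    st.dataframe(df.head())               # quick data overview
    st.write(df.describe())               # statistical insights
    numeric = df.select_dtypes("number")
    if not numeric.empty:
        st.write(numeric.corr())          # correlation analysis
    col = st.selectbox("Column to plot", df.columns)
    st.bar_chart(df[col].value_counts())  # one of the basic visuals
```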
-
High School Algebra, but make it Machine Learning.

I had a massive "déjà vu" moment this week. I started on Linear Regression, and while the equation, y = B0 + B1x, looks fancy, it’s really just a sophisticated version of the y = mx + b we all learned in high school. Turns out I actually needed that math after all 😊!

It feels like a full-circle moment, but the path here has been a steep climb. I’ve spent the last month deep in the "un-glamorous" essentials:

Data Cleaning: realizing no model works until you’ve tackled the messiness of real-world data.
Visualization: discovering that a good plot is the only way to see if the math actually fits the story.

Building the foundation in Python has been challenging, but moving from "how to code" to "how to predict" is where the magic happens. Now the real task begins: narrowing down a final project topic that solves a real-world problem.

#HealthDataScience #MachineLearning #Python #CareerTransition #LinearRegression
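To make the full-circle moment concrete, here is a minimal fit of that line; the data points are invented:

```python
import numpy as np

# Hypothetical points scattered around y = 2x (the high-school line, with noise).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# polyfit with degree 1 recovers the slope (B1, i.e. m) and intercept (B0, i.e. b).
b1, b0 = np.polyfit(x, y, deg=1)
print(f"y = {b0:.2f} + {b1:.2f}x")   # roughly y = 0.1 + 2.0x
```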
-
The best way to learn ML? Stop using libraries.

I challenged myself to build linear regression using only NumPy and pandas. No sklearn. No model.fit(). No shortcuts.

The result: 3 days of debugging, 4 major bugs, and one working model.

I documented everything in a new Medium article:
* The math behind gradient descent (explained simply)
* Why feature scaling saved my model from exploding
* The dummy variable trap I almost fell into
* How I fixed R² = -6660 (yes, negative six thousand)

If you're learning data science, this will save you hours of frustration.

Read the full story: https://lnkd.in/gvEu6-fM
Code on GitHub: https://lnkd.in/gQUsAfzD

#DataScience #MachineLearning #Python #100DaysOfCode
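The author's actual code is behind the GitHub link above. Purely as a sketch of the technique, gradient descent for linear regression with feature scaling can be this compact; all names here are my own:

```python
import numpy as np

def fit_linear(X: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 1000):
    # Feature scaling first: without it, large-scale columns make the gradients explode.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = np.c_[np.ones(len(X)), (X - mu) / sigma]   # scaled features plus a bias column

    w = np.zeros(Xs.shape[1])
    n = len(y)
    for _ in range(epochs):
        error = Xs @ w - y                  # predictions minus targets
        w -= lr * (2 / n) * (Xs.T @ error)  # step along the mean-squared-error gradient
    return w, mu, sigma

def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Negative R² (like -6660) means the model is far worse than predicting the mean.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot
```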