Predict categories, not numbers. 3 classification models. One free notebook.

This notebook covers:
→ Logistic Regression — the baseline every ML project needs
→ Decision Trees — visual, interpretable, easy to explain to stakeholders
→ K-Nearest Neighbors — surprisingly powerful for small datasets
→ Train/test split and why it matters
→ Confusion matrix: true positives, false positives, and why accuracy lies
→ Precision vs Recall — when each one matters more
→ Model comparison on the same dataset

Every model is trained, evaluated, and compared. Not theory slides. Runnable code with real output.

If you're prepping for ML interviews, this is the notebook to start with.

Free: https://lnkd.in/gCNvPJqS

Day 2/7. Yesterday was Web Scraping. Tomorrow: APIs.

#MachineLearning #Classification #Python #DataScience #DecisionTree #LogisticRegression #InterviewPrep #FreeResources
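A minimal sketch of what such a side-by-side comparison can look like. The notebook's own dataset and settings aren't shown in the post, so the breast-cancer dataset, the 80/20 split, and the hyperparameters here are assumptions for illustration only:

```python
# Sketch: the three classifiers from the post, compared on one dataset
# with the metrics it mentions (confusion matrix, precision, recall).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

X, y = load_breast_cancer(return_X_y=True)
# Hold out 20% for testing so every model is judged on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # ravel() flattens the 2x2 confusion matrix to (tn, fp, fn, tp)
    print(name, confusion_matrix(y_test, pred).ravel(),
          "precision=%.2f" % precision_score(y_test, pred),
          "recall=%.2f" % recall_score(y_test, pred))
```

Stratifying the split keeps the class ratio identical in train and test, which matters once you start comparing precision and recall across models.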
Anuj Saini’s Post
More Relevant Posts
-
Revisiting Multiple Linear Regression – My ML Learning Journey

As part of my ongoing machine learning journey, I revisited Multiple Linear Regression using a car dataset to strengthen my fundamentals and deepen my understanding.

🔍 What I focused on this time:
• Practicing exploratory data analysis and understanding feature relationships
• Visualizing how variables like HP, VOL, SP, and WT impact MPG
• Building multiple models with different feature combinations
• Evaluating performance using RMSE and R² score

📊 What I observed:
As I added more relevant features, the model performance improved — giving a clearer picture of how multiple factors influence fuel efficiency.

💡 Why this revision mattered:
Reworking the same concept helped me move beyond just “knowing” regression to actually understanding how feature selection impacts model performance.

🛠️ Tech Stack: Python | Pandas | NumPy | Matplotlib | Scikit-learn

Still learning, still improving — one concept at a time.

#MachineLearning #DataScience #Python #Regression #LearningJourney #DataAnalytics
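The workflow described above can be sketched roughly like this. The actual car dataset isn't included in the post, so synthetic HP/WT/VOL data with an assumed MPG relationship stands in; only the column names come from the post:

```python
# Sketch: fit linear models on growing feature subsets and compare
# RMSE and R2, mirroring the "more relevant features" observation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "HP": rng.uniform(50, 300, n),    # horsepower
    "WT": rng.uniform(15, 55, n),     # weight
    "VOL": rng.uniform(50, 200, n),   # volume (irrelevant here by design)
})
# Assumed relationship: MPG falls with horsepower and weight, plus noise
df["MPG"] = 60 - 0.08 * df["HP"] - 0.4 * df["WT"] + rng.normal(0, 2, n)

train, test = train_test_split(df, test_size=0.25, random_state=1)
for features in (["HP"], ["HP", "WT"], ["HP", "WT", "VOL"]):
    model = LinearRegression().fit(train[features], train["MPG"])
    pred = model.predict(test[features])
    rmse = mean_squared_error(test["MPG"], pred) ** 0.5
    print(features, "RMSE=%.2f" % rmse,
          "R2=%.3f" % r2_score(test["MPG"], pred))
```

Adding WT improves the fit because it carries real signal; adding VOL barely moves the metrics, which is exactly the feature-selection lesson the post describes.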
-
Starting to understand why Pandas is the first tool every data scientist learns.

I built a simple Student Marks Analyzer — nothing fancy, but it clicked something for me. With just a few lines I could:
→ Build a table from scratch
→ Explore rows, columns, specific values
→ Get average, highest and lowest marks instantly

📊 Average: 84.0 | Highest: 95 | Lowest: 70

The interesting part? I didn't write a single formula. No Excel. No manual counting. Just Python doing the heavy lifting in milliseconds.

This is exactly what data analysis feels like at the start — small project, but you can already see the power behind it. Still a lot to learn. But this one felt good.

#Python #Pandas #DataScience #MachineLearning #AI #100DaysOfCode #PakistanTech
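A minimal sketch of such an analyzer. The names and marks are invented, chosen so the summary matches the numbers quoted above:

```python
# Toy Student Marks Analyzer: a table plus instant summary statistics.
import pandas as pd

df = pd.DataFrame({
    "name": ["Ayesha", "Bilal", "Sara", "Omar"],   # hypothetical students
    "marks": [95, 70, 88, 83],
})

print(df)                                 # the full table
print("Average:", df["marks"].mean())     # 84.0
print("Highest:", df["marks"].max())      # 95
print("Lowest:", df["marks"].min())       # 70
```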
-
45 Days ML Journey — Day 13: K-Nearest Neighbors (KNN)

Day 13 of my Machine Learning journey — exploring K-Nearest Neighbors (KNN), a simple yet powerful algorithm used for classification and regression.

Tools Used: Scikit-learn, NumPy, Pandas

What is KNN?
KNN is a supervised learning algorithm that classifies a data point based on the majority class among its ‘K’ nearest neighbors.

Key concepts:
• K Value → number of nearest neighbors considered
• Distance Metric → measures similarity (e.g., Euclidean distance)
• Lazy Learning → no training phase; computation happens at prediction time

How does it work?
1. Choose the number of neighbors (K)
2. Calculate the distance from the query point to all data points
3. Pick the K closest neighbors
4. Assign the most common class (for classification) or the average (for regression)

Why use KNN?
• Simple and easy to understand
• Negligible training cost (it just stores the data)
• Works well with smaller datasets

Challenges:
• Computationally expensive for large datasets
• Sensitive to the choice of K and distance metric
• Affected by feature scaling

Code notebook: https://lnkd.in/gQ3HMMBZ

Key takeaway: KNN is a beginner-friendly algorithm that relies on similarity, making it intuitive and effective for many real-world problems when tuned properly.

#MachineLearning #DataScience #KNN #Python #ScikitLearn #LearningInPublic #MLJourney
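The four steps above can be sketched from scratch in a few lines (illustrative only, not the linked notebook's code):

```python
# From-scratch KNN classifier: Euclidean distance, K closest, majority vote.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    # Step 2: distance from the query point to all training points
    dists = np.linalg.norm(X_train - query, axis=1)
    # Step 3: indices of the K closest neighbors
    nearest = np.argsort(dists)[:k]
    # Step 4: most common class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Two tiny clusters: class 0 near (1, 1), class 1 near (4, 4)
X = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.0], [4.2, 3.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9])))  # → 0 (near the first cluster)
print(knn_predict(X, y, np.array([4.1, 4.0])))  # → 1
```

Note there is no `fit` step, which is exactly the "lazy learning" point: all the work happens at prediction time.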
-
Why most RAG systems fail in the first week... It’s rarely the LLM's fault. Usually, the "Retrieval" part of RAG is broken.

If you’re seeing poor results, check these three specific areas from the infographic:
• Chunking Strategy: Are you splitting documents effectively or cutting sentences in half?
• Re-ranking: Are you just taking the top 5 vector results, or are you validating their relevance before passing them to the LLM?
• Data Processing: Garbage in, garbage out. Are you cleaning your data before indexing?

Building a Production-Ready RAG Pipeline requires a holistic view of the data lifecycle. (Great visual breakdown by QuantumEdgeX!)

What’s your "must-have" component for a reliable AI agent?

#SoftwareEngineering #ArtificialIntelligence #Python #VectorDatabase #RAGPipeline
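The chunking point is easy to demonstrate. A toy sketch (stdlib only; real pipelines usually split by tokens rather than characters) showing how a naive fixed-size split cuts sentences in half while a sentence-aware splitter keeps them whole:

```python
# Naive character chunking vs. a simple sentence-aware splitter.
import re

def naive_chunks(text, size=40):
    # Fixed windows: happily cuts words and sentences mid-stream
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text, size=40):
    # Greedy packing of whole sentences up to roughly `size` characters
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if current and len(current) + len(sentence) > size:
            chunks.append(current.strip())
            current = ""
        current += sentence + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks

doc = "RAG needs clean input. Chunk on boundaries. Then re-rank results."
print(naive_chunks(doc))     # mid-sentence cuts
print(sentence_chunks(doc))  # whole sentences per chunk
```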
-
🚀 Day 36/70 – Random Variables

Today I learned about Random Variables in Statistics 📊
A random variable represents the numerical outcome of a random process.

📌 Types of Random Variables
1️⃣ Discrete Random Variable
Takes specific values
Example: Number of heads in a coin toss
2️⃣ Continuous Random Variable
Takes any value within a range
Example: Height, weight, temperature

📌 Python Example
import numpy as np

# Discrete random values
data = np.random.randint(1, 10, 5)
print("Discrete:", data)

# Continuous random values
data2 = np.random.random(5)
print("Continuous:", data2)

📊 Why It’s Important
✔ Forms the base of probability theory
✔ Used in statistical modeling
✔ Helps in predicting outcomes
✔ Important for machine learning

Today’s Learning: Random variables help convert real-world uncertainty into numbers 🔥

Day 36 completed 💪 Advancing deeper into statistics!

#Day36 #Statistics #Probability #DataAnalytics #Python #LearningInPublic #FutureDataAnalyst #70DaysChallenge
-
Built an end-to-end ML project this week — a Customer Churn Predictor. Here's the mistake that cost me 487 minutes ⏳

I used GridSearchCV with RandomForest on 440,000 rows. 2 values × 2 values × 1 value = just 4 combinations. But with cv=3, that's 12 full model fits on a massive dataset. Result? Still running after 8 hours.

The fix? Switch to RandomizedSearchCV with n_iter=10 — sampling 10 candidate combinations instead of exhaustively fitting every one. Finished in under 5 minutes.

The second bug: my XGBoost was giving 50% accuracy — basically random guessing. Root cause: I forgot scale_pos_weight on an imbalanced dataset (250k vs 190k class split). One parameter fix → accuracy jumped to 85%+.

Lessons I'm taking forward:
→ Never use GridSearchCV on large datasets. RandomizedSearchCV first.
→ Always check class balance before touching any model.
→ Accuracy is a lying metric on imbalanced data. Use ROC-AUC and F1.

Stack: Python · Scikit-learn · XGBoost · Pandas

Building toward a full deployment with FastAPI + Streamlit. More updates coming.

#MachineLearning #Python #XGBoost #DataScience #MLEngineer #BuildInPublic
-
Moving beyond the "Wrapper": Building a RAG system from the ground up.

Scraping data is the easy part. The real challenge begins when transforming raw markdown files and unstructured data into a functional RAG (Retrieval-Augmented Generation) pipeline.

Recently, I have been focusing on the "Retrieval" aspect — optimizing how we index and fetch data so the LLM stays grounded in the facts. This involves a fascinating puzzle of vector embeddings, chunking strategies, and prompt engineering.

Current progress: data ingestion through core logic is working. The next step is fine-tuning retrieval accuracy.

If you’re working on RAG systems, what’s the biggest hurdle you’ve faced so far?

#RAG #GenerativeAI #Python #AIEngineering #LLMs
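A minimal sketch of the retrieval step described above. TF-IDF stands in for neural embeddings to keep the example dependency-light, and the chunks and query are invented:

```python
# Index text chunks, embed a query, return the closest chunks by
# cosine similarity — the core of the "Retrieval" in RAG.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Vector embeddings map text to points in a high-dimensional space.",
    "Chunking strategy controls how documents are split before indexing.",
    "Prompt engineering shapes how retrieved context is shown to the LLM.",
]
vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(chunks)   # "indexing" the chunks

def retrieve(query, k=2):
    scores = cosine_similarity(vectorizer.transform([query]), index)[0]
    # Highest-scoring chunks first
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

print(retrieve("how should documents be split?"))
```

Swapping the TF-IDF vectors for embeddings from a sentence-transformer (and the list for a vector database) gives the production shape of the same idea.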
-
My market analysis engine runs 17 phases every week. 12 of them are deterministic Python; they finish in 15 seconds. The other 5 involve AI narratives, web searches, and editorial synthesis; they take 35 minutes.

The critical insight: the analytical foundation — regime classification, volatility forecasting, tail-risk adjustment, sector dispersion — is locked in before the AI ever touches it.

Here's what that means for the numbers you see in my market research: when the engine says "regime shift probability is 47%," I can trace it through the exact computation. The skewness input (-0.43), the kurtosis input (1.11), the Cornish-Fisher formula, the adjusted probability. No black box. No "in my experience." Just auditable math.

Part 2 of my framework series drops tomorrow — inside the US equity engine.

Have you ever traced a probability back to its actual computation?

#QuantFinance #Python #MarketAnalysis #SystematicTrading #Volatility #HMM
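For context, the Cornish-Fisher step mentioned above can be sketched as follows, using the quoted inputs (skewness -0.43, excess kurtosis 1.11). This shows only the quantile adjustment; the post's full regime-probability pipeline is not public, so nothing here reproduces the 47% figure:

```python
# Cornish-Fisher expansion: adjust a normal quantile for skewness
# and excess kurtosis, so tail estimates reflect non-normal returns.
from scipy.stats import norm

def cornish_fisher_quantile(p, skew, ex_kurt):
    """Skew/kurtosis-adjusted quantile via the Cornish-Fisher expansion."""
    z = norm.ppf(p)
    return (z
            + (z**2 - 1) * skew / 6
            + (z**3 - 3 * z) * ex_kurt / 24
            - (2 * z**3 - 5 * z) * skew**2 / 36)

z_normal = norm.ppf(0.05)                         # about -1.645 under normality
z_adj = cornish_fisher_quantile(0.05, -0.43, 1.11)
print("normal 5%% quantile: %.3f, adjusted: %.3f" % (z_normal, z_adj))
# Negative skew plus fat tails push the lower quantile further out,
# i.e. the adjusted left-tail risk is worse than the normal model says.
```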
-
Stop Training Models Until You Do This First.

Your ML results don’t start with algorithms; they start with clean, model-ready data. 🚀

Here’s a simple Data Pre-Processing checklist you can follow every time 👇

1) Import the Libraries 📚
Bring in the basics: ✅ NumPy | ✅ Pandas | ✅ (Optional) Matplotlib/Seaborn | ✅ Scikit-learn

2) Import the Dataset 🗂️
Load your data and do quick checks: 🔍 shape, column types, sample rows, basic stats

3) Handle Missing Data 🧩 (Imputer)
Missing values can silently hurt accuracy. Fix them with:
📌 Mean/Median (numerical)
📌 Mode (categorical)

4) Encode Categorical Data 🔤➡️🔢
Models need numbers, not text.
✅ Independent variables (X): One-Hot Encoding 🧱 Example: City → City_NY, City_LA, City_SF
✅ Dependent variable (y): Label Encoding 🎯 Example: Yes/No → 1/0

5) Split Train vs Test ✂️
Common split: 80/20 or 70/30
🎯 Train = learn patterns | Test = validate performance

6) Feature Scaling ⚖️
Helps models learn fairly when features have different ranges.
📍 Standardization (Z-score)
📍 Normalization (Min-Max)
🔥 Especially important for: KNN, SVM, K-Means, Logistic Regression

#MachineLearning #DataScience #FeatureEngineering #DataPreprocessing #Python
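The checklist above can be sketched end to end with scikit-learn. The column names and toy values here are invented for illustration:

```python
# Steps 1-6 of the pre-processing checklist on a tiny toy frame.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# 2) Load data (with missing values and a categorical column on purpose)
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 37],
    "salary": [40000, 52000, 61000, np.nan, 45000, 58000],
    "city": ["NY", "LA", "NY", "SF", "LA", "SF"],
    "purchased": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
})

X, y_raw = df.drop(columns="purchased"), df["purchased"]
y = LabelEncoder().fit_transform(y_raw)   # 4b) Yes/No -> 1/0

# 5) Split BEFORE fitting transformers, so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3) impute, 4a) one-hot encode, 6) scale: numeric and categorical apart
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "salary"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)   # fitted on train only
print(X_train_ready.shape, X_test_ready.shape)
```

Fitting the imputer and scaler on the training split only (then just transforming the test split) is the detail that keeps step 5 honest: the test set never leaks into the statistics.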
-
Project: House Price Prediction using #DecisionTreeRegressor

Excited to share my recent project where I built a House Price Prediction model using a Decision Tree Regressor.

Key Highlights:
- Performed data preprocessing and handled categorical features
- Built a regression model to predict house prices based on multiple factors

Achieved:
- R² Score: 0.95
- MAE: 35,453

Insights:
- The model effectively captured non-linear relationships in the dataset
- Gained practical understanding of how decision trees work and their hyperparameters
- Learned how to control overfitting and improve model performance

Tech Stack: Python | Pandas | NumPy | Scikit-learn | Matplotlib

This project helped me strengthen my understanding of regression techniques and real-world data handling. Next step: improving performance using ensemble techniques like Random Forest.

Grateful for the guidance from Abhishek Jivrakh Sir during this project.

GitHub Link: https://lnkd.in/g8qw8NMF

#MachineLearning #DataScience #Python #AI #DecisionTree #Regression #Projects #Learning #StudentDeveloper
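A rough sketch of the approach. The housing data and exact hyperparameters aren't in the post, so synthetic regression data stands in; the point shown is how max_depth controls overfitting:

```python
# DecisionTreeRegressor: unrestricted vs. depth-limited tree.
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=6, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for depth in (None, 5):   # None = grow until leaves are pure (can overfit)
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print("max_depth=%s R2=%.3f MAE=%.1f" % (
        depth, r2_score(y_test, tree.predict(X_test)),
        mean_absolute_error(y_test, tree.predict(X_test))))
```

Capping max_depth (or min_samples_leaf) trades training fit for test-set generalization, which is the overfitting control the post refers to; averaging many such trees is the Random Forest next step it mentions.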