The Art of Proxies: When the Data You Need Doesn't Exist

In textbooks, you always find the perfect dataset to answer your question. In the real world, the exact data point you need rarely exists. This is where great analysts distinguish themselves from average ones: the ability to identify and validate proxy variables.

A proxy is a variable that is not of direct interest itself but stands in for an unobservable or immeasurable one. For example:

- Can't measure "customer happiness"? Use NPS scores or support ticket volume as a proxy.
- Can't measure "economic activity" in a region with poor reporting? Researchers have successfully used satellite imagery of nighttime lights as a proxy.

The skill lies not just in finding the proxy but in understanding its limitations. A proxy is a shadow of the truth, not the truth itself. Always caveat your findings accordingly.

What's the most creative proxy you've ever used in research?

#DataAnalytics #DataScience #Python #Coding #TechTips #DataCommunity #BusinessIntelligence #Strategy #DataDriven #MarketResearch #CriticalThinking #DecisionMaking
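The validation step the post calls for can be made concrete. A minimal sketch, assuming a hypothetical setting where true "customer happiness" is known only for a small survey panel and NPS is the candidate proxy; the variable names, noise level, and panel size are all illustrative, not from the post:

```python
import numpy as np

# hypothetical: "happiness" is measurable only for a 200-person survey panel;
# NPS is the cheap proxy available for everyone. Validate the proxy on the panel.
rng = np.random.default_rng(1)
happiness = rng.uniform(0, 10, 200)          # expensive ground truth (panel only)
nps = happiness + rng.normal(0, 1.5, 200)    # proxy: tracks the truth, with noise

r = np.corrcoef(nps, happiness)[0, 1]
print(f"proxy-truth correlation on the panel: r = {r:.2f}")
# a strong r justifies using NPS at scale, but r < 1 is exactly the
# "shadow of the truth" caveat: report it alongside any downstream finding
```

The same pattern applies to any proxy: find a subsample where both the proxy and the ground truth exist, quantify the agreement, and carry that uncertainty into your conclusions.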
Towhid Al Jihad’s Post
🚀 I recently implemented the K-Nearest Neighbors (KNN) algorithm and evaluated how well it predicts unseen data. Instead of focusing only on theory, I wanted to understand what actually happens when a model learns from data.

First, I prepared the dataset and applied feature scaling, because KNN depends on distance. Then I trained the model and tested it on new data.

Results:
• ✅ Training accuracy: 95.83%
• ✅ Testing accuracy: 96.66%

Since both accuracies are almost equal, the model is not memorizing the dataset; it is identifying patterns and making reliable predictions.

What I learned:
• Distance plays a crucial role in prediction
• Scaling directly affects model performance
• High accuracy alone is not enough; comparison matters
• Simple algorithms can still be very powerful

This project helped me understand the difference between a model that learns and a model that just fits data. 👉 Building strong fundamentals with simple algorithms is important in Machine Learning.

#MachineLearning #DataScience #KNN #Python #LearningByDoing
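A minimal sketch of the workflow described above (scale, fit KNN, compare train vs. test accuracy). The post doesn't name its dataset, so this assumes scikit-learn's built-in iris data; the exact accuracies will differ from the 95.83%/96.66% reported:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# scale inside a pipeline so KNN's distance computations aren't dominated
# by whichever feature happens to have the largest raw range
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
```

A small gap between the two numbers is the "not memorizing" signal the post describes; a large train-over-test gap would suggest overfitting.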
10 models, 1 loop, and a lot of learning 🚀

One of the most fascinating takeaways from my Data Science journey so far is that there's no such thing as a "silver bullet." An algorithm that shines in one scenario might fail miserably in another.

Today, I decided to automate my benchmarking process. Instead of manually testing algorithms one by one, I built a Python workflow that pre-processes the data and evaluates 10 different models at once using cross-validation.

💡 Key learnings from this experiment:
• The power of pipelines: they keep the code clean and ensure pre-processing steps (like KNNImputer or MinMaxScaler) are locked to the model, preventing data leakage.
• Interpretation matters: seeing a negative score for Lasso while Random Forest hit 0.92+ gave me immediate insight into the nature of my dataset (likely highly non-linear).
• Efficiency: automating repetitive tasks frees up time for the actual analysis and tuning.

Seeing that final list of scores print out brings a huge sense of satisfaction! On to the next steps. 📈

Question for the network: do you usually test a wide range of models in the initial phase, or do you skip straight to the heavy hitters (like XGBoost/LightGBM)? 👇

#DataScience #MachineLearning #Python #ScikitLearn #Coding #LearningJourney
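A sketch of the benchmarking loop described above, trimmed to four models for brevity (the same pattern extends to ten). The dataset (scikit-learn's breast-cancer data) and the model list are assumptions, not the author's actual setup:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "LogisticRegression": LogisticRegression(max_iter=2000),
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
}

scores = {}
for name, estimator in models.items():
    # the scaler lives inside the pipeline, so it is re-fit on each CV
    # training fold only -> no leakage into the validation fold
    pipe = make_pipeline(MinMaxScaler(), estimator)
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20s} {score:.3f}")
```

This is the leakage-prevention point in action: putting the scaler inside the pipeline means cross-validation never lets validation-fold statistics touch the training step.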
Task 03 – Decision Tree Classifier

Built a Decision Tree model to predict customer purchasing behavior from demographic and behavioral features. The work covered data preprocessing, encoding categorical variables, model training, and performance evaluation.

Key skills: Machine Learning • Decision Trees • Classification • Feature Engineering • Model Evaluation

#prodigyinfotech #datascienceintern #python #dataanalytics
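A minimal sketch of such a pipeline. The task's actual data isn't shown, so this uses a synthetic customer table; the column names and the purchase rule are invented purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "income": rng.integers(20_000, 120_000, n),
    "gender": rng.choice(["M", "F"], n),  # categorical feature to encode
})
# invented target: older, higher-income customers purchase
df["purchased"] = ((df["age"] > 40) & (df["income"] > 60_000)).astype(int)

# one-hot encode the categorical column before training
X = pd.get_dummies(df.drop(columns="purchased"), columns=["gender"])
y = df["purchased"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"test accuracy: {acc:.3f}")
```

Capping `max_depth` is one simple way to keep a single tree from overfitting; on real data it would be tuned rather than fixed at 4.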
Lately, I've been taking time to refresh and strengthen my knowledge, and one thing is clearer than ever: business and technology are deeply connected. They can't be separated.

Technology is not just about writing code; it's about creating impact. Python is a powerful language that goes far beyond development. It enables automation, advanced data analysis, and intelligent systems that help businesses reach new milestones.

Recently, I've been training and testing machine learning models, and it's inspiring to see how raw data can turn into insights, predictions, and smarter decisions. The more I grow technically, the more I understand how important it is to think from both perspectives: developer and business.

If you're exploring this field, I encourage you to dive deeper into powerful Python libraries like scikit-learn, XGBoost, matplotlib, and many others that can elevate your machine learning projects.

Continuous learning. Continuous improvement. 🚀

#Python #MachineLearning #BusinessAndTechnology #DataScience #Innovation #ContinuousLearning
Why random train_test_split fails on imbalanced data (and how stratify=y fixes it)

I ran train_test_split on a dataset with 90% class 0 and 10% class 1. My test set came back with 95% class 0. Random isn't always fair; it's just random.

If your classes are imbalanced, you need stratify=y in scikit-learn's split. It preserves the class distribution across both sets.

Code:

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

Without it, you risk evaluating on a test set that doesn't reflect the real distribution, and your metrics lie. This doesn't matter if your classes are balanced, and it won't save you if your data is fundamentally broken. But for imbalanced problems, it's the difference between noise and signal.

#DataScience #Python #MachineLearning #AIEngineer #DataAnalysis #ScikitLearn #MLBasics
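A quick way to see what stratify=y buys you, using synthetic labels with the same 90/10 imbalance the post describes (the feature matrix here is just a placeholder):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic labels with the 90/10 imbalance described above
y = np.array([0] * 900 + [1] * 100)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# stratified split: test-set class balance matches the full data exactly
_, _, _, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
print(y_test.mean())  # 0.1: exactly 10% positives, as in the full dataset
```

Dropping `stratify=y` and re-running with different seeds shows the positive rate in the test set drifting above and below 10%, which is exactly the sampling noise the post warns about.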
I scaled my features before splitting the data. Validation accuracy hit 94%. I thought I nailed it. I didn't. I was leaking test information into training.

When you fit a scaler on the entire dataset, it learns statistics from data your model will be evaluated on. The test set isn't truly unseen anymore.

The fix: fit preprocessing only on the training data, then transform the test data using those learned parameters.

This won't ruin a simple project. But in real Machine Learning work, it's the difference between honest evaluation and quietly inflated metrics. Subtle leakage compounds.

#DataScience #Python #MachineLearning #AIEngineer #DataAnalysis #ScikitLearn #MLBasics
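The fix described above, as a minimal sketch; the synthetic data and variable names are assumptions, since the point is the order of operations rather than the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# synthetic features; the point is the order of operations, not the data
X = np.random.default_rng(0).normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# 1) split FIRST
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 2) fit the scaler ONLY on training data...
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
# 3) ...then transform the test set with the training-set statistics
X_test_s = scaler.transform(X_test)
```

Wrapping the scaler and model in a scikit-learn `Pipeline` enforces this ordering automatically, which is the safer default in real projects.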
Day 7 – Going Beyond Accuracy in Machine Learning

Today I focused on properly evaluating my Logistic Regression model for the Telco Customer Churn project. Instead of looking at accuracy alone, I analyzed precision, recall, and F1-score to better understand model performance.

🔎 Key results:
• Accuracy: 98%
• Precision (churn): 98%
• Recall (churn): 94%

This means the model correctly identifies most churning customers while keeping false positives very low.

One important lesson from today: high accuracy is good, but understanding how the model makes correct and incorrect predictions is what truly matters. This journey is helping me move from just writing code to actually understanding machine learning concepts.

#MachineLearning #DataScience #ModelEvaluation #LogisticRegression #LearningJourney #Python
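A sketch of this evaluation pattern. The Telco churn data isn't included here, so this assumes scikit-learn's breast-cancer dataset as a stand-in binary classification problem; the metric values will differ from the post's:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

acc = accuracy_score(y_te, pred)    # overall hit rate
prec = precision_score(y_te, pred)  # of predicted positives, how many were real
rec = recall_score(y_te, pred)      # of real positives, how many were caught
f1 = f1_score(y_te, pred)           # harmonic mean of precision and recall
print(f"accuracy {acc:.3f} | precision {prec:.3f} | recall {rec:.3f} | f1 {f1:.3f}")
```

For churn specifically, recall on the churn class is usually the metric to watch: a missed churner is a lost customer, while a false alarm just costs a retention offer.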
🚀 Built a Machine Learning model using Logistic Regression to solve a classification problem and evaluated its accuracy. This hands-on experience helped me understand data preprocessing, target selection, model training, and performance evaluation: a valuable step in strengthening my practical knowledge of Machine Learning and Data Analytics. 📊🤖

🔗 Explore my Machine Learning projects, including Logistic Regression models, on GitHub: https://lnkd.in/dStyq8q9

#MachineLearning #LogisticRegression #DataScience #DataAnalytics #Python #MLProjects #LearningByDoing #CareerGrowth
🔹 First Machine Learning Model | Linear Regression Implementation in Python

This video demonstrates the implementation of my first Machine Learning model, Linear Regression, built using Python to understand the complete end-to-end ML pipeline.

🔍 Technical overview of what's shown in the video:
• Loading and exploring the dataset
• Feature–target separation (X, y)
• Data preprocessing and validation
• Training a Linear Regression model
• Learning the relationship: y = β₀ + β₁x + ε
• Generating predictions on input data
• Interpreting model outputs and behavior

Through this project, I focused on understanding how model parameters (coefficients and intercept) are learned, how linear relationships are modeled, and how data quality impacts predictions.

📌 Key learnings:
• Supervised learning fundamentals
• Model training vs. prediction
• Importance of clean, well-structured data
• Translating mathematical concepts into working code

This project represents my first practical step into Machine Learning, building a strong foundation before moving on to advanced models and optimization techniques.

#MachineLearning #LinearRegression #SupervisedLearning #Python #DataScience #MLProjects #ModelTraining #LearningByDoing
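A minimal sketch of the pipeline described, using synthetic data generated from known parameters (β₀ = 3, β₁ = 2) so the learned coefficients can be checked against ground truth; the data itself is an assumption, not the video's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic data drawn from y = 3 + 2x + noise, so the "right answer" is known
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))                      # feature matrix
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=100)     # target with noise ε

model = LinearRegression().fit(X, y)
print(f"learned intercept (true β₀ = 3): {model.intercept_:.2f}")
print(f"learned slope     (true β₁ = 2): {model.coef_[0]:.2f}")
```

Fitting on data with known parameters is a useful first exercise: it shows directly that `intercept_` and `coef_` are estimates of β₀ and β₁, recovered from noisy observations.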
02 #AI_ML_for_Process_Engineering

Trading spreadsheets for Python: why the switch from Excel?

🚀 Scalability: Python handles millions of sensor readings that would crash a standard spreadsheet.
🛠️ Reproducibility: unlike Excel, where one accidental keystroke can break a formula across 10,000 rows, Python logic is explicit, modular, and verifiable.
📊 Automated insight: with one line of code (.describe()), I can instantly get the mean, standard deviation, and ranges for every tag in a massive dataset.

The 80/20 rule is real: roughly 80% of AI work is data cleaning. Python is the power tool that makes that 80% manageable, letting us stop "firefighting" data and start interrogating it for insights.

[Question for the engineers] What is the largest dataset you've ever tried to open in a spreadsheet? Did it survive, or did you see the "Not Responding" screen of death? 😅

#DJ2Tech #ProcessEngineering #Industry40 #DigitalTransformation
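What the one-line .describe() claim looks like in practice. A sketch with synthetic sensor tags; the tag names, distributions, and 100,000-row size are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# hypothetical process-sensor tags; names and distributions are invented
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "temperature_C": rng.normal(350, 5, n),
    "pressure_bar": rng.normal(12, 0.4, n),
    "flow_m3h": rng.normal(80, 3, n),
})

# one line: count, mean, std, min/max, and quartiles for every tag at once
summary = df.describe()
print(summary.loc[["mean", "std"]])
```

The same call scales to millions of rows, well past the point where a spreadsheet would show the "Not Responding" screen.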