Now, I just trained a Logistic Regression model on the Iris dataset! Quite interesting 🤔

Steps I followed:
• Loaded & explored data
• Split into train/test sets
• Scaled features
• Trained & predicted

```python
# 1. Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 2. Load dataset
iris = load_iris()

# 3. Create combined DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["target_class"] = iris.target

# 4. Split input (X) and output (y)
X = df[iris.feature_names]
y = df["target_class"]

# 5. Split into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 6. Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 7. Train logistic regression model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# 8. Predict and check accuracy
y_pred = model.predict(X_test_scaled)
acc = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {acc * 100:.2f}%")
```

Logistic Regression predicts probabilities for categories, not logistics!

#MachineLearning #DataScience #Python #LogisticRegression #IrisDataset #LMES #UPTOR #MohanSivaraman
More Relevant Posts
Binary logistic regression is a powerful statistical method used to model the relationship between a binary target variable and one or more predictor variables. It's commonly used in situations where the outcome is categorical, such as predicting whether a customer will buy a product (yes/no) or whether a patient has a disease (present/absent).

When properly implemented, binary logistic regression offers several benefits:

✔️ Accurate Predictions: It helps in making precise predictions about binary outcomes, which can be crucial for decision-making in fields like marketing, healthcare, and finance.
✔️ Variable Impact: By examining the coefficients, you can understand the impact of different variables on the probability of the outcome.
✔️ Flexibility: Logistic regression can handle multiple predictor variables, making it suitable for complex models.

However, if not handled correctly, there can be drawbacks:

❌ Overfitting: Using too many predictor variables can cause the model to become overly complex and perform poorly on new data.
❌ Misinterpretation: The model's output probabilities need careful interpretation, as incorrect conclusions can lead to faulty decisions.
❌ Assumption Dependence: Logistic regression relies on certain assumptions, such as a linear relationship between the predictors and the log odds of the target variable. Violating these assumptions can reduce model reliability.

To implement binary logistic regression in practice, you can use these tools:

🔹 R: Use the glm() function from the base package to fit a logistic regression model. The ggplot2 package can be used to visualize the data and model predictions.
🔹 Python: Use LogisticRegression from the scikit-learn library to create a logistic regression model. Libraries like matplotlib or seaborn can help visualize the results.

The visualization of this post demonstrates a logistic regression model, showcasing how the model predicts probabilities for a binary outcome based on a continuous predictor. It includes a curve that represents the predicted probabilities, helping us understand how changes in the predictor variable affect the likelihood of different outcomes.

If you want to dive deeper into this topic and learn how to apply these techniques in R, check out my online course on Statistical Methods in R. It covers binary logistic regression and many other related topics in detail. See this link for additional information: https://lnkd.in/d-UAgcYf

#research #rprogramminglanguage #datavisualization #ggplot2 #dataanalytic #bigdata #dataanalytics #tidyverse
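To make the "variable impact" point concrete, here is a minimal sketch of reading scikit-learn coefficients as odds ratios. The data and feature names are invented for illustration, not taken from the post:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical binary outcome: did the customer buy? (0/1)
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "ad_spend": rng.normal(50, 10, 500),   # invented predictor
    "site_visits": rng.poisson(3, 500),    # invented predictor
})
# Simulate an outcome that actually depends on both predictors
log_odds = 0.05 * X["ad_spend"] + 0.4 * X["site_visits"] - 4
y = (rng.random(500) < 1 / (1 + np.exp(-log_odds))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) is the odds ratio: the multiplicative change in the
# odds of the outcome for a one-unit increase in that predictor
odds_ratios = pd.Series(np.exp(model.coef_[0]), index=X.columns)
print(odds_ratios)
```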
Mean imputation is a straightforward method for handling missing values in numerical data, but it can significantly distort the relationships between variables. By replacing missing values with the mean of the observed data, this approach artificially reduces variability and weakens correlations, leading to misleading results in analysis.

Why Does Mean Imputation Distort Correlations?

❌ No variability in imputed values: Mean imputation assigns the same value to all missing entries, failing to reflect the natural variability of the data.
❌ Weakens relationships: The imputed values introduce artificial uniformity that diminishes or masks the strength of correlations between variables.
❌ Biased downstream analyses: Statistical tests and predictive models relying on the data's correlation structure may produce inaccurate or unreliable results.

A Visual Example: The attached image demonstrates how mean imputation can disrupt correlations between variables. The black points represent the original observed values, showing the natural relationship between variables X1 and X2. The red and green points represent imputed values for X1 and X2, respectively, placed at their mean values. This disrupts the overall pattern, artificially aligning the data along the mean and weakening the true correlation between X1 and X2.

A Better Approach: To preserve relationships between variables, predictive mean matching is a superior alternative. This method selects observed values closest to the predicted value for a missing entry, maintaining variability and the natural correlation structure. When combined with multiple imputation, it also accounts for uncertainty, ensuring more robust and reliable results for downstream analyses.

For a detailed explanation of mean imputation, its drawbacks, and better alternatives, check out my full tutorial here: https://lnkd.in/d2vfiSmf

Sign up for my free email newsletter to stay informed about data science, statistics, Python, and R. More info: http://eepurl.com/gH6myT

#RStats #Data #datasciencetraining #Python #StatisticalAnalysis #DataAnalytics
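You can see the same effect numerically in a few lines. A minimal sketch with made-up correlated data: blank out a share of X1, fill the gaps with the column mean, and the measured correlation drops:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Correlated toy data for X1 and X2
x1 = rng.normal(0, 1, 1000)
x2 = 0.8 * x1 + rng.normal(0, 0.6, 1000)
df = pd.DataFrame({"X1": x1, "X2": x2})
print(f"Original correlation: {df['X1'].corr(df['X2']):.3f}")

# Knock out 30% of X1 at random, then mean-impute
df_imp = df.copy()
missing = rng.random(len(df)) < 0.3
df_imp.loc[missing, "X1"] = np.nan
df_imp["X1"] = df_imp["X1"].fillna(df_imp["X1"].mean())
print(f"After mean imputation: {df_imp['X1'].corr(df_imp['X2']):.3f}")
```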
🚀 Day 1 Pandas Mini Project — Smart Universal Data Loader (Python + Pandas + NumPy)

Today, I'm excited to share a mini-project that I built to simplify the process of working with different datasets during data analysis. The goal of this project is to make it easier to load, explore, clean, and export datasets across different formats — something we do every day in data science.

What this mini project does:
✅ Loads CSV, Excel, JSON files
✅ Shows dataset shape & summary
✅ Identifies missing and duplicate values
✅ Supports basic cleaning and column formatting
✅ Saves the cleaned dataset back to file

Skills improved today:
- Data handling with Pandas
- Array operations with NumPy
- Data cleaning workflows
- Understanding dataset structure
- Writing reusable functions

This is just Day 1 — excited to continue and build more advanced features in the upcoming days. Suggestions & feedback are welcome 🤝

#Day1 #100DaysOfData #Pandas #Python #DataAnalysis #DataCleaning #NumPy #DataScience #MachineLearning #Analytics #LinkedInLearning #PowerBI #Ai #EDA
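The post doesn't include the code itself, so here is a minimal sketch of what such a loader might look like; the function names and behavior are my own assumptions, not the author's implementation:

```python
import pandas as pd
from pathlib import Path

def load_dataset(path: str) -> pd.DataFrame:
    """Load a CSV, Excel, or JSON file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    readers = {".csv": pd.read_csv, ".xlsx": pd.read_excel,
               ".xls": pd.read_excel, ".json": pd.read_json}
    if suffix not in readers:
        raise ValueError(f"Unsupported file type: {suffix}")
    return readers[suffix](path)

def summarize(df: pd.DataFrame) -> None:
    """Print shape, missing-value counts, and duplicate-row count."""
    print("Shape:", df.shape)
    print("Missing values per column:")
    print(df.isna().sum())
    print("Duplicate rows:", df.duplicated().sum())

# Usage (hypothetical file name):
# df = load_dataset("sales.csv")
# summarize(df)
# df.drop_duplicates().to_csv("sales_clean.csv", index=False)
```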
When building predictive models, overfitting is a common challenge. Shrinkage methods, such as Ridge Regression, Lasso, and Elastic Net, help address this by adding a penalty term to the objective function during training, which discourages large coefficients. This results in more robust models that generalize better to new data.

✔️ Ridge Regression shrinks coefficients by penalizing their squared values, making it great when all features matter.
✔️ Lasso forces some coefficients to zero, effectively performing feature selection, ideal when only a subset of features is important.
✔️ Elastic Net combines the strengths of Ridge and Lasso, providing a balance between regularization and feature selection, especially useful when features are correlated.

However, there are some challenges to consider:

❌ Loss of interpretability: Excessive shrinkage can make it difficult to interpret the model coefficients, as important predictors may have their effects reduced.
❌ Tuning required: These methods require careful tuning of hyperparameters (like λ and α) to find the right balance between bias and variance. Poor tuning can lead to either underfitting or overfitting.
❌ Not suitable for all situations: In some cases, simpler models like OLS (Ordinary Least Squares) might perform just as well or even better, especially when the sample size is large and multicollinearity isn't an issue.

🔹 In R: Use the glmnet package to apply Ridge, Lasso, and Elastic Net.
🔹 In Python: Leverage the sklearn.linear_model module for all three shrinkage methods.

Want to dive deeper into these methods and learn how to apply them? Join my online course on Statistical Methods in R, where we explore this and other key techniques in further detail. Take a look here for more details: https://lnkd.in/d-UAgcYf

#datascience #pythonforbeginners #analysis #package
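A minimal sketch of all three methods from sklearn.linear_model, on synthetic data where only a few features truly matter. Note that scikit-learn calls the penalty strength λ alpha, and Elastic Net's mixing parameter l1_ratio:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split

# Synthetic data: 20 features, only 5 of which are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=1.0)),
                    ("Elastic Net", ElasticNet(alpha=1.0, l1_ratio=0.5))]:
    model.fit(X_train, y_train)
    zeroed = int(np.sum(model.coef_ == 0))  # Lasso/Elastic Net zero some out
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.3f}, "
          f"coefficients set to zero = {zeroed}")
```

In practice you would tune alpha (e.g. with RidgeCV, LassoCV, or ElasticNetCV) rather than fixing it at 1.0.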
This project leverages the Python data science stack to analyze and predict salary trends. 🐼 Pandas and 🔢 NumPy handle data loading, cleaning, and numerical operations. The re library extracts 💰 salary figures and normalizes skill data. 🤖 Scikit-learn powers the predictive model — using train_test_split, OneHotEncoder, and LinearRegression for salary prediction. 📊 Matplotlib and Seaborn visualize insights through bar charts and heatmaps. Finally, 💡 itertools identifies top-earning skill pairs, revealing valuable combinations that drive higher salaries.

Link: https://lnkd.in/gV6nVcCz
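As a rough sketch of the encoding-plus-regression step described above (column names and data are invented, not taken from the linked project):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical job-postings data
df = pd.DataFrame({
    "title": ["Data Analyst", "Data Scientist", "Data Engineer"] * 20,
    "years_exp": [1, 3, 5, 2, 6, 4] * 10,
    "salary": [55000, 95000, 105000, 60000, 120000, 98000] * 10,
})
X, y = df[["title", "years_exp"]], df["salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One-hot encode the categorical job title, pass numeric columns through
prep = ColumnTransformer(
    [("title", OneHotEncoder(handle_unknown="ignore"), ["title"])],
    remainder="passthrough")
model = Pipeline([("prep", prep), ("reg", LinearRegression())])
model.fit(X_train, y_train)
print(f"Test R^2: {model.score(X_test, y_test):.3f}")
```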
In the world of data analytics, EDA is the first and most crucial step toward uncovering meaningful insights. Before building models or running predictions, EDA helps us:

- Understand data structure
- Detect patterns, trends & relationships
- Identify missing values & outliers
- Formulate hypotheses for deeper analysis

Recently, I worked on an EDA project where I:

- Cleaned and prepared raw datasets
- Analyzed distribution, correlation & variance
- Visualized key metrics using Python (Pandas, Matplotlib, Seaborn)
- Extracted valuable insights to guide decision-making
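For instance, the missing-value and outlier check can be as short as this sketch on made-up data (the 1.5 × IQR rule here is one common convention, not necessarily what this project used):

```python
import numpy as np
import pandas as pd

# Toy numeric data with one injected outlier
rng = np.random.default_rng(1)
df = pd.DataFrame({"revenue": rng.normal(100, 15, 200)})
df.loc[0, "revenue"] = 400

# Missing values per column
print(df.isna().sum())

# IQR rule: flag points beyond 1.5 * IQR outside the quartiles
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["revenue"] < q1 - 1.5 * iqr) | (df["revenue"] > q3 + 1.5 * iqr)
print(f"{mask.sum()} outlier(s) flagged")
```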
Today, I explored one of the most exciting steps in the data analytics process — EDA (Exploratory Data Analysis). Before building models or visualizations, understanding your data deeply is the real game-changer.

Here's what I practiced 👇

📊 Steps in EDA:
1️⃣ Checking data types and structure
2️⃣ Summarizing statistics (df.describe())
3️⃣ Identifying missing values & outliers
4️⃣ Visualizing patterns using Matplotlib & Seaborn
5️⃣ Understanding correlations and trends

💡 Insight: EDA isn't just about numbers — it's about asking the right questions and letting data tell its story.

Tools used: Python | Pandas | Seaborn | Matplotlib

Hashtags: #DataAnalytics #PythonForData #EDA #ExploratoryDataAnalysis #DataScience #AnalyticsJourney #LearnDataAnalytics #Pandas #Seaborn #DataVisualization
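Those five steps map almost one-to-one onto pandas/seaborn calls. A minimal sketch, using seaborn's bundled tips dataset so it runs as-is (load_dataset fetches it from the internet on first use):

```python
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset("tips")  # any DataFrame works here

# 1. Data types and structure
print(df.dtypes)
# 2. Summary statistics
print(df.describe())
# 3. Missing values
print(df.isna().sum())
# 4. Visualize the distribution of a numeric column
sns.histplot(df["total_bill"])
plt.show()
# 5. Correlations between numeric columns
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()
```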
Model Adequacy Checking - Project

OBJECTIVE
The goal of this project is to compare the Mean Absolute Error (MAE), Mean Squared Error (MSE), and coefficient of determination (r²) as measures for model adequacy checking, and to identify the most suitable one.

BACKGROUND OF THE STUDY
Some data scientists rely on MSE and MAE alone for model adequacy checking, which is problematic: both metrics only describe the deviation of the estimated values from the true values (in squared and absolute terms, respectively). While values closer to zero indicate less deviation, the metrics are scale-dependent, so there is no single value or range that serves as an objective yardstick for a decision. The coefficient of determination (r²-score), by contrast, measures the proportion of variation in the target variable explained by the feature variables. If it is at least (≥) 80%, the model can be considered an adequate fit and used for the goodness-of-fit assessment of the dataset under study.

DATASET INFORMATION
Rows (observations): 200
Columns (features): 3
Target variable: Sales
Features include: TV, Radio & Newspaper

STEPS PERFORMED
- Data cleaning
- Model trained: Linear Regression
- Dataset plot: Radio ad spend vs Sales
- Adequacy evaluation: MSE = 27.6, MAE = 4.6, r²-score = 0.107 (10.7%)

CONCLUSION
The scatter plot showed a weak relationship between radio advertisement spending and sales, and this was strongly supported by the r²-score: only 10.7% of the variation in sales is explained by radio ad spending.

I'm happy to share this Model Adequacy Checking project I worked on. Check it out here: https://lnkd.in/d4eVzhiv

#DataScience #Utiva #Python #MachineLearning
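All three metrics are one-liners in scikit-learn. A minimal sketch on invented stand-in data (one weak predictor, mirroring the Radio-vs-Sales setup, but not the project's actual dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Invented stand-in: sales depend only weakly on radio spend
rng = np.random.default_rng(7)
radio = rng.uniform(0, 50, 200).reshape(-1, 1)
sales = 0.1 * radio.ravel() + rng.normal(14, 5, 200)

model = LinearRegression().fit(radio, sales)
pred = model.predict(radio)

# MAE and MSE are in the target's units (and their square), so they have
# no universal threshold; R^2 is a unitless proportion of explained variance
print(f"MAE: {mean_absolute_error(sales, pred):.2f}")
print(f"MSE: {mean_squared_error(sales, pred):.2f}")
print(f"R^2: {r2_score(sales, pred):.3f}")
```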
Quantile regression is a valuable tool for analyzing the relationship between variables, especially when data is not evenly distributed or has outliers. Unlike traditional linear regression, which focuses only on the mean, quantile regression allows us to predict different points across the distribution of the target variable.

Challenges:

❌ Compared to linear regression, quantile regression requires more computational power and can be harder to interpret for non-experts.
❌ Larger sample sizes might be needed to achieve stable and reliable quantile estimates, especially for extreme percentiles.
❌ The model's results might be less intuitive if you are accustomed to traditional regression techniques, which could limit ease of communication.

Advantages:

✔️ Quantile regression helps to explore trends at various quantiles, offering a more detailed picture of your data.
✔️ This method is highly effective for non-normal data, particularly when there are outliers or heavy tails.
✔️ It is ideal for situations where extreme values or various percentiles are as important as the central trend.

How to handle quantile regression in practice:

🔹 R: Use the quantreg package to apply quantile regression. The rq() function allows you to specify the quantiles you're interested in.
🔹 Python: statsmodels provides quantile regression; its QuantReg class lets you analyze different percentiles of your data.

The attached visualization is based on a Wikipedia image (link: https://lnkd.in/e7eYbpPg) and illustrates quantile regression lines at various percentiles, showing how predicted values differ across the distribution.

To explain this topic in further detail, I collaborated with Micha Gengenbach to create a comprehensive tutorial: https://lnkd.in/eyb_DFr8

Curious to learn more about statistics and R programming? Join my online course, "Statistical Methods in R." See this link for additional information: https://lnkd.in/d-UAgcYf

#statistical #analysis #database
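A minimal sketch of QuantReg via statsmodels' formula interface, on made-up heteroscedastic data where the spread of y grows with x, so the fitted quantile lines fan out:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: the spread of y increases with x
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 300)
y = 2 * x + rng.normal(0, 0.5 + 0.4 * x, 300)
df = pd.DataFrame({"x": x, "y": y})

# Fit separate regression lines for the 10th, 50th, and 90th percentiles
for q in (0.1, 0.5, 0.9):
    res = smf.quantreg("y ~ x", df).fit(q=q)
    print(f"q={q}: intercept = {res.params['Intercept']:.2f}, "
          f"slope = {res.params['x']:.2f}")
```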