Built a complete PCA + ML pipeline on a student performance dataset (395 rows, 33 features).

After cleaning, standardizing numeric variables, and encoding categorical fields, I explored relationships with correlation analysis and study-habit vs. grade visualizations. I then implemented PCA end-to-end (covariance matrix, eigenvalues/eigenvectors, scree plots, biplots, and transformation dashboards) to understand variance and reduce dimensionality.

Finally, I trained an SVM classifier on the top 5 principal components to predict Pass vs. Fail, comparing kernels. Best result: Linear SVM, 94.94% test accuracy.

#Python #PCA #MachineLearning #SVM #DataScience #scikitlearn #AICadmey
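The pipeline described above can be sketched in a few lines of scikit-learn. The original dataset isn't included in the post, so this uses a synthetic stand-in of the same shape (395 rows, 33 features); the dataset name, split, and accuracy will differ from the author's run.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the student dataset (395 rows, 33 features)
X, y = make_classification(n_samples=395, n_features=33, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Standardize -> keep the top 5 principal components -> linear SVM
model = make_pipeline(StandardScaler(),
                      PCA(n_components=5),
                      SVC(kernel="linear"))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.4f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline ensures both standardization and the projection are fit only on the training split, avoiding leakage into the test accuracy.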
45 Days ML Journey — Day 12: Support Vector Machine (SVM)

Day 12 of my Machine Learning journey — diving into Support Vector Machine (SVM), a powerful algorithm used for both classification and regression tasks.

Tools used: Scikit-learn, NumPy, Pandas

What is SVM? SVM is a supervised learning algorithm that finds the optimal hyperplane to separate data points of different classes with the maximum margin.

Key concepts:
- Hyperplane: the decision boundary that separates classes
- Margin: the distance between the hyperplane and the closest data points
- Support vectors: the critical data points that define the boundary

What if the data is not linearly separable? SVM uses the kernel trick to transform data into higher dimensions where it becomes separable. Common kernels:
- Linear kernel
- Polynomial kernel
- RBF (Radial Basis Function) kernel

Why use SVM?
- Effective in high-dimensional spaces
- Works well when there is a clear margin of separation
- Versatile with different kernel functions

Code notebook: https://lnkd.in/gi_4TqUb

Key takeaway: SVM is a robust algorithm that focuses on maximizing the margin, making it highly effective for complex classification problems.

#MachineLearning #DataScience #SVM #Python #ScikitLearn #LearningInPublic #MLJourney
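The three kernels listed above can be compared directly with scikit-learn's `SVC`. This is a minimal sketch on the built-in iris dataset (not the post's notebook), using 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Compare the three common kernels with 5-fold cross-validation
results = {}
for kernel in ("linear", "poly", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:>6}: {results[kernel]:.3f}")
```

On a small, nearly separable dataset like iris the three kernels score similarly; the differences become pronounced on data with nonlinear class boundaries.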
Recently, I worked on a small machine learning project on Fitness Class Attendance Prediction. The goal was to predict whether a member would attend a class, using a complete workflow from raw data to final model evaluation.

The project included:
- cleaning inconsistent data formats
- handling missing values
- encoding categorical variables
- preparing preprocessing pipelines
- training and comparing multiple models

I tested KNN, Decision Tree, SVM, and Naive Bayes. What I found interesting was that the "best" model depended on how performance was judged:
- Naive Bayes gave the best F1-score on the main split
- SVM gave the highest accuracy
- Decision Tree looked like the most stable option when the test size changed

A good reminder that model selection should not depend on a single metric.

GitHub repo: https://lnkd.in/d8_ADgY5

Projects like this keep showing me how important it is to combine clean data, correct preprocessing, and thoughtful evaluation to reach a solid conclusion.

#MachineLearning #DataAnalytics #Python #ScikitLearn #ClassificationModels #DataScienceProjects
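A model comparison like the one described, where accuracy and F1 can disagree, might look like this. The attendance dataset isn't public, so this sketch substitutes scikit-learn's built-in breast cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
}

# Report both metrics side by side: the ranking can differ between them
results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    print(f"{name:>13}: acc={results[name][0]:.3f}  f1={results[name][1]:.3f}")
```

Printing both metrics per model makes the post's point concrete: the winner by accuracy is not necessarily the winner by F1, especially on imbalanced classes.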
Your model isn't bad. Your features are.

80% of ML performance comes from feature engineering. Not from picking XGBoost over Random Forest. Not from tuning n_estimators. From the hours you spend turning raw columns into something a model can actually learn from.

Free notebook covers:
→ Polynomial & interaction features (the trick most beginners skip)
→ Log transforms for skewed distributions
→ Binning continuous variables (and when it hurts more than it helps)
→ Date/time feature extraction (hour, day of week, is_holiday)
→ Categorical encoding beyond one-hot (target, frequency)
→ Text feature extraction (length, word count, TF-IDF basics)
→ Scaling strategies (standardize vs. normalize vs. neither)

If your model is stuck at 70% accuracy, the fix is usually in the features, not the algorithm.

https://lnkd.in/gj7SgH7y

Day 1 of 7. Every day this week: a hands-on notebook.

#DataScience #FeatureEngineering #MachineLearning #Python #MLEngineering #InterviewPrep #Pandas #Sklearn
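Three of the techniques listed (log transforms, date/time extraction, frequency encoding) fit in a few lines of pandas. This is a toy sketch on made-up data, not the notebook's code; the column names are invented for illustration:

```python
import numpy as np
import pandas as pd

# Toy data: a skewed numeric column, a timestamp, and a categorical
df = pd.DataFrame({
    "price": [10, 100, 1000, 10000],
    "ts": pd.to_datetime(["2024-01-01 09:00", "2024-01-06 14:30",
                          "2024-02-14 22:15", "2024-03-01 08:45"]),
    "city": ["NY", "NY", "LA", "SF"],
})

# Log transform tames the skewed price column (log1p handles zeros safely)
df["log_price"] = np.log1p(df["price"])

# Date/time feature extraction
df["hour"] = df["ts"].dt.hour
df["day_of_week"] = df["ts"].dt.dayofweek  # Monday = 0

# Frequency encoding: replace each category with its relative frequency
df["city_freq"] = df["city"].map(df["city"].value_counts(normalize=True))

print(df[["log_price", "hour", "day_of_week", "city_freq"]])
```

Frequency encoding is a useful fallback when one-hot would explode the column count on high-cardinality categoricals.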
ODSC AI East 2026 is coming up, and Hydrolix Principal TAM Dan Sullivan will be speaking on a topic that will resonate with teams building decision support for complex systems. In “Spec-Driven Simulation Modeling: Building and Validating Decision Support Models with Python and LLMs,” Dan explores why simulation modeling can be the better fit when prediction alone does not capture queues, constraints, and feedback loops. Register here: https://hubs.la/Q049YCDX0 #Hydrolix #ODSCAIEast #DataScience #MachineLearning #MLOps #SimulationModeling #Python #LLMs
Back in 1854 Soho to look at John Snow's cholera data through a new lens. I've swapped last week's static K-Means clustering for a generative Monte Carlo simulation.

By letting 500+ "agents" take random walks (sometimes called "drunkard's walks" for their wayward paths) from each victim's location, the underlying attractor reveals itself. Individually the walks are chaotic, but collectively the Broad Street pump becomes statistically inevitable.

It's a simple demo, but this logic scales to everything from AlphaGo to protein folding: computation and simple rules finding clarity where traditional math reaches its limits. Turns out embracing the randomness is the fastest way to the signal.

https://mcsnow.vercel.app/

#DataScience #MonteCarlo #Simulation #JohnSnow #MCMC #Python #SvelteKit
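The demo's idea, many chaotic walks whose aggregate density reveals the source, can be sketched in pure Python. The coordinates here are invented (victims scattered around a "pump" at the origin), not the actual 1854 data, and the demo itself is in SvelteKit, not Python:

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical victim coordinates clustered around a "pump" at (0, 0)
victims = [(random.gauss(0, 2), random.gauss(0, 2)) for _ in range(50)]

def random_walk(start, steps=100):
    """One 'drunken' agent: unit steps in random directions on a grid."""
    x, y = start
    for _ in range(steps):
        x += random.choice((-1, 0, 1))
        y += random.choice((-1, 0, 1))
    return x, y

# 500 agents total: 10 walks launched from each victim location
endpoints = [random_walk(v) for v in victims for _ in range(10)]

# Bin endpoints into a coarse grid; the densest cell marks the attractor
grid = Counter((round(x / 5), round(y / 5)) for x, y in endpoints)
hotspot = grid.most_common(1)[0][0]
print("Densest grid cell (scaled by 5):", hotspot)
```

Any single endpoint is nearly useless, but the histogram of 500 of them peaks near the common origin, which is the whole trick.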
Exploring data tells powerful stories. Here's a visualization of the selling price distribution from my recent analysis.

The histogram (with KDE) clearly shows a right-skewed distribution: most vehicles are concentrated in the lower price range, while only a few fall into higher price brackets.

Key insights:
• The majority of selling prices lie between 1–6 lakhs
• A long tail indicates the presence of high-value outliers
• The distribution is not normal, which impacts modeling choices

This kind of analysis is crucial before applying any machine learning model, as it helps in understanding data behavior and potential preprocessing needs.

#DataScience #DataAnalysis #Python #MachineLearning #DataVisualization #LearningJourney
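The "not normal, impacts modeling" point can be quantified instead of eyeballed. A minimal sketch on synthetic right-skewed prices (the post's vehicle dataset isn't included), showing sample skewness before and after a log transform:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed "selling prices" (in lakhs), mimicking the post's shape
prices = rng.lognormal(mean=1.0, sigma=0.6, size=1000)

def skewness(x):
    """Sample skewness: third standardized moment."""
    x = np.asarray(x)
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))

print(f"raw skew: {skewness(prices):.2f}")           # strongly right-skewed
print(f"log skew: {skewness(np.log1p(prices)):.2f}")  # far closer to symmetric
```

A skewness well above zero confirms the long right tail seen in the histogram; the shrunken value after `log1p` is why a log transform is a common preprocessing step before models that assume roughly symmetric errors.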
🏥 HEALTHCARE SYSTEM ANALYSIS - COMPLETE 🏥

Project: Bayesian hierarchical modeling of treatment effects
Duration: 19 hours computational time
Status: Partial results generated

KEY FINDINGS:
📊 Treatment effect: Directionally positive
📈 Credible intervals: Wide (needs more data)
🎯 Model convergence: Not achieved
💻 Computational efficiency: Needs improvement

NEXT STEPS:
🔧 Refine priors
⚡ Implement variational inference
💪 Try again with better hardware

Despite the challenges, we have RESULTS! (Even if "few" means "not enough for publication... yet")

#HealthcareSystem #BayesianStatistics #DataScience #Python #ResearchUpdate

*progress, not perfection* 📈
Your dataset has 500 features. Your model only needs 20. The other 480 are noise, redundancy, or both — slowing down training and hurting accuracy.

We broke down the 3 algorithms you actually need:

Slide 1: PCA — linear, interpretable, fast. Your default.
Slide 2: t-SNE — nonlinear, beautiful for visualization, slow on large data.
Slide 3: UMAP — modern, roughly 10x faster than t-SNE, preserves local + global structure.
Slide 4: When to use which (a decision tree with 4 questions).
Slide 5: The common trap: t-SNE axes are NOT features. You can't use them as inputs to a model.
Slide 6: Free notebook with all 3 on the same dataset — see the differences yourself.

Free notebook with side-by-side code for all three: https://lnkd.in/gcbS7m-m

If you've been using PCA as a black box, this upgrades you.

#MachineLearning #DataScience #PCA #UMAP #DimensionalityReduction #UnsupervisedLearning #Python #Sklearn
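Two of the three techniques ship with scikit-learn and share the same `fit_transform` interface; UMAP follows the same pattern via the third-party `umap-learn` package, which is omitted here to keep the sketch self-contained. This is not the linked notebook, just a minimal comparison on the built-in digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# PCA: linear and fast; the components CAN feed a downstream model
X_pca = PCA(n_components=2, random_state=0).fit_transform(X)

# t-SNE: nonlinear, for visualization ONLY; its axes are not reusable features
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print("PCA:", X_pca.shape, " t-SNE:", X_tsne.shape)
```

Note the trap from Slide 5 in code form: `X_pca` is a legitimate model input because PCA is a deterministic linear projection you can apply to new data, while t-SNE has no `transform` for unseen points, so `X_tsne` is a picture, not a feature matrix.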
Day 49 of #GeekStreak60: The Math Behind the Matrix! 🧮🔲

Tackled the "Print Diagonally" problem on @GeeksforGeeks today.

Key learning: when traversing a matrix, it's easy to get bogged down in complex boundary checks and nested while loops. But analyzing the actual coordinates reveals a mathematical shortcut: along any anti-diagonal, the sum of the row and column indices (i + j) is constant!

Instead of writing a messy simulation, I used this property to iterate through all possible index sums (from 0 to 2n - 2). By calculating the exact upper and lower row bounds at each sum, the algorithm extracts the anti-diagonals in O(n²) time without a single out-of-bounds check.

Algorithms become so much cleaner when you step back and look for the underlying math! 🚀

#geekstreak60 #npci #coding #Algorithms #Python #DataStructures #Matrix #Mathematics #ProblemSolving
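The i + j = s invariant described above translates directly to code. A sketch for an n x n matrix (the actual GfG problem may specify a different traversal direction within each diagonal):

```python
def anti_diagonals(matrix):
    """Group cells of an n x n matrix by the invariant i + j = s."""
    n = len(matrix)
    result = []
    for s in range(2 * n - 1):          # index sums run from 0 to 2n - 2
        lo = max(0, s - n + 1)          # smallest valid row for this sum
        hi = min(s, n - 1)              # largest valid row for this sum
        result.append([matrix[i][s - i] for i in range(lo, hi + 1)])
    return result

m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(anti_diagonals(m))  # [[1], [2, 4], [3, 5, 7], [6, 8], [9]]
```

The `lo`/`hi` bounds are exactly the "upper and lower bounds for the rows at each sum" from the post: they guarantee both `i` and `s - i` stay in range, so no per-cell bounds check is needed.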
RAG Day 4: Vector Databases and Indexing

Excited to share my latest project from Day 4 of my RAG learning series: building a hybrid search engine! 🚀

This hands-on mini-project compares semantic-only, keyword-only (BM25), and hybrid retrieval methods using vector databases with FAISS-inspired indices. It incorporates metadata filtering, reciprocal rank fusion, and efficient indexing techniques to handle document search at scale.

Key takeaway: vector databases are crucial for storing and querying embeddings efficiently, balancing speed, accuracy, and memory. Perfect for prototyping RAG systems!

Source code: https://lnkd.in/gZwinm3i

#RAG #VectorDatabases #MachineLearning #Python #AI #SearchEngine
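Reciprocal rank fusion, the step that merges the semantic and BM25 result lists, is small enough to show in full. This is the standard RRF formula, not the linked project's code; the document IDs are made up:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank(d)).

    k=60 is the commonly used damping constant; higher k flattens the
    influence of top ranks."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]  # from the vector index
keyword = ["doc_b", "doc_d", "doc_a"]   # from BM25

print(reciprocal_rank_fusion([semantic, keyword]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF only looks at ranks, it sidesteps the problem of BM25 scores and cosine similarities living on incompatible scales, which is why it is a popular default for hybrid retrieval.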