Built an autonomous data scientist system that takes any CSV or Excel file and runs a complete machine learning pipeline without manual configuration.

It automatically:
• detects the target column
• identifies whether the problem is binary classification, multiclass classification, or regression
• cleans and engineers features
• trains and compares five models
• evaluates with appropriate metrics
• generates a full HTML report with charts and plain-English explanations powered by a local LLM

Everything is accessible through a FastAPI backend and a web interface built from scratch.

Stack: Python, scikit-learn, XGBoost, FastAPI, HuggingFace Transformers, Plotly.

#MachineLearning #DataScience #Python #MLOps #ArtificialIntelligence #FastAPI #ScikitLearn #DataEngineering #AutoML #OpenSource #SoftwareEngineering #DeepLearning #Programming #Tech #Innovation
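Target-task detection can be sketched like this, under simple assumptions (a hypothetical heuristic for illustration, not the system's actual code; the 20-distinct-value cutoff is an arbitrary choice):

```python
def infer_task(target_values):
    """Classify a target column as 'binary', 'multiclass', or 'regression'.

    Heuristic: two distinct values means binary; string labels or few
    distinct values suggest classes; many distinct numeric values
    suggest a continuous target.
    """
    distinct = set(target_values)
    if len(distinct) == 2:
        return "binary"
    if any(isinstance(v, str) for v in distinct) or len(distinct) <= 20:
        return "multiclass"
    return "regression"

print(infer_task([0, 1, 1, 0]))            # binary
print(infer_task(["cat", "dog", "bird"]))  # multiclass
print(infer_task(list(range(100))))        # regression
```

A real pipeline would add checks for ID-like columns and missing values before trusting a heuristic like this.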
🚀 Day 50/100 – Python, Data Analytics & Machine Learning Journey 🤖
Module 3: Machine Learning
📚 Today’s Learning: Supervised Learning – Regression
Algorithm 2: Decision Tree Regression

Today I explored Decision Tree Regression, a supervised machine learning algorithm used to predict continuous values by learning decision rules from the data.

Unlike linear models, Decision Tree Regression works by splitting the dataset into smaller subsets based on feature values, forming a tree-like structure. Each split helps the model make more precise predictions by grouping similar data points together.

One of the key advantages of Decision Tree Regression is its ability to capture non-linear relationships in the data and provide easy-to-understand decision rules. This algorithm is widely used in applications such as price prediction, demand forecasting, risk analysis, and customer behavior modeling.

The learning journey continues as I explore more regression algorithms and their real-world applications.

📌 Code & Notes: https://lnkd.in/dmFHqCrK

#100DaysOfPython #MachineLearning #AIML #Python #LearningInPublic #DataScience
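The core splitting step can be sketched in a few lines, assuming a single feature and squared error as the split criterion (a real tree applies this search recursively at every node):

```python
def best_split(xs, ys):
    """Find the threshold on x that minimizes the summed squared error
    of predicting each side by its mean (one node of a regression tree)."""
    def sse(group):
        mean = sum(group) / len(group)
        return sum((y - mean) ** 2 for y in group)

    best = None  # (error, threshold, left_mean, right_mean)
    for t in sorted(set(xs))[:-1]:  # the max value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    return best

# Two clusters of targets: the split lands between them
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.9, 10.1, 9.8, 10.0]
err, threshold, left_mean, right_mean = best_split(xs, ys)
print(threshold)  # 3
```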
-
I built a small Python tool that eliminates redundant SVD computations. Instead of recomputing the same matrix over and over, it:
• computes once
• stores the result
• returns it in microseconds on repeat calls, even after restarting the program

On my machine:
• NumPy SVD: ~3.3s
• ZeroFold (first call): ~2.4s
• ZeroFold (repeat): ~0.008s

All results are bitwise identical (0.00e+00 diff).

This isn’t a new SVD algorithm. It’s a persistent, zero-config caching layer for linear algebra. It only helps when inputs are reused, but that pattern shows up more often than expected:
• fixed weight matrices in ML pipelines
• adapter / LoRA swapping
• repeated PCA on stable datasets
• iterative experimentation workflows

Recent updates (v0.1.3):
• cache keys include dtype + memory order
• atomic disk writes (crash-safe)
• version-based cache invalidation
• visibility into whether results came from memory or disk

If you’re recomputing the same decompositions in your pipeline, this might save real time and cost.

Repo: https://lnkd.in/gt3pX_x9

Would be interested to hear where this breaks / where it helps.

#AI #MachineLearning #Python #DataScience #MLOps #DeepLearning #SoftwareEngineering #OpenSource #TechInnovation #Optimization #Computing #BuildInPublic
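For readers curious how such a cache works, here is a minimal sketch of the idea, assuming content-addressed keys and write-then-rename for atomicity; ZeroFold's actual implementation and API may differ:

```python
import hashlib
import os
import tempfile

import numpy as np

CACHE_DIR = "svd_cache"

def _key(a):
    """Content-addressed key: hash the bytes plus dtype and shape,
    normalizing memory order so equal matrices collide."""
    h = hashlib.sha256()
    h.update(np.ascontiguousarray(a).tobytes())
    h.update(str(a.dtype).encode())
    h.update(str(a.shape).encode())
    return h.hexdigest()

def cached_svd(a):
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, _key(a) + ".npz")
    if os.path.exists(path):
        z = np.load(path)          # cache hit: load instead of recompute
        return z["u"], z["s"], z["vt"]
    u, s, vt = np.linalg.svd(a)
    # atomic write: write to a temp file, then rename into place,
    # so a crash mid-write never leaves a corrupt cache entry
    fd, tmp = tempfile.mkstemp(dir=CACHE_DIR, suffix=".npz")
    with os.fdopen(fd, "wb") as f:
        np.savez(f, u=u, s=s, vt=vt)
    os.replace(tmp, path)
    return u, s, vt

a = np.random.default_rng(0).standard_normal((5, 4))
u1, s1, vt1 = cached_svd(a)   # computed
u2, s2, vt2 = cached_svd(a)   # loaded from disk, bitwise identical
```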
-
Most people jump straight into Machine Learning… without understanding the foundation behind it.

That foundation? 👉 NumPy

If you can’t work efficiently with arrays, you’ll struggle with data, models, and performance.

NumPy is what powers:
✔ Data manipulation
✔ Mathematical computations
✔ High-performance operations in Python

Here’s a breakdown of the core NumPy concepts every developer should know 👇 from array creation to linear algebra and file handling.

💡 Truth: You don’t need 100 libraries to start in AI. You need strong fundamentals.

#Python #NumPy #DataScience #MachineLearning #AI #ArtificialIntelligence #PythonProgramming #Coding #Programming #Developers #AIEngineer #DataAnalytics #DeepLearning #LearnPython #SoftwareEngineering #TechCareer #CodingJourney #100DaysOfCode
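Those core areas fit in a short tour (a quick sketch, not a complete reference):

```python
import numpy as np

# Array creation
a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# Vectorized math: no Python loops needed
doubled = a * 2

# Broadcasting: the row vector is stretched across every row of `a`
row = np.array([10, 20, 30])
shifted = a + row

# Linear algebra
m = np.array([[2.0, 0.0], [0.0, 3.0]])
inv = np.linalg.inv(m)

# File handling: save and reload an array
np.save("a.npy", a)
loaded = np.load("a.npy")

print(shifted)
```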
-
🚀 Day 62/100 – Python, Data Analytics & Machine Learning Journey 🤖
Module 3: Machine Learning
📚 Today’s Learning: Unsupervised Learning
Algorithm 3: PCA

Today, I explored the fundamentals of Unsupervised Learning, a type of machine learning where models work with unlabeled data to discover hidden patterns and structures.

I learned about PCA (Principal Component Analysis), a powerful dimensionality reduction technique used to reduce the number of features while preserving the most important information in the dataset. It transforms the original variables into a new set of uncorrelated variables called principal components.

PCA works by identifying the directions (principal components) along which the data varies the most. The first principal component captures the maximum variance, followed by the second, and so on. This helps in simplifying complex datasets, improving model performance, and reducing computation time.

The learning journey continues as I explore more algorithms and their real-world applications.

📌 Code & Notes: https://lnkd.in/dmFHqCrK

#100DaysOfPython #MachineLearning #AIML #Python #LearningInPublic #DataScience
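The whole transform fits in a few lines of NumPy; this sketch takes the SVD of the centered data, which is the standard numerical route to the principal components:

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component."""
    Xc = X - X.mean(axis=0)                       # center each feature
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)                   # variance per component
    return Xc @ vt[:k].T, var[:k] / var.sum()

# Two strongly correlated features: one component carries almost everything
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t, 2 * t + 0.1 * rng.standard_normal(200)])
Z, ratio = pca(X, 1)
print(ratio)  # first component explains nearly all the variance
```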
-
In machine learning, data quality directly impacts model performance. One often overlooked issue is the presence of outliers: data points that significantly deviate from the rest of the dataset.

If not handled properly, outliers can:
• Skew data distribution
• Increase model error
• Reduce prediction accuracy

This article explores practical techniques for outlier detection:
• Box Plot for visual identification
• IQR method for statistical boundaries
• Z-Score for deviation analysis
• Isolation Forest for scalable anomaly detection

These methods are essential for improving model reliability and performance in real-world applications.

Read more info: https://lnkd.in/dnDV3_zg

#MachineLearning #DataScience #ArtificialIntelligence #Python #SoftwareEngineering #Analytics
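The IQR and Z-score methods from that list can be sketched in a few lines (the 1.5×IQR fence and the Z threshold of 2 are common conventions, not fixed rules):

```python
import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is an obvious outlier

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

# Z-score method: flag points far from the mean in standard deviations
# (note: the outlier itself inflates the std, so the threshold matters)
z = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z) > 2]

print(iqr_outliers)  # [95]
```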
-
🚀 Day 51/100 – Python, Data Analytics & Machine Learning Journey 🤖
Module 3: Machine Learning
📚 Today’s Learning: Supervised Learning – Regression
Algorithm 3: Support Vector Regression (SVR)

Today, I explored Support Vector Regression (SVR), a powerful supervised machine learning algorithm used for predicting continuous values.

SVR works by finding the best-fit line (or hyperplane) that not only fits the data but also keeps the prediction error within a defined margin (epsilon). It focuses on maintaining a balance between model complexity and prediction accuracy.

SVR is widely used in applications like stock price prediction, demand forecasting, and time-series analysis.

The learning journey continues as I explore more regression algorithms and their real-world applications.

📌 Code & Notes: https://lnkd.in/dmFHqCrK

#100DaysOfPython #MachineLearning #AIML #Python #LearningInPublic #DataScience
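The epsilon margin is SVR's distinctive idea, and its epsilon-insensitive loss can be written out directly (a toy illustration of the loss only, not a full SVR solver):

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """SVR's loss: errors inside the epsilon tube cost nothing;
    beyond the tube, cost grows linearly with distance from it."""
    return sum(max(0.0, abs(t - p) - epsilon)
               for t, p in zip(y_true, y_pred))

# Predictions within 0.5 of the truth are 'free'...
print(epsilon_insensitive_loss([1.0, 2.0], [1.3, 1.8]))  # 0.0
# ...larger errors are charged only for the part outside the tube
print(epsilon_insensitive_loss([1.0], [2.0]))            # 0.5
```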
-
Go deeper into the science behind machine learning. In Modern Statistical Prediction and Machine Learning, study the theory and practice of predictive modeling, from regression and regularization to boosting and support vector machines. Work with real data, write Python code, and learn how to balance model performance with computational efficiency. Learn more ➡️ https://bit.ly/4sAzMW1
-
I tested GPT-4.1-mini vs Claude 3.5 Sonnet on SEC 10-Q filings using a custom Python benchmarking framework.

What I expected:
→ Differences in reasoning quality

What I found:
→ The biggest performance gap came from how each model extracted the data

Not analysis. Not math. Input fidelity.

This is a big deal for anyone building:
• AI-driven financial reporting
• Portfolio benchmarking tools
• Automated KPI systems

Because if your extraction is off, everything downstream is noise. Garbage in → confident garbage out.

#ArtificialIntelligence #GenerativeAI #AIInFinance #DataStrategy #BusinessIntelligence #Python #DataScience #PrivateEquity #PortfolioPerformance #ValueCreation #DigitalTransformation #FinTech #LLMEvaluation #Automation
-
🚀 Built a Spam Detection App using Machine Learning

I developed a machine learning model that can classify messages as Spam or Not Spam with ~96% accuracy.

🔍 What I implemented:
• Text preprocessing and cleaning
• TF-IDF feature extraction
• Naive Bayes classification
• Interactive web app using Streamlit

💡 You can test it by entering any message and instantly getting predictions.

🛠️ Tech Stack: Python | Pandas | Scikit-learn | Streamlit

🎥 Demo attached below
📂 GitHub: https://lnkd.in/ghuwihsk

This project helped me understand the complete ML pipeline: from data preprocessing to deployment.

#MachineLearning #Python #DataScience #AI #Projects
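The core of the classifier is easy to hand-roll (a toy multinomial Naive Bayes on raw word counts with Laplace smoothing; the app itself uses scikit-learn with TF-IDF features):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label) pairs with labels 'spam' / 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    doc_totals = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        doc_totals[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, doc_totals, vocab

def predict(text, counts, doc_totals, vocab):
    """Pick the label with the highest log posterior,
    using add-one (Laplace) smoothing for unseen words."""
    words = text.lower().split()
    n_docs = sum(doc_totals.values())
    scores = {}
    for label in counts:
        logp = math.log(doc_totals[label] / n_docs)       # prior
        denom = sum(counts[label].values()) + len(vocab)  # smoothed total
        for w in words:
            logp += math.log((counts[label][w] + 1) / denom)
        scores[label] = logp
    return max(scores, key=scores.get)

model = train_nb([
    ("win free money now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("see you at the office", "ham"),
])
print(predict("win free money", *model))  # spam
```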
-
Standard machine learning models are great at predicting what will happen. But in the real world, the most valuable question is often when? ⏱️

Whether you are predicting customer churn, machine failure, or user conversions, treating these as standard classification or regression problems ignores a critical factor: censored data.

I just published a new guide: Survival Analysis for Data Scientists: A Practical Guide to Time-to-Event Modeling in Python. If you want to move beyond simple point predictions and start building probability curves over time, this guide is for you.

Here is a look at what’s inside:
🔹 The core math behind the survival & hazard functions (kept simple!)
🔹 Why handling "right-censoring" makes or breaks your model
🔹 Building your first Kaplan-Meier estimator
🔹 Implementing the Cox Proportional Hazards model using Python

Check out the full article here in the comments! 👇

What is your go-to method for modeling time-to-event data? Let me know below!

#DataScience #MachineLearning #Python #SurvivalAnalysis #PredictiveAnalytics #CustomerChurn #DataScientists #TechCareers #AIEngineer
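The Kaplan-Meier estimator is simple enough to sketch by hand: survival drops at each observed event time by the fraction of at-risk subjects who had the event, while censored subjects simply leave the risk set without causing a drop (a minimal pure-Python sketch; libraries like lifelines add confidence intervals and plotting):

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: observed times; events: 1 = event occurred, 0 = right-censored.
    S(t) is the product over event times t_i <= t of (1 - d_i / n_i),
    where d_i deaths occur among n_i subjects still at risk."""
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    surv, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1 - deaths / at_risk
        surv.append((t, s))
    return surv

# Subject with duration 3 is censored: it shrinks the later risk sets
# but never produces a step down in the curve
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 1])
print(curve)
```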
-
Are there situations where it falls back to manual work? Generalizing data science work is not fully efficient.