🚀 Just Uploaded My Data Science and Statistics (DSS) Practical Repository on GitHub!

Over the past few weeks, I’ve been diving deep into the fascinating world of Data Science, exploring how raw data can be transformed into powerful insights using Python, Statistics, and Machine Learning. Under the valuable guidance of Ashish Sawant Sir, I worked on a series of hands-on practicals that helped me strengthen my understanding of data handling, analysis, and predictive modeling.

🔍 Topics Covered:
1️⃣ Data Acquisition using Pandas
2️⃣ Measures of Central Tendency (Mean, Median, Mode)
3️⃣ Basics of DataFrame
4️⃣ Handling Missing Values
5️⃣ Creating Arrays using NumPy
6️⃣ Data Visualization using Matplotlib
7️⃣ Simple Linear Regression
8️⃣ Logistic Regression
9️⃣ K-Nearest Neighbors (KNN)
🔟 Support Vector Machine (SVM)
1️⃣1️⃣ Decision Tree (DT)
1️⃣2️⃣ Random Forest (RF)

📂 GitHub Repository: https://lnkd.in/duKrWaZC
📂 Google Drive: https://lnkd.in/g9xKSPwE

Through this practical journey, I learned how to:
✅ Clean and preprocess raw datasets using Pandas and NumPy
✅ Visualize data trends and patterns using Matplotlib
✅ Apply statistical concepts to understand data behavior
✅ Build and evaluate predictive models using Scikit-learn
✅ Interpret model outputs to make data-driven decisions

Each topic contributed significantly to my understanding of the end-to-end data science workflow — from data cleaning and exploration to model building and evaluation. This project has not only strengthened my technical foundation but also sparked a deeper interest in exploring advanced machine learning and AI concepts in the future.

A big thanks once again to Ashish Sawant Sir for constant support and guidance throughout this DSS journey. 🙌

#DataScience #MachineLearning #Python #Pandas #NumPy #Matplotlib #Statistics #GitHub #LearningJourney #EngineeringProjects #AI #ML #Coding
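That end-to-end workflow (acquire, summarize, handle missing values, model) can be sketched in a few lines. The hours/score dataset below is made up for illustration and is not from the repository:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Acquire: a tiny DataFrame with one missing value
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5],
                   "score": [52, 55, np.nan, 63, 68]})

# Central tendency: mean of the observed scores
mean_score = df["score"].mean()

# Handle missing values: fill the gap with the column mean
df["score"] = df["score"].fillna(mean_score)

# Model: simple linear regression with scikit-learn
model = LinearRegression().fit(df[["hours"]], df["score"])
pred = model.predict(pd.DataFrame({"hours": [6]}))[0]
print(f"predicted score for 6 hours: {pred:.1f}")
```

The same pattern (clean with Pandas, fit and predict with Scikit-learn) scales up to the classifiers listed above.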
🚀 Just Uploaded My Data Science and Statistics (DSS) Practical Repository on GitHub!

Over the past few weeks, I’ve been diving deep into the fascinating world of Data Science, exploring how raw data can be transformed into powerful insights using Python, Statistics, and Machine Learning. Under the valuable guidance of Ashish Sawant Sir, I worked on a series of hands-on practicals that helped me strengthen my understanding of data handling, analysis, and predictive modeling.

🔍 Topics Covered:
1️⃣ Data Acquisition using Pandas
2️⃣ Measures of Central Tendency (Mean, Median, Mode)
3️⃣ Basics of DataFrame
4️⃣ Handling Missing Values
5️⃣ Creating Arrays using NumPy
6️⃣ Data Visualization using Matplotlib
7️⃣ Simple Linear Regression
8️⃣ Logistic Regression
9️⃣ K-Nearest Neighbors (KNN)
🔟 Support Vector Machine (SVM)
1️⃣1️⃣ Decision Tree (DT)
1️⃣2️⃣ Random Forest (RF)

📂 GitHub Repository: https://lnkd.in/d87G4muR

Through this practical journey, I learned how to:
✅ Clean and preprocess raw datasets using Pandas and NumPy
✅ Visualize data trends and patterns using Matplotlib
✅ Apply statistical concepts to understand data behavior
✅ Build and evaluate predictive models using Scikit-learn
✅ Interpret model outputs to make data-driven decisions

Each topic contributed significantly to my understanding of the end-to-end data science workflow — from data cleaning and exploration to model building and evaluation. This project has not only strengthened my technical foundation but also sparked a deeper interest in exploring advanced machine learning and AI concepts in the future.

A big thanks once again to Ashish Sawant Sir for constant support and guidance throughout this DSS journey. 🙌

#DataScience #MachineLearning #Python #Pandas #NumPy #Matplotlib #Statistics #GitHub #LearningJourney #EngineeringProjects #AI #ML #Coding
Data Science and Statistics

Over the past few weeks, I’ve been diving deep into the field of Data Science, exploring how data can be transformed into meaningful insights. Under the guidance of Ashish Sawant, I worked on a series of practicals that strengthened my understanding of both fundamental and advanced concepts in Python, Statistics, and Machine Learning.

💡 Throughout this journey, I learned how to clean, visualize, and analyze data, and implement key ML algorithms to solve real-world problems.

🔍 Topics Covered:
1️⃣ Data Acquisition using Pandas
2️⃣ Measures of Central Tendency (Mean, Median, Mode)
3️⃣ Basics of DataFrame
4️⃣ Handling Missing Values
5️⃣ Creating Arrays using NumPy
6️⃣ Data Visualization
7️⃣ Simple Linear Regression
8️⃣ Logistic Regression
9️⃣ K-Nearest Neighbors (KNN)
🔟 Support Vector Machine (SVM)
1️⃣1️⃣ Decision Tree (DT)
1️⃣2️⃣ Random Forest (RF)

📂 Explore my complete practical work here:
🔗 https://lnkd.in/dAmqZY5J

Each topic taught me something valuable — from handling datasets efficiently to building predictive models that make data-driven decisions possible. I’m excited to keep learning, improving, and applying these concepts in real-world data science projects!

#DataScience #MachineLearning #Python #GitHub #Statistics #AI #Coding #EngineeringJourney #LearningByDoing
🚫 Stop wasting time scrolling through 50 “Data Science syllabuses” that only confuse you!

If you really want to become a Data Scientist, this is the only roadmap you need to follow 👇

🔥 The perfect order to learn:
🔹 Mathematics & Statistics → probability, linear algebra, hypothesis testing, calculus
🔹 Python + Tools → pandas, NumPy, scikit-learn, TensorFlow/PyTorch, visualization libs
🔹 SQL → queries, joins, optimization, stored procedures
🔹 Data Wrangling & Visualization → cleaning, merging, Tableau/Power BI, storytelling with data
🔹 Machine Learning → supervised & unsupervised learning, model evaluation, feature engineering
🔹 Soft Skills → communication, teamwork, storytelling, and presentation

💡 Pro tip: Build from the center out — start with the foundation, then tools, then ML, then storytelling.

Save this roadmap, share it with someone lost in a “100-course” maze, and comment below 👇
👉 Which layer are you currently working on?

#DataScience #MachineLearning #Python #SQL #CareerGrowth #Analytics #LearningPath #DataScientist
🚀 Data Science Roadmap – 2025 🧠📊

A clear path to build strong skills and grow as a Data Scientist:

1️⃣ Mathematics & Statistics → Build a strong foundation with linear algebra, probability, and statistical concepts.
2️⃣ Python Programming → Learn data manipulation (Pandas, NumPy), visualization, and ML libraries (Scikit-Learn, TensorFlow).
3️⃣ SQL → Master querying, joins, data cleaning, and optimization for working with databases.
4️⃣ Data Wrangling → Clean, transform, and prepare data for analysis.
5️⃣ Data Visualization → Use tools like Tableau, Power BI, Seaborn, and Plotly to present insights effectively.
6️⃣ Machine Learning → Learn supervised & unsupervised algorithms, model building, and evaluation techniques.
7️⃣ Soft Skills → Communication, teamwork, storytelling, and problem-solving are key for real-world impact.

💡 Tip: Start with the basics, practice through projects, and keep learning consistently.

#DataScience #MachineLearning #CareerRoadmap #LearningJourney #Python #SQL #AI #LinkedInLearning
🔢 NumPy Practice – Building Strong Data Analytics Foundations 🚀

Today I focused on improving my Python skills by practicing NumPy, one of the core libraries for Data Analytics and Machine Learning. NumPy helps make numerical operations extremely fast, efficient, and clean.

🔍 What I practiced:
✅ Creating arrays using `array()`, `arange()`, `linspace()`
✅ Indexing & slicing (1D, 2D, 3D)
✅ Mathematical & statistical operations
✅ Broadcasting
✅ Reshaping arrays using `reshape()`
✅ Horizontal & vertical stacking
✅ Boolean filtering
✅ Random module (`rand`, `randn`, `randint`)
✅ Vectorization

🔥 Additional Advanced Practice:
📌 Matrix multiplication (`dot`, `matmul`)
📌 Conditional selection using `np.where()`
📌 Sorting arrays using `np.sort()`
📌 Getting unique values with `np.unique()`
📌 Loading files with `np.genfromtxt()`
📌 Checking memory usage of list vs array
📌 Speed testing using `%timeit`

🧠 Why NumPy is a must-learn?
* Faster numerical operations
* Clean & simplified code
* Backbone for Pandas, Scikit-Learn, Matplotlib
* Essential for ML, AI, Data Preprocessing

🔗 GitHub Repository — here is the code I practiced today 👇
👉 GitHub: https://lnkd.in/gs-jEcH9
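A few of the operations listed above, collected into one runnable sketch (the array values are illustrative only):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # array creation + reshape: 3 rows, 4 columns
col_means = a.mean(axis=0)        # statistical operation along an axis -> shape (4,)
centered = a - col_means          # broadcasting: (3, 4) minus (4,)
evens = a[a % 2 == 0]             # boolean filtering keeps only even values
clipped = np.where(a > 6, 6, a)   # conditional selection: cap values at 6
stacked = np.vstack([a, a])       # vertical stacking -> shape (6, 4)
uniques = np.unique(stacked)      # sorted unique values across the stack
```

Every line operates on whole arrays at once, which is exactly the vectorization that makes NumPy so much faster than Python loops.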
Day 72 — Data Cleaning Made Effortless: Master the Power of PyJanitor 🧹

Every Data Scientist knows this pain — messy data that eats up 80% of your project time. But what if you could clean it in minutes, not hours?

Meet PyJanitor — a hidden Python gem that extends Pandas with super-clean syntax for data cleaning & transformation!

💫 Why PyJanitor?
✅ Built on top of Pandas, so no steep learning curve
✅ Provides chainable cleaning methods
✅ Ideal for data wrangling, column renaming, missing value handling, and more

Example:

import pandas as pd
import janitor  # importing janitor registers the cleaning methods on DataFrame

df = pd.DataFrame({
    "Employee Name": ["Harini", "John", "Maya"],
    "Salary($)": [50000, None, 60000],
    "Join Date": ["2021-01-10", "2020-05-19", None]
})

clean_df = (
    df
    .clean_names(remove_special=True)  # "Salary($)" -> "salary", "Join Date" -> "join_date"
    .remove_empty()                    # remove fully empty rows/columns
    .fill_empty("salary", 0)           # fill missing salary with 0
    .dropna(subset="join_date")        # drop rows missing join date
)
print(clean_df)

With just four chained methods, you’ve done what normally takes 20+ lines in Pandas! 🤯

Real-world use case: In data pipelines, PyJanitor helps create clean, reproducible preprocessing scripts — perfect for ML models and analytics workflows.

Pro Tip: You can even chain operations like dplyr in R, making your cleaning workflow elegant and efficient.

Question for you: What’s your go-to tool for cleaning large, messy datasets — Pandas, Dask, or something else?

#Day72 #100DaysOfDataScience #PyJanitor #Python #DataCleaning #DataWrangling #DataPreprocessing #Pandas #MachineLearning #Analytics #DataEngineering #AI #DataScienceCommunity #LearnDataScience
How to Build a Data Science Project — Step by Step

A good Data Science project doesn’t just show your skills — it shows your thinking process. Here’s how I approach every project 👇

1️⃣ Define the Problem — Clearly understand what you’re solving. Example: “Predict house prices” or “Classify emails as spam.”
2️⃣ Collect the Data — Use sources like Kaggle, the UCI Machine Learning Repository, or APIs.
3️⃣ Clean the Data — Handle missing values, remove duplicates, and fix inconsistencies.
4️⃣ Explore the Data (EDA) — Visualize patterns using Matplotlib or Seaborn.
5️⃣ Feature Engineering — Create new variables that improve model performance.
6️⃣ Model Building — Use algorithms like Linear Regression, Decision Trees, or Random Forest.
7️⃣ Model Evaluation — Check accuracy, precision, recall, or RMSE depending on the task.
8️⃣ Deploy or Share — Upload your project on GitHub or share results on LinkedIn!

💬 Lesson: A project is not just about code — it’s about how you think, analyze, and communicate results.

#DataScience #MachineLearning #Python #GitHub #RobinKamboj #ProjectBuilding #DataAnalytics
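Steps 6 and 7 can be sketched in a few lines with scikit-learn; the built-in iris dataset is used here only so the example runs without any external files:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Collect: a small, well-known classification dataset
X, y = load_iris(return_X_y=True)

# Hold out a test set so evaluation reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Model Building: a decision tree classifier
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Model Evaluation: accuracy on the held-out set
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

For an imbalanced task you would report precision and recall instead of plain accuracy, as the post notes.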
🧰 The Essential Tools Every Data Scientist Should Know

Behind every successful data project lies the right set of tools, each designed for a specific stage in the data science lifecycle. Here’s a quick overview 👇

🔹 Data Collection & Cleaning:
• Python (Pandas, NumPy)
• SQL for querying and managing data
• Excel & Google Sheets for quick exploration

🔹 Data Visualization & Analysis:
• Matplotlib, Seaborn, Plotly for visual insights
• Power BI & Tableau for interactive dashboards

🔹 Modeling & Machine Learning:
• Scikit-learn for classical ML
• TensorFlow & PyTorch for deep learning
• Jupyter Notebook & Google Colab for experimentation

🔹 Deployment & Monitoring:
• Flask / FastAPI to serve models
• Docker for containerization
• MLflow for tracking experiments

💡 The key isn’t knowing every tool; it’s mastering the right tool for each task.

#DataScience #MachineLearning #Python #DataAnalytics #PowerBI #AI #GoogleColab #DataTools
Model-based clustering uses statistical models, most commonly Gaussian mixtures, to identify patterns and group data points based on probability rather than just distance. Each cluster is represented by a probability distribution, and the algorithm estimates both the parameters of these distributions and the likelihood that each point belongs to each cluster. Unlike simpler methods such as k-means, it provides a probabilistic view of group membership and can model clusters of different shapes, sizes, and orientations.

✔️ Can reveal hidden patterns in complex data
✔️ Works well even when clusters overlap
✔️ Uses statistical criteria like BIC to choose the number of clusters objectively
✔️ Accounts for varying cluster shapes and orientations

❌ Can lead to overfitting if too many clusters are chosen without proper model selection
❌ May produce misleading results if model assumptions, such as Gaussian-shaped clusters, do not hold
❌ Computationally more demanding than simpler methods
❌ Not ideal for highly irregular clusters, where approaches like DBSCAN or spectral clustering might work better

In the example below, the first plot shows all data points in black. The next view assigns each point to a cluster and colors it accordingly. Ellipses represent the estimated Gaussian components of the model, illustrating the probability-based grouping and the variability within each cluster.

🔹 In R, the mclust package fits Gaussian mixture models, selects the optimal number of clusters using BIC, and offers tools for uncertainty visualization, which is important for interpreting ambiguous points.
🔹 In Python, the sklearn.mixture.GaussianMixture class supports Gaussian mixture modeling, with BIC or AIC for model selection, and results can be visualized using matplotlib or seaborn.

For more insights into statistics, data science, R, and Python, join my email newsletter for practical tips delivered directly to your inbox.
Learn more: http://eepurl.com/gH6myT

#rprogramminglanguage #datavisualization #python3
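A minimal runnable sketch of the Python route described above: fit Gaussian mixtures for several cluster counts and keep the one with the lowest BIC. The two-blob synthetic data is illustrative only, not the data shown in the plots:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs in 2D
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])

# Fit mixtures with 1..4 components and record each model's BIC
bics = {}
for k in range(1, 5):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bics[k] = gm.bic(X)

# Lower BIC is better; this selects the number of clusters objectively
best_k = min(bics, key=bics.get)
print(f"BIC selects {best_k} clusters")
```

With clearly separated blobs the BIC penalty for extra parameters outweighs any marginal likelihood gain, so the two-component model wins; on overlapping or non-Gaussian data the selection becomes less clear-cut, as the caveats above note.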