🚀 Day 19 of My AI & Machine Learning Journey

Today I learned about one of the most important concepts in data analysis — the Pandas DataFrame.

💡 A DataFrame is like a table (rows + columns), and each column is a Series.

🔹 Creating a DataFrame
We can create a DataFrame in different ways:

Using a list:
students_data = [[100, 80, 10], [90, 70, 7]]
pd.DataFrame(students_data, columns=['iq', 'marks', 'package'])

Using a dictionary:
data = {'iq': [100, 90], 'marks': [80, 70], 'package': [10, 7]}
pd.DataFrame(data)

Using a CSV (real-world data):
pd.read_csv('file.csv')

🔹 DataFrame Attributes
• shape → number of rows & columns
• dtypes → data type of each column
• columns → column names
• values → the underlying data

Example: movies.shape

🔹 Important Methods
• head() → first rows
• tail() → last rows
• sample() → random rows
• info() → dataset overview (columns, types, non-null counts)
• describe() → summary statistics

Example:
movies.head()
movies.describe()

🔹 Handling Data
• isnull().sum() → count missing values
• duplicated().sum() → count duplicate rows
• rename() → rename columns

Example: students.rename(columns={'marks': 'percent'})

🔹 Mathematical Operations
• sum()
• mean()
• median()

Example:
students.mean()
students.sum(axis=1)

🔹 Selecting Data
Single column → Series: movies['title']
Multiple columns → DataFrame: movies[['title', 'year']]

🔹 Setting an Index
We can set a column as the index:
students.set_index('name', inplace=True)

💡 Biggest Takeaway: The DataFrame is the backbone of data analysis — every ML project starts with understanding the data properly.

Learning with practical examples 🚀

#MachineLearning #Python #Pandas #DataFrame #DataScience #LearningJourney #TechGrowth
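To tie the pieces above together, here is a minimal runnable sketch; the student names, values, and column labels are made up for illustration:

import pandas as pd

# Build a DataFrame from a dictionary (hypothetical student data)
students = pd.DataFrame({
    'name': ['Asha', 'Ravi'],
    'iq': [100, 90],
    'marks': [80, 70],
    'package': [10, 7],
})

print(students.shape)   # (2, 4) -> rows, columns
print(students.dtypes)  # data type of each column

# Rename a column, then set 'name' as the index
students = students.rename(columns={'marks': 'percent'})
students = students.set_index('name')

print(students[['iq', 'percent']])       # multiple columns -> DataFrame
print(students.mean(numeric_only=True))  # column-wise averages
print(students.sum(axis=1))              # row-wise totals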
🚀 Day 17 of My AI & Machine Learning Journey

Today I explored the Pandas Series in depth — including its attributes, methods, and working with CSV data.

🔹 Series Attributes
These help us understand the structure of the data:
• size → total number of elements (including missing values)
• dtype → data type of the elements
• name → name of the Series
• is_unique → checks whether all values are unique
• index → the index labels
• values → the underlying data

🔹 Creating a Series from a CSV
By default, read_csv() loads data as a DataFrame. To convert a single-column result into a Series, we use:
👉 .squeeze()

Example:
Single column → converted into a Series
Multiple columns → use index_col to choose which column becomes the index

🔹 Important Series Methods
• head() → first 5 rows (by default)
• tail() → last 5 rows (by default)
• sample() → random row (helps avoid bias)
• value_counts() → frequency of each value
• sort_values() → sort the data (ascending/descending)
• sort_index() → sort by index

👉 Method chaining: combining multiple methods in one expression, e.g. sort_values() → head()

🔹 Mathematical Operations
• count() → counts values (ignores missing)
• sum() → total
• mean() → average
• median() → middle value
• mode() → most frequent value
• std() → standard deviation
• var() → variance
• min() / max() → smallest / largest value

🔹 describe() Method
Gives a quick summary of the data:
• count
• mean
• std
• min / max
• percentiles (25%, 50%, 75%)

💡 Biggest Takeaway: The Pandas Series provides powerful tools to analyze, clean, and understand data efficiently.

Going deeper into data handling, step by step 🚀

#MachineLearning #Python #Pandas #DataScience #LearningJourney #TechGrowth
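A small sketch tying these Series operations together (the 'Subscribers' numbers below stand in for a real CSV column):

import pandas as pd

# A one-column DataFrame standing in for pd.read_csv('subs.csv');
# .squeeze() turns a single-column DataFrame into a Series
subs = pd.DataFrame({'Subscribers': [48, 57, 40, 43, 44, 57]}).squeeze()

print(subs.size, subs.dtype, subs.is_unique)  # basic attributes

print(subs.head(3))         # first rows
print(subs.value_counts())  # frequency of each value

# Method chaining: sort descending, then take the single largest value
print(subs.sort_values(ascending=False).head(1))

print(subs.mean(), subs.median(), subs.std())  # quick statistics
print(subs.describe())                         # full summary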
🧹 Why Data Preprocessing is the Most Important Step in Machine Learning

As a beginner in ML, I used to think the model was everything. Then I learned about preprocessing — and it changed how I see the whole pipeline.

Here's what I've understood so far:

📊 What is data preprocessing?
It's the process of cleaning and transforming raw data before feeding it to a model. Real-world data is messy — missing values, duplicates, wrong formats. Preprocessing fixes all of that.

⚠️ Why does it matter so much?
A model is only as good as the data it learns from. Even the most powerful algorithm will give poor results if the data is dirty. Garbage in → garbage out.

🔧 Key preprocessing steps:
• Handling missing values — fill them with the mean/median or remove them entirely
• Removing duplicates — duplicate rows can bias your model's learning
• Encoding categorical data — convert text labels (like "Yes"/"No") into numbers
• Feature scaling — normalize or standardize values so no single feature dominates
• Splitting the data — divide into training and testing sets before building the model

📈 Real impact: Studies show that data scientists spend nearly 80% of their time on data preparation. It's not glamorous — but it's what separates a good model from a great one.

I'm still learning, but one thing is clear: preprocessing is not optional — it's essential. 💡

#LearningJourney #Python #MachineLearning #ArtificialIntelligence #DataScience #InnomaticsResearchLabs
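A rough sketch of those five steps with pandas and scikit-learn; the tiny DataFrame and its columns (age, salary, purchased) are invented for illustration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with a missing value and a duplicate row
df = pd.DataFrame({
    'age': [25, 32, None, 25],
    'salary': [50000, 64000, 58000, 50000],
    'purchased': ['Yes', 'No', 'Yes', 'Yes'],
})

df['age'] = df['age'].fillna(df['age'].mean())              # 1. handle missing values
df = df.drop_duplicates()                                   # 2. remove duplicates
df['purchased'] = df['purchased'].map({'No': 0, 'Yes': 1})  # 3. encode text labels

X = df[['age', 'salary']]
y = df['purchased']

# 5. split BEFORE scaling, so test data doesn't leak into the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# 4. scale features so salary doesn't dominate age
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Fitting the scaler on the training split only, then reusing it on the test split, mirrors how unseen data arrives in production.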
🚀 Day 18 of My AI & Machine Learning Journey

Today I explored more advanced concepts in the Pandas Series: indexing, filtering, editing, and real data operations.

🔹 1. Indexing in a Series
• Integer indexing → access a value by position
• Slicing → get multiple values at once
• Fancy indexing → use a list or condition to select data
💡 Example: selecting specific rows or a range of data

🔹 2. Editing a Series
• Update values by index
• Add new values with a new index label
• Modify multiple values using slicing
👉 A Series is mutable (we can change its data easily)

🔹 3. Python Functionality on a Series
We can directly use built-in Python functions like:
• len()
• max() / min()
• sorted()
It also supports:
• Looping
• Type conversion (to list, dict)
• Membership checking

🔹 4. Boolean Indexing (very important)
Used for filtering data based on conditions. Examples:
• Scores ≥ 50
• Values == 0
• Data > threshold
👉 This is how real-world data filtering is done

🔹 5. Plotting Data
• Line plot → trends
• Bar chart → comparisons
• Pie chart → percentage distribution
👉 Helps build a visual understanding of the data

🔹 6. Important Series Methods
• astype() → change data type
• between() → filter by range
• clip() → limit values to bounds
• drop_duplicates() → remove duplicates
• isnull() / dropna() / fillna() → handle missing values
• isin() → check membership against a set of values
• apply() → apply a custom function
• copy() → create a safe copy

💡 Biggest Takeaway: A Pandas Series is not just for storing data — it enables powerful data manipulation, filtering, and analysis.

Learning more practical concepts every day 🚀

#MachineLearning #Python #Pandas #DataScience #LearningJourney #TechGrowth
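A quick sketch of editing, boolean indexing, and several of the listed methods on a made-up scores Series:

import pandas as pd

scores = pd.Series([35, 72, 0, 88, 51], index=['a', 'b', 'c', 'd', 'e'])

scores['a'] = 40  # edit an existing value by label
scores['f'] = 95  # add a new value with a new index label

print(scores[scores >= 50])            # boolean indexing: passing scores
print(scores[scores.between(40, 90)])  # filter by range
print(scores.clip(0, 90))              # cap values at 90

print(len(scores), scores.max())  # plain Python functions work too
print(0 in scores.values)         # membership check on the values

# apply() a custom function to every element
print(scores.apply(lambda x: 'pass' if x >= 50 else 'fail'))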
🚀 Day 21 of My AI & Machine Learning Journey

Today I learned important Pandas DataFrame functions that are widely used in real-world data analysis.

🔹 1. astype() → change data type
ipl['ID'] = ipl['ID'].astype('int32')

🔹 2. value_counts() → count frequency
ipl['Player_of_Match'].value_counts()

🔹 3. sort_values() → sort data
movies.sort_values('title_x')

🔹 4. rank() → rank values
batsman['rank'] = batsman['runs'].rank(ascending=False)

🔹 5. sort_index() → sort by index
movies.sort_index()

🔹 6. set_index() → set a column as the index
df.set_index('name', inplace=True)

🔹 7. reset_index() → reset the index
df.reset_index()

🔹 8. unique() → get unique values
ipl['Season'].unique()

🔹 9. nunique() → count unique values
ipl['Season'].nunique()

🔹 10. isnull() / notnull() → check for missing values
students.isnull()
students.notnull()

🔹 11. dropna() → remove missing values
students.dropna()

🔹 12. fillna() → fill missing values
students.fillna(0)

🔹 13. drop_duplicates() → remove duplicates
df.drop_duplicates()

🔹 14. drop() → delete rows/columns
df.drop(columns=['col1'])

🔹 15. apply() → apply a custom function
df['new'] = df.apply(func, axis=1)

💡 Biggest Takeaway: These functions are essential for cleaning, transforming, and preparing data before building ML models.

Learning practical data handling step by step 🚀

#MachineLearning #Python #Pandas #DataScience #DataCleaning #LearningJourney
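A tiny invented DataFrame chaining several of these together (the ipl/movies/batsman datasets referenced above aren't included here):

import pandas as pd

runs = pd.DataFrame({
    'batsman': ['Kohli', 'Rohit', 'Rahul', 'Kohli'],
    'runs': [82, 45, None, 82],
})

runs = runs.drop_duplicates()                # remove the duplicate row
runs['runs'] = runs['runs'].fillna(0)        # fill the missing score
runs['runs'] = runs['runs'].astype('int32')  # change the dtype

runs['rank'] = runs['runs'].rank(ascending=False)  # highest score = rank 1
print(runs.sort_values('runs', ascending=False))

print(runs['batsman'].nunique())  # number of distinct batsmen

# apply() a custom function row-wise
runs['label'] = runs.apply(lambda row: f"{row['batsman']}: {row['runs']}", axis=1)
print(runs)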
Day 3 of Consistency — Srujan Lakku

Today's Learning Snapshot – ML, Data & Practical Implementation

Today was focused on strengthening core machine learning concepts along with hands-on data handling:

Programming & Data Handling Basics
Worked on concatenation, importing data (including Google Sheets integration), and structured workflows for handling datasets. Reinforced Python fundamentals like importing libraries and efficient coding practices.

Databases + AI Integration
Explored MySQL with AI — how databases can be integrated into intelligent systems for data-driven decision making.

Core ML Pipeline Components
Deep dive into key components of machine learning:
• Feature engineering – transforming raw data into meaningful inputs
• Model evaluation & validation – ensuring model reliability
• Train-test split & random seed – maintaining reproducibility and accuracy

Supervised Learning Algorithms
Implemented and understood:
• Logistic regression – for classification problems
• Decision trees – for interpretable decision-making models

Libraries & Tools
Hands-on practice with:
• NumPy – numerical computations
• Matplotlib – data visualization
• Scikit-learn (sklearn) – ML model building and evaluation

Statistics & Data Understanding
Strengthened foundational concepts:
• Types of data – quantitative vs. qualitative
• Categorical data & the Likert scale
• Time-based data analysis

Building strong fundamentals in data, statistics, and ML pipelines is key to becoming a reliable AI engineer.

#MachineLearning #DataScience #Python #NumPy #Matplotlib #Sklearn #AI #Statistics #FeatureEngineering #LearningJourney

Centle India Skill Tank GUGULOTH SAI KRISHNA
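A minimal sketch of the train-test split, random seed, and the two algorithms mentioned, using scikit-learn's built-in iris dataset purely for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# A fixed random_state keeps the split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=42)):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(type(model).__name__, accuracy_score(y_test, y_pred))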
Most people think Data Science is just Python + Machine Learning.

Then they see this diagram. 👇

━━━━━━━━━━━━━━━━━━━━

Data Science is 9 layers — not one skill:

🔵 Data Foundations → understand your data before you touch it
🔵 Data Pipelines → clean it, transform it, make it usable
🔵 Statistical & ML Methods → the engine everyone focuses on
🔵 Applied Data Science → turn methods into real solutions
🔵 Business & Decision Layer → make your work actually matter
🔵 Insights & Models → build things people can act on
🔵 Model Evaluation → make it reliable, not just accurate
🔵 Deployment & Monitoring → a model in a notebook isn't a product
🔵 Governance & Ethics → the layer everyone ignores until something breaks

━━━━━━━━━━━━━━━━━━━━

Most data scientists are great at 2 or 3 of these. The ones who understand all 9 — even at a surface level — are the ones who lead teams, drive real decisions, and build things that survive production.

Which layer do you feel weakest in right now? Drop it below 👇

♻️ Repost — someone needs to see how big this field actually is.

#DataScience #MachineLearning #AI #DataEngineering #MLOps #Python #Statistics #DataAnalytics #DeepLearning #CareerInData
Most beginners spend months learning algorithms. But they skip the techniques that actually make models work.

Here are 6 ML techniques every beginner data scientist should master before anything else:

𝟬𝟭 · 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
Your model is only as good as your inputs. Domain knowledge beats fancy architectures every time.

𝟬𝟮 · 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗦𝗰𝗮𝗹𝗶𝗻𝗴
When salary is 50,000 and age is 25, your model listens to salary. Min-max and Z-score scaling fix that.

𝟬𝟯 · 𝗗𝗮𝘁𝗮 𝗕𝗮𝗹𝗮𝗻𝗰𝗶𝗻𝗴
Training on 90% majority / 10% minority data doesn't build a model — it builds a bias machine. Use SMOTE.

𝟬𝟰 · 𝗖𝗿𝗼𝘀𝘀-𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻
One train/test split is lucky, not reliable. K-fold gives you a score you can actually trust.

𝟬𝟱 · 𝗛𝘆𝗽𝗲𝗿𝗽𝗮𝗿𝗮𝗺𝗲𝘁𝗲𝗿 𝗧𝘂𝗻𝗶𝗻𝗴
Default settings are a starting point, not an endpoint. Grid search and Bayesian optimization are your friends.

𝟬𝟲 · 𝗠𝗼𝗱𝗲𝗹 𝗘𝗻𝘀𝗲𝗺𝗯𝗹𝗲
Combine 3 average models and you often beat 1 great one. Bagging, boosting, stacking — learn all three.

Master these before you obsess over the next algorithm.

Save this post and share it with someone just starting out. 🔖

#Datascientist #Data #MachineLearning #DataScience #MLBeginners #AI #Python #DataScientist #ArtificialIntelligence #MLOps #LearnML #TechCareer
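A minimal sketch combining techniques 02, 04, and 05: Z-score scaling inside a pipeline, scored with k-fold cross-validation, then tuned with grid search. The dataset choice and parameter grid are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ('scale', StandardScaler()),                # Z-score scaling
    ('clf', LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation: five scores instead of one lucky split
print(cross_val_score(pipe, X, y, cv=5))

# Grid search over the regularization strength (illustrative grid)
grid = GridSearchCV(pipe, param_grid={'clf__C': [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

Keeping the scaler inside the Pipeline matters: it is re-fit on each training fold, so no information from the validation fold leaks into the scaling.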
Most ML algorithms need labelled data to learn. K-Means doesn't. It finds patterns on its own 🤖

Here's everything you need to know about K-Means clustering 👇

📌 WHAT IS K-MEANS?
An unsupervised algorithm that groups similar data points into K clusters — without any labels. You give it data. It finds the groups itself.

📌 HOW IT WORKS — 4 steps:
Step 1 → Randomly place K centroids in the data
Step 2 → Assign every point to its nearest centroid
Step 3 → Move each centroid to the mean of its assigned points
Step 4 → Repeat steps 2–3 until the centroids stop moving
That's it. Simple idea. Powerful results.

📌 HOW TO CHOOSE K? — The elbow method
→ Run K-Means for K = 1, 2, 3, ..., 10
→ Plot the inertia (sum of squared distances to the nearest centroid)
→ Find where the curve bends like an elbow
→ That bend = the best K

📌 USE IT FOR:
✅ Customer segmentation
✅ Document grouping
✅ Image compression
✅ Anomaly detection
✅ Recommendation systems

📌 LIMITATIONS:
❌ You must choose K manually
❌ Sensitive to outliers — one bad point shifts the centroid
❌ Assumes clusters are round/spherical
❌ Struggles with very different cluster sizes

📌 Python code:
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, random_state=42)
model.fit(X)
labels = model.labels_

📌 PRO TIPS:
→ Always scale your data first — K-Means is distance-based
→ Run it multiple times with different seeds — results can vary
→ Use the silhouette score to validate your clusters

#MachineLearning #KMeans #Clustering #DataScience #Python #LearningInPublic #OpenToWork #AI
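To make the elbow method and the silhouette tip concrete, here is a small sketch on synthetic data (make_blobs is used only so there is something to cluster):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Synthetic data with 3 true clusters, scaled first (K-Means is distance-based)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)

# Elbow method: watch where the drop in inertia flattens out
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(model.inertia_, 1), round(silhouette_score(X, model.labels_), 3))

On this data, the flattening of the inertia and the highest silhouette score should both point at k = 3.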
Machine Learning Beginner's Realization + Cheat Sheet

While learning Machine Learning, I realized something very important:
- If you have strong hands-on practice in Pandas and NumPy, ML becomes much easier.
- If these are weak, even simple algorithms feel difficult to implement.

As a beginner, one common confusion is:
✓ "How do I import different ML algorithms?"

But here's a hidden truth:
✓ Almost all ML algorithms follow a similar workflow!

---

Common ML Workflow (Must Remember)
✓ Import
✓ Create model
✓ Train (fit)
✓ Predict

# the module varies by algorithm (linear_model, tree, ensemble, ...)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Once you understand this pattern, learning ML becomes much simpler.

---

Important ML Algorithms (Import Cheat Sheet)

# Linear Regression
from sklearn.linear_model import LinearRegression

# Logistic Regression
from sklearn.linear_model import LogisticRegression

# Decision Tree
from sklearn.tree import DecisionTreeClassifier

# Random Forest
from sklearn.ensemble import RandomForestClassifier

# KNN
from sklearn.neighbors import KNeighborsClassifier

# SVM
from sklearn.svm import SVC

# Naive Bayes
from sklearn.naive_bayes import GaussianNB

# K-Means
from sklearn.cluster import KMeans

# PCA
from sklearn.decomposition import PCA

---

Train-Test Split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

---

📈 Evaluation
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

---

Pandas Essentials
import pandas as pd
df = pd.read_csv("file.csv")
df.head()
df.info()
df.describe()
df.isnull().sum()
df.dropna()
df.fillna(0)
df.groupby('col1')['col2'].mean()

---

⚡ NumPy Essentials
import numpy as np
arr = np.array([1, 2, 3])
arr.shape
arr.reshape(3, 1)
np.mean(arr)
np.sum(arr)

---

💡 Key Learnings
✔ Don't skip Pandas & NumPy
✔ ML is more about data than algorithms
✔ Learn 4–5 algorithms deeply instead of 50 superficially
✔ Practice > theory

---

🚨 Beginner Mistakes
❌ Training models without cleaning the data
❌ Ignoring missing values
❌ Confusing fit() vs. predict()
❌ Copy-pasting code without understanding it

---

🎯 Final Thought: Machine Learning is not difficult…
👉 With the right direction and consistent practice, it becomes logical and enjoyable.

---

If you're also learning ML, let's connect 🤝

#MachineLearning #DataScience #Python #Pandas #NumPy #LearningJourney #Freshers #DataAnalytics #CFBR
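And the whole pattern in one runnable sketch; scikit-learn's built-in wine dataset stands in here for a real, already-cleaned CSV:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# 1. Data (the wine dataset stands in for your own cleaned DataFrame)
X, y = load_wine(return_X_y=True)

# 2. Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Import -> create -> fit -> predict: the same four steps for any algorithm
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 4. Evaluate
print(accuracy_score(y_test, y_pred))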
Day 133 of my Data Science Journey 🚀

Staying updated in 2026 means keeping pace with tools and skills that are evolving every single week. Here are some key takeaways I came across today:

🔹 Python continues to dominate with advanced ML libraries
🔹 SQL remains essential for handling complex analytical queries
🔹 Cloud-based analytics platforms are becoming the norm
🔹 AI-powered visualization tools are transforming data storytelling
🔹 Low-code & no-code ML solutions are making AI more accessible

💡 One important insight: companies are no longer looking for just one skill — they're prioritizing hybrid skill sets. The ability to combine technical expertise with analytical thinking and business understanding is what sets you apart.

Consistency + continuous learning = growth 📈

Let's keep building, learning, and evolving every day.

#Day133 #DataScience #MachineLearning #AI #LearningJourney #Upskilling #Analytics #CareerGrowth