🚀 Day 17 of My AI & Machine Learning Journey

Today I explored Pandas Series in depth — including its attributes, methods, and working with CSV data.

🔹 Series Attributes
These help us understand the structure of data:
• size → Total number of elements (including missing values)
• dtype → Data type of elements
• name → Name of the series
• is_unique → Checks if values are unique
• index → Shows index labels
• values → Returns actual data

🔹 Creating a Series from CSV
By default, read_csv() loads data as a DataFrame. To convert a single-column result into a Series, we use:
👉 .squeeze()
Example:
Single column → converted into a Series
Multiple columns → use index_col to select the index

🔹 Important Series Methods
• head() → Shows first 5 rows
• tail() → Shows last 5 rows
• sample() → Picks a random row (avoids bias)
• value_counts() → Frequency of values
• sort_values() → Sort data (asc/desc)
• sort_index() → Sort by index
👉 Method Chaining: combining multiple methods in one expression
Example: sort_values → head → values

🔹 Mathematical Operations
• count() → Counts values (ignores missing)
• sum() → Total
• mean() → Average
• median() → Middle value
• mode() → Most frequent value
• std() → Standard deviation
• var() → Variance
• min() / max() → Smallest / Largest value

🔹 describe() Method
Gives a quick summary of the dataset:
• Count
• Mean
• Std
• Min / Max
• Percentiles (25%, 50%, 75%)

💡 Biggest Takeaway: Pandas Series provides powerful tools to analyze, clean, and understand data efficiently.

Going deeper into data handling step by step 🚀

#MachineLearning #Python #Pandas #DataScience #LearningJourney #TechGrowth
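A minimal runnable sketch of these Series basics (the marks data and student names below are invented for illustration; the in-memory DataFrame stands in for a pd.read_csv(..., index_col=...) call):

import pandas as pd

# Single-column DataFrame standing in for pd.read_csv('file.csv', index_col='name')
df = pd.DataFrame({'marks': [88, 92, 88, 75]},
                  index=['amy', 'ben', 'cara', 'dev'])

s = df.squeeze()          # one-column DataFrame -> Series

print(s.size)             # 4: total elements (including missing values)
print(s.dtype)            # int64: data type of elements
print(s.is_unique)        # False: 88 appears twice
print(s.index.tolist())   # ['amy', 'ben', 'cara', 'dev']
print(s.values)           # [88 92 88 75]

# Method chaining: sort, take the top 2, return raw values
print(s.sort_values(ascending=False).head(2).values)  # [92 88]

print(s.describe())       # count, mean, std, min, quartiles, max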
🚀 Day 19 of My AI & Machine Learning Journey

Today I learned about one of the most important concepts in data analysis — the Pandas DataFrame.

💡 A DataFrame is like a table (rows + columns), and each column is a Series.

🔹 Creating a DataFrame
We can create a DataFrame in different ways:

Using a list:
students_data = [[100,80,10],[90,70,7]]
pd.DataFrame(students_data, columns=['iq','marks','package'])

Using a dictionary:
data = {'iq':[100,90],'marks':[80,70],'package':[10,7]}
pd.DataFrame(data)

Using a CSV (real-world data):
pd.read_csv('file.csv')

🔹 DataFrame Attributes
• shape → number of rows & columns
• dtypes → data types
• columns → column names
• values → actual data
Example: movies.shape

🔹 Important Methods
• head() → first rows
• tail() → last rows
• sample() → random rows
• info() → dataset info
• describe() → statistics
Example:
movies.head()
movies.describe()

🔹 Handling Data
• isnull().sum() → missing values
• duplicated().sum() → duplicate rows
• rename() → rename columns
Example: students.rename(columns={'marks':'percent'})

🔹 Mathematical Operations
• sum()
• mean()
• median()
Example:
students.mean()
students.sum(axis=1)

🔹 Selecting Data
Single column → Series
movies['title']
Multiple columns → DataFrame
movies[['title','year']]

🔹 Setting an Index
We can set a column as the index:
students.set_index('name', inplace=True)

💡 Biggest Takeaway: The DataFrame is the backbone of data analysis — every ML project starts with understanding the data properly.

Learning with practical examples 🚀

#MachineLearning #Python #Pandas #DataFrame #DataScience #LearningJourney #TechGrowth
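To see these DataFrame attributes and methods together, here is a small self-contained sketch built from the post's dictionary example (the student names are invented):

import pandas as pd

# Construct the DataFrame from the post's dictionary example
data = {'iq': [100, 90], 'marks': [80, 70], 'package': [10, 7]}
students = pd.DataFrame(data, index=['kunal', 'lakshay'])

print(students.shape)      # (2, 3): rows, columns
print(students.dtypes)     # int64 for every column
print(students.head())     # first rows (all of them here)

print(students.isnull().sum())       # missing values per column: all 0
print(students.duplicated().sum())   # duplicate rows: 0

# Rename a column (returns a new DataFrame unless inplace=True)
renamed = students.rename(columns={'marks': 'percent'})
print(renamed.columns.tolist())      # ['iq', 'percent', 'package']

print(students.mean())        # column-wise averages
print(students.sum(axis=1))   # row-wise totals

print(type(students['iq']))             # single column -> Series
print(type(students[['iq', 'marks']]))  # list of columns -> DataFrame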
🚀 Day 21 of My AI & Machine Learning Journey

Today I learned important Pandas DataFrame functions that are widely used in real-world data analysis.

🔹 1. astype() → Change data type
ipl['ID'] = ipl['ID'].astype('int32')

🔹 2. value_counts() → Count frequency
ipl['Player_of_Match'].value_counts()

🔹 3. sort_values() → Sort data
movies.sort_values('title_x')

🔹 4. rank() → Ranking values
batsman['rank'] = batsman['runs'].rank(ascending=False)

🔹 5. sort_index() → Sort by index
movies.sort_index()

🔹 6. set_index() → Set column as index
df.set_index('name', inplace=True)

🔹 7. reset_index() → Reset index
df.reset_index()

🔹 8. unique() → Get unique values
ipl['Season'].unique()

🔹 9. nunique() → Count unique values
ipl['Season'].nunique()

🔹 10. isnull() / notnull() → Check missing values
students.isnull()
students.notnull()

🔹 11. dropna() → Remove missing values
students.dropna()

🔹 12. fillna() → Fill missing values
students.fillna(0)

🔹 13. drop_duplicates() → Remove duplicates
df.drop_duplicates()

🔹 14. drop() → Delete rows/columns
df.drop(columns=['col1'])

🔹 15. apply() → Apply custom function
df['new'] = df.apply(func, axis=1)

💡 Biggest Takeaway: These functions are essential for data cleaning, transformation, and preparation before building ML models.

Learning practical data handling step by step 🚀

#MachineLearning #Python #Pandas #DataScience #DataCleaning #LearningJourney
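A quick sketch exercising several of these functions on a tiny invented DataFrame (the names, seasons, and runs are placeholders, not real IPL data):

import pandas as pd
import numpy as np

# Toy stand-in for the post's ipl/students data
df = pd.DataFrame({
    'name':   ['kunal', 'lakshay', 'kunal', 'riya'],
    'season': ['2020', '2021', '2020', np.nan],
    'runs':   [450, 380, 450, 510],
})

df['runs'] = df['runs'].astype('int32')        # 1. change dtype
print(df['season'].value_counts())             # 2. frequencies (NaN excluded)
print(df.sort_values('runs'))                  # 3. sort by a column
df['rank'] = df['runs'].rank(ascending=False)  # 4. rank values (ties share a rank)
print(df['season'].unique())                   # 8. unique values (includes NaN)
print(df['season'].nunique())                  # 9. unique count (NaN excluded)
print(df.isnull().sum())                       # 10. missing values per column
print(df.fillna('unknown'))                    # 12. fill missing values
print(df.drop_duplicates())                    # 13. drop exact duplicate rows
print(df.drop(columns=['rank']))               # 14. drop a column

# 15. apply a custom row-wise function
df['label'] = df.apply(lambda row: f"{row['name']}-{row['runs']}", axis=1)
print(df['label'])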
🚀 Day 18 of My AI & Machine Learning Journey

Today I explored advanced concepts in Pandas Series like indexing, filtering, editing, and real data operations.

🔹 1. Indexing in Series
• Integer Indexing → Access value using index
• Slicing → Get multiple values at once
• Fancy Indexing → Use list or condition to select data
💡 Example: Selecting specific rows or a range of data

🔹 2. Editing Series
• Update values using index
• Add new values using new index
• Modify multiple values using slicing
👉 Series is mutable (we can change data easily)

🔹 3. Python Functionality on Series
We can directly use Python functions like:
• len()
• max() / min()
• sorted()
Also supports:
• Looping
• Type conversion (list, dict)
• Membership checking

🔹 4. Boolean Indexing (Very Important)
Used for filtering data based on conditions
Examples:
• Scores ≥ 50
• Values == 0
• Data > threshold
👉 Helps in real-world data filtering

🔹 5. Plotting Data
• Line Plot → trends
• Bar Chart → comparisons
• Pie Chart → percentage distribution
👉 Helps in visual understanding of data

🔹 6. Important Series Methods
• astype() → change data type
• between() → filter range
• clip() → limit values
• drop_duplicates() → remove duplicates
• isnull() / dropna() / fillna() → handle missing values
• isin() → check values
• apply() → apply custom function
• copy() → create safe copy

💡 Biggest Takeaway: Pandas Series is not just for storing data — it allows powerful data manipulation, filtering, and analysis.

Learning more practical concepts every day 🚀

#MachineLearning #Python #Pandas #DataScience #LearningJourney #TechGrowth
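Here is a compact sketch of the editing, boolean-indexing, and method ideas above, using an invented scores Series:

import pandas as pd

scores = pd.Series([35, 72, 0, 88, 50, 95],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

# Editing: Series is mutable
scores['a'] = 40          # update an existing value by label
scores['g'] = 61          # add a new value with a new label

# Plain Python functions work directly
print(len(scores), max(scores), min(scores))
print('d' in scores)      # membership check is against the index

# Boolean indexing: filter by condition
print(scores[scores >= 50])            # passing scores
print(scores[scores == 0])             # zero values
print(scores[scores.between(40, 90)])  # range filter (inclusive)

# Other handy methods
print(scores.clip(0, 90))              # cap values at 90
print(scores.isin([40, 95]))           # per-element membership check
print(scores.apply(lambda x: 'pass' if x >= 50 else 'fail'))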
🚀 Day 20 of My AI & Machine Learning Journey

Today I learned how to select, fetch, and filter data from a Pandas DataFrame — one of the most important skills in data analysis.

🔹 1. Selecting Data using iloc & loc
• iloc → works with index positions
• loc → works with index labels
Example:
movies.iloc[1] → fetch 2nd row
movies.iloc[0:5] → first 5 rows
movies.iloc[[0,5,6]] → multiple rows
stud.loc['kunal'] → fetch by label
stud.loc[['kunal','lakshay']] → multiple rows

🔹 2. Selecting Rows & Columns Together
Using iloc:
movies.iloc[0:3, 0:3]
Using loc:
movies.loc[0:2, 'title_x':'poster_path']

🔹 3. Filtering Data (Very Important 🔥)
Using conditions:
ipl[ipl['MatchNumber'] == 'Final']
Multiple conditions:
ipl[(ipl['City'] == 'Kolkata') & (ipl['WinningTeam'] == 'Chennai Super Kings')]

🔹 4. Real-World Examples
• Number of Super Over matches
ipl[ipl['SuperOver'] == 'Y'].shape[0]
• Toss winner = Match winner %
(ipl[ipl['TossWinner'] == ipl['WinningTeam']].shape[0] / ipl.shape[0]) * 100
• Movies with rating > 8
movies[movies['imdb_rating'] > 8]

🔹 5. Adding New Columns
movies['Country'] = 'India'
Creating from an existing column:
movies['lead actor'] = movies['actors'].str.split('|').apply(lambda x: x[0])

💡 Biggest Takeaway: Data analysis is all about selecting the right data and filtering it correctly.

Learning real-world data handling step by step 🚀

#MachineLearning #Python #Pandas #DataScience #DataAnalysis #LearningJourney
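A self-contained sketch of iloc/loc selection, condition-based filtering, and column creation; the movies table below is a tiny invented stand-in for the post's dataset:

import pandas as pd

movies = pd.DataFrame({
    'title':       ['3 Idiots', 'Dangal', 'PK', 'Queen'],
    'imdb_rating': [8.4, 8.3, 8.1, 8.2],
    'actors':      ['Aamir Khan|R. Madhavan', 'Aamir Khan|Fatima Sana Shaikh',
                    'Aamir Khan|Anushka Sharma', 'Kangana Ranaut|Rajkummar Rao'],
})

# iloc -> positions, loc -> labels
print(movies.iloc[1])            # 2nd row by position
print(movies.iloc[0:2, 0:2])     # position slices (end exclusive)
print(movies.loc[0:2, 'title':'imdb_rating'])  # label slices (end inclusive)

# Filtering with a condition
print(movies[movies['imdb_rating'] > 8.2])

# Multiple conditions: & / |, each condition in parentheses
print(movies[(movies['imdb_rating'] > 8.1) &
             (movies['actors'].str.contains('Aamir'))])

# Adding columns
movies['country'] = 'India'  # constant column
movies['lead_actor'] = movies['actors'].str.split('|').apply(lambda x: x[0])
print(movies[['title', 'lead_actor']])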
🚀 Neha Explains AI — Day 6: NumPy (AI’s Math Power)

Python lists ❌ slow
NumPy arrays ✅ fast
👉 Why? Because NumPy operates on entire arrays at once (no loops)

🧠 Core Idea
import numpy as np
arr = np.array([1, 2, 3, 4])
👉 This is not just a list
👉 It’s an AI-ready data structure

⚡ Magic (Vectorization)
arr * 2
👉 Output: [2, 4, 6, 8]
No loop. Applied to ALL values instantly.
💡 This is how AI handles millions of data points

💻 Real Example
prices = np.array([25000, 35000, 42000])
sizes = np.array([800, 1200, 1500])
prices / sizes
👉 Finds the price per sqft for all houses

🎯 What you learned
✅ NumPy = fast data processing
✅ Vectorization = no loops
✅ Mean = average (used in ML)
✅ Arrays = real ML input

🌍 Real AI usage
Netflix → recommendations
Stock market → analysis
ML models → matrix math

🔥 YOUR TASK
Try this:
np.mean(prices)
👉 What’s the average price?

📌 Tomorrow: Pandas (Excel for AI)
💬 Did NumPy feel easy or confusing?

#NehaExplainsAI #NumPy #LearnAI
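Putting the post's snippets together into one runnable script, with a plain loop added for contrast (same numbers as above):

import numpy as np

prices = np.array([25000, 35000, 42000])
sizes = np.array([800, 1200, 1500])

# Vectorization: one expression operates on every element at once
print(prices * 2)        # [ 50000  70000  84000]
print(prices / sizes)    # price per sqft for all houses in one step

# The task from the post
print(np.mean(prices))   # 34000.0: the average price

# Equivalent loop version, for contrast (what NumPy saves you from)
per_sqft = []
for p, s in zip(prices, sizes):
    per_sqft.append(p / s)
print(per_sqft)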
Day 22: Feature Extraction & Custom Transformations in Pandas 🐍🤖

In Generative AI, raw text isn't enough. To give an Agent pinpoint accuracy, you need rich, structured metadata. Today, I continued my Pandas deep dive by focusing on advanced data reshaping and programmatic feature extraction.

Here are the core engineering takeaways:

🛠️ Feature Extraction: I wrote custom parsing logic to extract specific data points (like dates or counts) from messy string columns and save them as brand-new features. In a RAG pipeline, this extracted data becomes the metadata that allows an Agent to filter a Vector DB accurately before running a semantic search.

⚡ The Power of .apply(): Replaced slow Python loops by using .apply() to execute custom functions and lambda expressions across entire dataset columns instantly. This is the exact method used to programmatically chunk text or generate embeddings for thousands of rows at once.

🔀 Pivot Tables & Cross Tabs: Learned how to dynamically reshape and summarize data matrices using pd.pivot_table() and pd.crosstab(). Structuring data properly ensures that any context passed to an LLM is dense and highly relevant.

📊 Data Profiling: Used .info() and .describe() to instantly understand the statistical distribution and health of a dataset before ever feeding it into a pipeline.

Structuring messy, real-world data into clean, machine-readable formats is the true bottleneck in modern AI, and Pandas makes it incredibly efficient. 📈

#Python #GenAI #AgenticAI #MachineLearning #Pandas #DataEngineering #100DaysOfCode
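One way these pieces can fit together: a sketch on an invented docs table, where the regex patterns and column names are assumptions chosen for illustration:

import pandas as pd

docs = pd.DataFrame({
    'source': ['blog', 'paper', 'blog', 'paper'],
    'topic':  ['rag', 'rag', 'agents', 'agents'],
    'raw':    ['posted 2024-01-15, 1200 words', 'posted 2023-11-02, 4500 words',
               'posted 2024-03-08, 900 words',  'posted 2024-02-20, 5200 words'],
})

# Feature extraction: pull structured fields out of a messy string column
docs['date'] = pd.to_datetime(docs['raw'].str.extract(r'(\d{4}-\d{2}-\d{2})')[0])
docs['words'] = docs['raw'].str.extract(r'(\d+) words')[0].astype(int)

# .apply() with a lambda instead of an explicit Python loop
docs['size_bucket'] = docs['words'].apply(lambda w: 'long' if w > 2000 else 'short')

# Reshape/summarize: average word count per source x topic
print(pd.pivot_table(docs, values='words', index='source',
                     columns='topic', aggfunc='mean'))

# Frequency matrix of source vs size bucket
print(pd.crosstab(docs['source'], docs['size_bucket']))

# Quick profiling before feeding anything downstream
docs.info()
print(docs.describe())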
🚀 Day 25 of My AI & Machine Learning Journey

Today I learned about MultiIndex (Hierarchical Indexing) in Pandas — a powerful way to handle higher-dimensional data.

🔹 What is MultiIndex?
Normally:
• Series → 1D (1 index needed)
• DataFrame → 2D (row + column needed)
👉 But with MultiIndex, we can use multiple levels of indexing

🔹 MultiIndex in Series
We can create multiple index levels
Example:
index = pd.MultiIndex.from_product(
    [['cse','ece'], [2019,2020,2021,2022]]
)
s = pd.Series([1,2,3,4,5,6,7,8], index=index)
👉 Access data:
s[('cse', 2022)]
s['ece']

🔹 stack() & unstack()
👉 Convert between formats
• unstack() → MultiIndex → DataFrame
• stack() → DataFrame → MultiIndex

🔹 Why MultiIndex?
👉 Used to represent high-dimensional data in lower dimensions
Example:
5D → 2D
10D → 2D

🔹 MultiIndex in DataFrames
👉 MultiIndex in rows:
df.loc['cse']
👉 MultiIndex in columns:
df['delhi']
df['mumbai']['avg_package']

🔹 MultiIndex in Both Rows & Columns
👉 Creates a higher-dimensional structure
branch_df3
💡 To access a value → need multiple keys (row + column levels)

💡 Biggest Takeaway: MultiIndex helps manage complex, multi-dimensional data in a structured and readable way.

#MachineLearning #Python #Pandas #DataScience #DataAnalysis #LearningJourney #AdvancedPython 🚀
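A runnable version of the post's MultiIndex example, extended with named levels and a small DataFrame (the avg_package column is invented to illustrate multi-level rows):

import pandas as pd

# Build a two-level index: branch x year
index = pd.MultiIndex.from_product(
    [['cse', 'ece'], [2019, 2020, 2021, 2022]],
    names=['branch', 'year'],
)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index)

print(s[('cse', 2022)])   # 4: full key returns one value
print(s['ece'])           # partial key returns all ece years

# unstack: MultiIndex Series -> DataFrame (years become columns)
df = s.unstack()
print(df)

# stack: back to the MultiIndex Series
print(df.stack())

# MultiIndex rows in a DataFrame (avg_package is an invented second column)
branch_df = pd.DataFrame({'students': s, 'avg_package': s * 2})
print(branch_df.loc['cse'])   # all cse rows across years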
From raw data to a fully deployed machine learning application

The goal was simple but powerful: predict whether a person’s income is greater than 50K or less/equal to 50K, based on real demographic and professional attributes. But the real value was in building the full journey — not just training a model.

What I worked on:
• Data Cleaning & Preprocessing
• Handling categorical variables using Label Encoding
• Feature Scaling with StandardScaler
• Training and comparing two models: SVM and KNN
• Model Evaluation using Accuracy Score
• Saving the final model with Pickle
• Deploying the full project using Streamlit for real-time predictions

Why SVM and KNN? I experimented with both models because each has its own strength.
• KNN is simple, intuitive, and works well by classifying data based on similarity between neighbors. It’s great for understanding data patterns quickly.
• SVM is powerful for classification problems, especially when the data has clear class separation. It performs well on high-dimensional datasets and usually generalizes better.

After comparing both models, I chose SVM as the final deployed model because it achieved better performance, stronger stability, and better overall prediction accuracy on this dataset.

This project gave me hands-on experience in transforming data into decisions and turning machine learning into something people can actually use. Building models is important… deploying them is where the real story begins.

Special thanks to my instructor, Youssef Elbadry, and my mentor, Mazen Alattar, for their guidance, support, and valuable feedback throughout this journey.

You can also check the full notebook on Kaggle here: https://lnkd.in/dWVJxtQq

#MachineLearning #DataScience #ArtificialIntelligence #Python #DeepLearning #DataAnalytics #DataScienceProjects #MachineLearningEngineer #AI #Streamlit #ScikitLearn #SVM #KNN #DataDriven #Analytics #MLProjects
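The post doesn't include code, so here is a hedged sketch of that pipeline shape (LabelEncoder, StandardScaler, an SVM-vs-KNN comparison, pickling) on a tiny invented dataset; it is not the real census data and not necessarily the author's exact steps:

import pickle
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Tiny synthetic stand-in for the census-style data
df = pd.DataFrame({
    'age':       [25, 38, 52, 29, 45, 33, 60, 41],
    'education': ['HS', 'BS', 'MS', 'HS', 'BS', 'MS', 'BS', 'HS'],
    'income':    ['<=50K', '<=50K', '>50K', '<=50K', '>50K', '>50K', '>50K', '<=50K'],
})

# Encode categoricals, then scale features
df['education'] = LabelEncoder().fit_transform(df['education'])
X = StandardScaler().fit_transform(df[['age', 'education']])
y = df['income']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train and compare the two models by accuracy
for name, model in [('SVM', SVC()), ('KNN', KNeighborsClassifier(n_neighbors=3))]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Persist the chosen model for a Streamlit app to load
final_model = SVC().fit(X_train, y_train)
with open('income_model.pkl', 'wb') as f:
    pickle.dump(final_model, f)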
🚀 Day 22 of My AI & Machine Learning Journey

Today I learned about one of the most powerful concepts in Pandas — GroupBy.

💡 GroupBy is used to group data based on categories and then apply operations like sum, mean, count, etc.

🔹 What is GroupBy?
It groups data based on a categorical column
Example:
movies.groupby('Genre')
👉 Creates groups like Action, Drama, Comedy

🔹 Basic Aggregations
movies.groupby('Genre')['Gross'].sum()
movies.groupby('Genre')['IMDB_Rating'].mean()
movies.groupby('Genre')['No_of_Votes'].sum()

🔹 Real-World Examples
• Top 3 genres by total earnings
movies.groupby('Genre')['Gross'].sum().sort_values(ascending=False).head(3)
• Genre with highest average rating
movies.groupby('Genre')['IMDB_Rating'].mean().sort_values(ascending=False).head(1)
• Director with most popularity
movies.groupby('Director')['No_of_Votes'].sum().sort_values(ascending=False).head(1)

🔹 Important GroupBy Methods
• size() → number of rows in each group
• first() → first item of each group
• last() → last item
• nth(n) → specific row
• get_group() → fetch a specific group
• describe() → statistical summary
• sample() → random data from each group
• nunique() → unique values count

🔹 Aggregation using agg() (Very Important 🔥)
Apply different functions on different columns
Example:
movies.groupby('Genre').agg({
    'Runtime':'mean',
    'IMDB_Rating':'mean',
    'No_of_Votes':'sum',
    'Gross':'sum'
})

💡 Biggest Takeaway: GroupBy helps in analyzing data category-wise, which is very useful in real-world problems.

Going deeper into data analysis 🚀

#MachineLearning #Python #Pandas #DataScience #GroupBy #DataAnalysis #LearningJourney
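A self-contained version of the GroupBy examples on a tiny invented movies table (ratings, votes, and gross figures are placeholders):

import pandas as pd

movies = pd.DataFrame({
    'Genre':       ['Action', 'Drama', 'Action', 'Comedy', 'Drama'],
    'IMDB_Rating': [7.9, 8.4, 7.2, 6.8, 8.9],
    'No_of_Votes': [120000, 95000, 80000, 40000, 150000],
    'Gross':       [350, 120, 280, 90, 200],
})

g = movies.groupby('Genre')

print(g['Gross'].sum())            # total earnings per genre
print(g['IMDB_Rating'].mean())     # average rating per genre
print(g.size())                    # number of rows in each group
print(g.get_group('Drama'))        # fetch one group as a DataFrame

# Top genre by total earnings
print(g['Gross'].sum().sort_values(ascending=False).head(1))

# Different aggregations per column with agg()
print(g.agg({
    'IMDB_Rating': 'mean',
    'No_of_Votes': 'sum',
    'Gross':       'sum',
}))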
🧹 Why Data Preprocessing is the Most Important Step in Machine Learning

As a beginner in ML, I used to think the model was everything. Then I learned about preprocessing — and it changed how I see the whole pipeline. Here's what I've understood so far:

📊 What is Data Preprocessing?
It's the process of cleaning and transforming raw data before feeding it to a model. Real-world data is messy — missing values, duplicates, wrong formats. Preprocessing fixes all of that.

⚠️ Why does it matter so much?
A model is only as good as the data it learns from. Even the most powerful algorithm will give poor results if the data is dirty. Garbage in → garbage out.

🔧 Key preprocessing steps:
• Handling missing values — fill them with the mean/median or remove them entirely
• Removing duplicates — duplicate rows can bias your model's learning
• Encoding categorical data — convert text labels (like "Yes"/"No") into numbers
• Feature scaling — normalize or standardize values so no single feature dominates
• Splitting the data — divide into training and testing sets before building the model

📈 Real impact: Studies show that data scientists spend nearly 80% of their time on data preparation. It's not glamorous — but it's what separates a good model from a great one.

I'm still learning, but one thing is clear: preprocessing is not optional — it's essential. 💡

#LearningJourney #Python #MachineLearning #ArtificialIntelligence #DataScience #InnomaticsResearchLabs
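A minimal sketch of these five steps in order, on an invented toy table (column names and values are placeholders):

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'age':      [25, None, 35, 35, 52],
    'salary':   [30000, 45000, 50000, 50000, 80000],
    'approved': ['Yes', 'No', 'Yes', 'Yes', 'No'],
})

# 1. Handle missing values: fill numeric gaps with the column mean
df['age'] = df['age'].fillna(df['age'].mean())

# 2. Remove duplicate rows
df = df.drop_duplicates()

# 3. Encode categorical data: Yes/No -> 1/0
df['approved'] = df['approved'].map({'Yes': 1, 'No': 0})

# 4. Scale features so no single column dominates
X = StandardScaler().fit_transform(df[['age', 'salary']])
y = df['approved']

# 5. Split into training and testing sets before building the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)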