"Learn NumPy and Pandas for Data Analysis"

6mo

Week 4, Day 01: NumPy Basics

🧠 What is NumPy?
NumPy (Numerical Python) is a Python library for numerical and scientific computing. It provides a fast array object (ndarray) that supports vectorized operations, so most element-wise work needs no explicit Python loop.

📦 Installation (if needed)
    pip install numpy

🔹 Creating Arrays
    import numpy as np

    # 1D array
    arr = np.array([1, 2, 3, 4, 5, 6])
    print(arr)

🔹 Indexing and Slicing
    print(arr[0])    # First element
    print(arr[-1])   # Last element
    print(arr[0:3])  # Slice: first three elements

🔹 Shape and Reshape
    print(arr.shape)            # (6,)
    print(arr.reshape((2, 3)))  # Reshape into a 2x3 matrix

🔹 Broadcasting
NumPy automatically stretches compatible shapes so that arrays of different shapes can be combined:
    print(arr + 1)  # Adds 1 to every element

🔹 Matrix Operations
    m1 = np.array([[1, 2], [3, 4]])
    m2 = np.array([[5, 6], [7, 8]])
    print(np.dot(m1, m2))  # Matrix multiplication (same as m1 @ m2)

🔹 Statistics
    print("Mean:", np.mean(arr))
    print("Standard Deviation:", np.std(arr))

🔹 Random Numbers
    print(np.random.rand(5))  # 5 random numbers in [0, 1)

🔹 Handling Missing Values
    arr = np.array([1, 2, np.nan, 4])
    print(np.isnan(arr))  # True where the value is NaN

List vs NumPy Performance

🔹 Why NumPy is Faster
NumPy runs vectorized operations in compiled C code, which is much faster than looping over Python objects.
    import time
    import numpy as np

    n = 10_000_000  # large enough to show the gap without exhausting memory

    # Python list
    ls1 = list(range(n))
    start = time.perf_counter()
    sum(ls1)
    print("List time:", time.perf_counter() - start)

    # NumPy array
    arr = np.arange(n)
    start = time.perf_counter()
    np.sum(arr)
    print("NumPy time:", time.perf_counter() - start)

🧩 For numeric operations like this, NumPy is often 10x to 50x faster than lists; the exact ratio depends on the operation, the array size, and the hardware.

Day 02: Pandas Basics

🧠 What is Pandas?
Pandas is a Python library for data analysis and manipulation, built on top of NumPy. It provides two main structures:
Series → 1D labeled array
DataFrame → 2D table (rows + columns)

📦 Installation
    pip install pandas

🔹 Creating a DataFrame
    import pandas as pd

    data = {
        'people': ['p1', 'p2', 'p3'],
        'age': [20, 30, 40],
        'gender': ['M', 'F', 'M'],
        'salary': [1000, 2000, 1500]
    }
    df = pd.DataFrame(data)
    print(df)

🔹 Reading and Writing Files
    # Read CSV / Excel
    titan_df = pd.read_csv("/Workspace/Users/.../Titanic-Dataset.csv")
    titan_df = pd.read_excel("/Workspace/Users/.../Titanic-Dataset.xlsx")

    # Write files (Excel output needs an engine such as openpyxl installed)
    df.to_csv("sample.csv", index=False)
    df.to_excel("sample.xlsx", index=False)

🔹 Accessing Columns and Rows
    print(df["people"])                  # Single column
    print(df["age"].sum())               # Sum of a column
    print(df[df["age"] > 30]["people"])  # Filter rows, then select a column

#Python #DataAnalysis #DataEngineer #AzureDataEngineer #DataAnalytics #DataScience
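The filter-then-select step that closes the post can also be written with `.loc`, which takes the row condition and the column label in one call. A minimal sketch reusing the post's sample data; the variable names `older`, `mask`, and `high_paid_men` are illustrative and not from the original.

```python
import pandas as pd

# Same sample data as in the post
df = pd.DataFrame({
    "people": ["p1", "p2", "p3"],
    "age": [20, 30, 40],
    "gender": ["M", "F", "M"],
    "salary": [1000, 2000, 1500],
})

# Row condition and column label in a single .loc call
older = df.loc[df["age"] > 30, "people"]
print(older.tolist())

# Combine conditions with & (each condition needs its own parentheses)
mask = (df["gender"] == "M") & (df["salary"] >= 1500)
high_paid_men = df.loc[mask, ["people", "salary"]]
print(high_paid_men)
```

Using a single `.loc` call avoids chained indexing, which matters mostly when the selection is later assigned to rather than just read.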


More Relevant Posts

Karishma Bhardwaj
6mo
✅ Python for Data Science – Part 1: NumPy Interview Q&A 📊

🔹 1. What is NumPy and why is it important?
NumPy (Numerical Python) is a powerful Python library for numerical computing. It supports fast array operations, broadcasting, linear algebra, and random number generation. It's the backbone of many data science libraries like Pandas and Scikit-learn.

🔹 2. Difference between Python list and NumPy array
Python lists can store mixed data types and are slower for numerical operations. NumPy arrays are faster, use less memory, and support vectorized operations, making them ideal for numerical tasks.

🔹 3. How to create a NumPy array
    import numpy as np
    arr = np.array([1, 2, 3])

🔹 4. What is broadcasting in NumPy?
Broadcasting lets you perform operations on arrays of different shapes. For example, adding a scalar to an array applies the operation to each element.

🔹 5. How to generate random numbers
Use np.random.rand() for a uniform distribution over [0, 1), np.random.randn() for a standard normal distribution, and np.random.randint() for random integers.

🔹 6. How to reshape an array
Use .reshape() to change the shape of an array without changing its data. Example: arr.reshape(2, 3) turns a 1D array of 6 elements into a 2x3 matrix.

🔹 7. Basic statistical operations
Use functions like mean(), std(), var(), sum(), min(), and max() to get quick stats from your data.

🔹 8. Difference between zeros(), ones(), and empty()
np.zeros() creates an array filled with 0s, np.ones() with 1s, and np.empty() creates an array without initializing its values (faster, but the contents are arbitrary until you fill them).

🔹 9. Handling missing values
Use np.nan to represent missing values and np.isnan() to detect them. Example:
    arr = np.array([1, 2, np.nan])
    np.isnan(arr)  # Output: [False False  True]

🔹 10. Element-wise operations
NumPy supports element-wise addition, subtraction, multiplication, and division. Example:
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    a + b  # Output: [5 7 9]

💡 Pro Tip: NumPy is all about speed and efficiency. Mastering it gives you a huge edge in data manipulation and model building.

#follow Karishma Bhardwaj for more.... #python #programming #interviewquestions #questionsanswers #numpy #softwareengineers #learners #programmers #ai #ml
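Question 4 describes broadcasting with a scalar; the same rule also applies between arrays of different shapes. A small sketch, assuming a made-up 2x3 `scores` array (the names and values are illustrative only):

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])      # shape (2, 3)

# Shape (3,) is stretched across both rows of the (2, 3) array
col_means = scores.mean(axis=0)           # one mean per column
centered = scores - col_means

# Shape (2, 1) is stretched across all three columns
row_offsets = np.array([[10.0], [20.0]])
shifted = scores + row_offsets

print(centered)
print(shifted)
```

Shapes are compared from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match.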
Swapnil Joijode
5mo
# UNLOCKING THE POWER OF PYTHON IN DATA ANALYSIS WITH NUMPY

Python in Data Analysis hinges on fast, reliable numerical operations, clean data representations, and repeatable workflows. NumPy is the backbone of numeric computing in Python, providing the array data structure and a rich set of operations that let you express complex ideas with simple, vectorized code. This post highlights how NumPy is used in real-world data analysis, essential modules to know, and pragmatic practices to accelerate your analyses. This is part of a continuing series scheduled for Monday, Wednesday, and Friday.

OVERVIEW
NumPy arrays store homogeneous data more efficiently than Python lists. Vectorized operations translate high-level Python code into fast, low-level computations, often approaching C performance. This matters when you work with large datasets, statistics, or simulations. Key ideas include broadcasting, memory layout, and avoiding Python-level loops by using vectorized operations.

NUMPY MODULES AND CAPABILITIES
Core functionality lives in numpy and its submodules. Highlights:
- numpy.linalg for linear algebra (eigenvalues, SVD, solving systems)
- numpy.random for distributions, seeds, and sampling
- numpy.fft for fast Fourier transforms
- numpy.polynomial for polynomial tools
- numpy.ma for masked arrays to handle missing data
Practical data workflows often involve converting data from pandas or Python lists into NumPy arrays, performing computations, then converting results back.

PRACTICAL TIPS FOR DATA ANALYSIS
- Pre-allocate when possible: numpy.empty or numpy.zeros; fill in place.
- Use vectorized operations instead of Python loops: a * b, a + b, a @ b.
- Be mindful of copying: numpy.asarray returns its input unchanged when it is already a suitable array, which avoids unnecessary copies.
- Leverage broadcasting, shaping data so an operation applies along the intended axis.
- Choose the right function: mean, median, std, var, min, max; pair NumPy with SciPy for robust stats.
- In-place updates can save memory: a += b.
- Keep numerics stable: handle near-zero divisions with masking or nan-safe operations.

REAL-WORLD USE
Imagine a sensor dataset. Normalize values, compute rolling means, and project with numpy.linalg.svd. You can generate synthetic data with numpy.random to test pipelines or vectorize feature engineering across thousands of records.

CALL TO ACTION
If you found these tips helpful, comment, connect with me, and explore the world of Python and its offerings together. This series runs on Monday, Wednesday, and Friday to help you level up your data analysis with practical NumPy-focused insights.
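The sensor scenario above can be sketched end to end. This is one possible reading under stated assumptions, not the author's pipeline: the data is synthetic, and the array sizes, the window length of 5, the seed, and every variable name are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible

# Hypothetical data: 200 samples from 3 sensors
readings = rng.normal(loc=20.0, scale=5.0, size=(200, 3))

# Normalize each sensor column to zero mean and unit variance (broadcasting)
normalized = (readings - readings.mean(axis=0)) / readings.std(axis=0)

# Rolling mean of the first sensor over a 5-sample window
window = 5
kernel = np.ones(window) / window
rolling_mean = np.convolve(normalized[:, 0], kernel, mode="valid")

# Project onto the two leading right-singular vectors
U, S, Vt = np.linalg.svd(normalized, full_matrices=False)
projected = normalized @ Vt[:2].T

print(rolling_mean.shape, projected.shape)
```

`mode="valid"` keeps only positions where the window fits entirely inside the data, so the rolling mean is `window - 1` samples shorter than the input.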
Sidra NL
5mo Edited
Python Data Visualization Using Matplotlib & Seaborn With NumPy 📊🧮

While working with random numbers in NumPy today, I ran into a bit of data visualization with Matplotlib and Seaborn!

📊 Matplotlib: the plotting library that Seaborn builds on; it does the drawing behind Seaborn's displots
📊 Seaborn: uses Matplotlib under the hood to create histograms and other plots for data visualization

‼️ In short, Matplotlib and Seaborn let us visualize data generated with NumPy and see how that data behaves

-------------------------
☺️ Here are Python (Beginner to Intermediate) GitHub Repos for you:
📁 Python Variables: https://lnkd.in/e9rjz-_D
📁 Python Operators: https://lnkd.in/e6hzgHSn
📁 Python Conditionals: https://lnkd.in/egQNGZBF
📁 Python Loops: https://lnkd.in/eezUg_-y
📁 Python Functions: https://lnkd.in/eKdU6nex
📁 Python Lists & Tuples: https://lnkd.in/eZ8KiQNs
📁 Python Dictionaries & Sets: https://lnkd.in/eDmgj7pc
📁 Python OOP: https://lnkd.in/eJFupCiK
📁 Python DSAs: https://lnkd.in/ebR3rjkt
-------------------------
🤓 NumPy (Beginner To Intermediate):
🧮 Arrays: https://lnkd.in/ebghYRYE
-------------------------
⚡ Follow my learning journey:
📎 GitHub: https://lnkd.in/ehu8wX85
🔗 GitLab: https://lnkd.in/eiiQP2gw
💬 Feedback: I'd love your thoughts and tips!
🤝 Collab: If you're also exploring Python, DM me! Let's grow together!
--------------------------
📞 Book A Call With Me: https://lnkd.in/e23BtnR9
--------------------------
#matplotlib #seaborn #numpy #randomnumbers #pythonforbeginners #pythonfordatascience
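Before any plotting, the bin counts that a histogram draws can be inspected with NumPy alone. A minimal sketch, assuming synthetic normally distributed samples and a fixed seed (neither comes from the post); passing the same `samples` array to a Seaborn or Matplotlib histogram call would draw these counts as bars.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility
samples = rng.normal(loc=0.0, scale=1.0, size=1_000)

# np.histogram returns the count per bin and the bin edges (one more edge than bins)
counts, bin_edges = np.histogram(samples, bins=10)

# Print each bin range with its count; the last bin also includes its right edge
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```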

Naveen Yadav
6mo
#Day53 of #100DaysOfPython: Simple Statistics in Python - Building Strong Data Foundations

One of the most underrated skills in data analytics is understanding statistics through Python. Before diving into machine learning or predictive modeling, it's crucial to truly understand how data behaves - and Python makes that incredibly accessible.

Let's explore simple yet powerful statistical operations you can perform in just a few lines 👇

    import numpy as np
    import statistics as stats

    data = [12, 18, 25, 30, 22, 15, 20]

    # Using the built-in statistics module
    print(f"Mean: {stats.mean(data)}")
    print(f"Median: {stats.median(data)}")
    print(f"Mode: {stats.mode(data)}")

    # Using NumPy for numerical efficiency (population variance and std by default)
    print(f"Variance: {np.var(data):.2f}")
    print(f"Standard Deviation: {np.std(data):.2f}")

What's Happening Here:
➡️ Mean: The average value - helpful for getting a sense of central tendency.
➡️ Median: The middle value - robust against outliers.
➡️ Mode: The most frequent value - often used in categorical analysis. Every value in this sample occurs once, so stats.mode returns the first one (12) on Python 3.8+.
➡️ Variance & Standard Deviation: Show how much the data deviates from the mean - essential for understanding data spread and consistency.

Real-Life Applications:
🛒 E-commerce: Average order value and variation in customer spend.
🏦 Finance: Volatility of returns using standard deviation.
🧪 Research: Summarizing experimental outcomes.
📈 Business Intelligence: Identifying stable vs. fluctuating KPIs.

💡 Tip: Built-in packages like statistics are great for learning and small datasets, but NumPy and Pandas scale better for real-world scenarios - especially when processing millions of rows.

If you're aiming to grow as a Data Analyst or Data Engineer, this is one of the first fundamental blocks you should master. The ability to calculate and interpret these metrics distinguishes a code writer from a data storyteller.
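One detail worth checking when mixing the two libraries: `np.var` and `np.std` divide by n (population statistics) unless told otherwise, while `statistics.variance` and `statistics.stdev` divide by n - 1 (sample statistics). A short sketch on the post's data; the variable names are illustrative.

```python
import statistics as stats

import numpy as np

data = [12, 18, 25, 30, 22, 15, 20]

pop_var = np.var(data)             # ddof=0: divides by n
sample_var = np.var(data, ddof=1)  # ddof=1: divides by n - 1

# The statistics module exposes both conventions under separate names
print(f"Population variance: {pop_var:.2f} vs pvariance: {stats.pvariance(data):.2f}")
print(f"Sample variance:     {sample_var:.2f} vs variance:  {stats.variance(data):.2f}")
```

Passing `ddof=1` to NumPy makes its result match the sample statistics that `statistics.variance` reports.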
#Python #100DaysOfPython #100DaysOfCode #PythonProgramming #PythonTips #DataScience #MachineLearning #ArtificialIntelligence #DataEngineering #Analytics #PythonForData #AI #CommunityLearning #Coding #LearnPython #Programming #SoftwareEngineering #CodingJourney #Developers #CodingCommunity
Md Arif Raza
5mo
📘 Python – Pandas Deep Dive Day 1: Series, Indexing, and Data Exploration 🔍

After completing my NumPy journey ✅, I've started my deep dive into Pandas, one of the most powerful Python libraries for data manipulation and analysis. Today's focus was on the Pandas Series, which forms the core of handling 1-dimensional labeled data.

🧩 1. What is Pandas?
An open-source Python library built on NumPy, designed for fast, flexible, and expressive data analysis. It's the backbone of most data science workflows.

🧩 2. Pandas Series
A one-dimensional labeled array capable of holding any data type: numbers, strings, booleans, etc. Acts like an enhanced NumPy array with labels.

🧩 3. Series Attributes
Understand essential properties like .index, .values, .dtype, and .shape to inspect data quickly.

🧩 4. Series Using read_csv()
Create a Series directly from CSV files for real-world datasets, perfect for quick data exploration.

🧩 5. Series Methods & Math Operations
Built-in methods simplify common tasks such as .sum(), .mean(), .sort_values(), and arithmetic operations.

🧩 6. Series Indexing, Slicing & Editing
Access, modify, and slice data efficiently using index labels or positions. Enables clean, Pythonic data manipulation.

🧩 7. Boolean Indexing & Python Functionalities
Filter data conditionally and integrate Python functions for advanced transformations.

🧩 8. Plotting Graphs on Series
Visualize patterns directly with .plot() for quick insights without switching to other visualization tools.

✅ Key Learnings
✔ Pandas simplifies complex data manipulation tasks
✔ Series are powerful for 1D data representation and quick analytics
✔ Integration with NumPy, Matplotlib, and Python functions makes it versatile
✔ Ideal for data cleaning, analysis, and visualization

📌 GitHub Repository: 👉 https://lnkd.in/dtMFnetp

#Python #Pandas #DataScience #MachineLearning #DataAnalysis #AI #CodingJourney #MdArifRaza #Analytics #100DaysOfCode #CampusX #NumPyToPandas #PythonForDataScience
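Several of the Series topics listed above (attributes, label and position indexing, boolean filtering, editing) fit in one short sketch. The `marks` Series, its subject labels, and its values are invented for illustration and do not come from the post or its repository.

```python
import pandas as pd

# Hypothetical data: marks per subject
marks = pd.Series(
    [72, 88, 95, 60],
    index=["maths", "physics", "chemistry", "biology"],
    name="marks",
)

# Attributes for a quick inspection
print(marks.index.tolist(), marks.dtype, marks.shape)

# Label-based and position-based access
print(marks["physics"], marks.iloc[0])

# Boolean indexing, then sorting the filtered result
top = marks[marks > 80].sort_values(ascending=False)
print(top)

# Editing a value by label
marks["biology"] = 65
print(marks.sum())
```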
OmKumar N.
5mo Edited
Pandas, Seaborn, and NumPy are essential Python libraries for data manipulation, analysis, and visualization. NumPy is primarily for numerical operations, Pandas handles structured data, and Seaborn creates attractive statistical visualizations. (Sources: almabetter.com, GeeksforGeeks)

Overview of Key Python Libraries:

NumPy
Purpose: NumPy is essential for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices.
Key Features:
- High-performance array objects (ndarray).
- Mathematical functions for array operations.
- Broadcasting for operations on arrays of different shapes.
Applications: Numerical computations, data preprocessing, and linear algebra tasks.

Pandas
Purpose: Pandas is designed for data manipulation and analysis, built on top of NumPy.
Key Features:
- DataFrame and Series data structures for handling structured data.
- Tools for data cleaning, merging, reshaping, and time series analysis.
Applications: Data wrangling, cleaning, and preparation tasks.

Seaborn
Purpose: Seaborn is a statistical data visualization library based on Matplotlib.
Key Features:
- High-level interface for creating attractive statistical graphics.
- Built-in themes for improved aesthetics.
- Statistical plots such as box plots and heatmaps.
Applications: Visualizing complex datasets and enhancing data presentation.

Summary of Functions:

Library | Main Functionality | Key Data Structures | Visualization Support
NumPy | Numerical operations and array handling | ndarray | None built in (pair with Matplotlib)
Pandas | Data manipulation and analysis | DataFrame, Series | Limited (basic plots via Matplotlib)
Seaborn | Statistical data visualization | N/A | Extensive (advanced plots)

These libraries are fundamental for anyone working with data in Python, enabling efficient data analysis and visualization.
Lakshmi Prasad B.
5mo
🚀 Day 7 of #100DaysOfDataEngineering
Topic: Python Advanced - NumPy Basics
Tags: #Python #NumPy #DataEngineering #DataScience

Today marks the start of our journey into Python for numerical computing. Meet NumPy (Numerical Python), the core library that powers data transformations, mathematical operations, and many popular tools like Pandas and Scikit-learn.

💡 What is NumPy?
NumPy provides multi-dimensional arrays (ndarray) and efficient functions to work with them. It is built for speed, allowing you to process large datasets much faster than standard Python lists.

Install NumPy with:
    pip install numpy

Import the library:
    import numpy as np

🧱 Creating Arrays
You can create NumPy arrays directly from Python lists or nested lists:
    import numpy as np

    # 1D array
    arr1 = np.array([10, 20, 30])

    # 2D array
    arr2 = np.array([[5, 10, 15], [20, 25, 30]])

    print(arr1.shape)  # (3,)
    print(arr2.shape)  # (2, 3)
    print(arr1.dtype)  # int64 (platform-dependent; may be int32 on some systems)

⚙️ Common Attributes
Attribute | Description
ndarray.shape | Dimensions of the array (rows, columns)
ndarray.dtype | Type of data stored (int, float, etc.)
ndarray.ndim | Number of dimensions
ndarray.reshape() | Method that changes the array shape without changing the data

📊 Built-in Methods
NumPy includes several helpful functions for creating arrays quickly:
    # Evenly spaced values (like range)
    np.arange(0, 10, 2)    # [0 2 4 6 8]

    # Arrays of zeros and ones
    np.zeros((2, 3))       # 2x3 array of zeros
    np.ones((3, 2))        # 3x2 array of ones

    # Equally spaced numbers, endpoints included
    np.linspace(0, 10, 5)  # [0. 2.5 5. 7.5 10.]

🎲 Generating Random Numbers
NumPy makes it simple to generate test data for experiments and simulations:
    # Random values (uniform distribution over [0, 1))
    np.random.rand(3, 4)

    # Random values (standard normal distribution)
    np.random.randn(2, 3)

    # 10 random integers from 4 up to, but not including, 40
    np.random.randint(4, 40, 10)

    # 4x4 matrix of random integers from 0 to 49
    np.random.randint(50, size=(4, 4))

✅ Key Takeaway
NumPy is all about speed and simplicity. It lets you handle large datasets and perform calculations efficiently. These array operations form the foundation of every scalable data pipeline.
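The random-number calls in this post use NumPy's legacy global functions. The same draws can be made with the newer `Generator` interface, which takes an explicit seed so test data is reproducible. A sketch of that swapped-in approach (it is not what the post uses); the seed value 7 and the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed; same seed gives the same draws

# 10 integers from 4 up to, but not including, 40 (like np.random.randint(4, 40, 10))
ints = rng.integers(4, 40, size=10)

# Uniform values in [0, 1) with shape (3, 4) (like np.random.rand(3, 4))
uniform = rng.random((3, 4))

# Standard normal values with shape (2, 3) (like np.random.randn(2, 3))
normal = rng.standard_normal((2, 3))

print(ints)
print(uniform.shape, normal.shape)
```

Passing `endpoint=True` to `rng.integers` makes the upper bound inclusive, if that is what the test data needs.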
Rajesh Singha
6mo
🚀 Top 5 Python Libraries Every Data Analyst Should Know (and Why)

Python is one of the most powerful tools for data analysis, but the real magic lies in its libraries. Here are my top 5 picks that every aspiring data analyst should master 👇

1️⃣ Pandas 🐼
The backbone of data analysis. Use it to clean, transform, and manipulate data easily with DataFrames.
💡 Example: df.groupby('Category').sum() can summarize entire datasets in one line.

2️⃣ NumPy 🔢
The foundation of numerical computing. Great for mathematical operations, arrays, and handling large datasets efficiently.
💡 Example: numpy.mean(data) to calculate averages lightning fast.

3️⃣ Matplotlib 📈
Perfect for creating static, high-quality charts. Bar graphs, scatter plots, histograms: it's your first step into data visualization.
💡 Example: plt.plot(x, y) can help visualize trends instantly.

4️⃣ Seaborn 🎨
Built on top of Matplotlib, but more beautiful and easier to use. Ideal for statistical plots such as correlation heatmaps and distribution charts.
💡 Example: sns.heatmap(df.corr(), annot=True) reveals relationships in data visually.

5️⃣ Scikit-learn 🤖
When you're ready to step into machine learning, this is your go-to library. Includes everything from regression to clustering, simple yet powerful.
💡 Example: Build models with just a few lines: from sklearn.linear_model import LinearRegression

💭 Pro Tip: Don't rush to learn all at once. Start with Pandas and Matplotlib, then gradually move to others as your projects demand.

📌 Question for you: Which Python library do you use the most in your data projects? 👇

#Python #DataAnalytics #DataScience #MachineLearning #Pandas #NumPy #Seaborn #Matplotlib #ScikitLearn #DataVisualization
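The one-line `groupby` summary from the Pandas entry can be tried on a small table. A minimal sketch; the `Category`, `Units`, and `Revenue` columns and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "Category": ["Books", "Toys", "Books", "Toys", "Games"],
    "Units": [3, 5, 2, 1, 4],
    "Revenue": [30.0, 75.0, 20.0, 15.0, 120.0],
})

# Select the numeric columns explicitly, then sum within each category
summary = df.groupby("Category")[["Units", "Revenue"]].sum()
print(summary)
```

Naming the columns to aggregate keeps the result predictable when the table also contains text columns that should not be summed.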
Sidra NL
5mo
🧮 Dissecting NumPy: Working With Intrinsic NumPy Objects For Array Creation

💪 It feels really exciting getting into the core of NumPy and seeing it unlock its true strength in front of me!

🤔 Why Are NumPy Arrays Better Than Python Lists?
- Fast computation on large datasets
- Don't require explicit loops
- Easy arithmetic operations
- Consume less memory

⚙️ Today I dug a bit deeper into NumPy array creation with intrinsic objects like:
- np.ones / np.zeros: give arrays of 1s and 0s
- np.arange(): returns an evenly spaced sequence as an array, unlike Python's range(), which yields plain integers
- np.linspace(): returns a given number of evenly spaced values between a start and a stop value
- np.reshape(): reshapes a given array without changing its data, returning an array with the specified number of rows and columns

But, listen up!
‼️ One critical thing about NumPy arrays is their axes: axis 0 runs down the rows and axis 1 runs across the columns.
‼️ It means we can apply arithmetic operations along either axis, for example per column (axis=0) or per row (axis=1)

💭 It's been a productive week so far getting into the world of NumPy and unlocking a new skill on the way to becoming a data scientist!

🫡 Until we meet again, my fellow coders!

#pythonnumpy #NumPy #pythonlibraries #pythonfordatascience #datascience #machinelearning #artificialintelligence
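The axis idea from this post is easiest to see on a tiny array. A minimal sketch, assuming a 2x3 `grid` built with `np.arange` and `reshape` (the names are illustrative):

```python
import numpy as np

grid = np.arange(1, 7).reshape(2, 3)  # [[1 2 3], [4 5 6]]

# axis=0 collapses the rows, leaving one result per column
col_sums = grid.sum(axis=0)

# axis=1 collapses the columns, leaving one result per row
row_sums = grid.sum(axis=1)
row_means = grid.mean(axis=1)

print(col_sums, row_sums, row_means)
```

A handy way to remember it: the axis you name is the one that disappears from the result's shape.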