"Learn NumPy and Pandas for Data Analysis"

Week 4, Day 01: NumPy Basics

🧠 What is NumPy?
NumPy (Numerical Python) is a Python library for numerical and scientific computing. Its core is a fast array object (ndarray) that supports vectorized operations, so most element-wise work needs no explicit Python loops.

📦 Installation (if needed)
    pip install numpy

🔹 Creating Arrays
    import numpy as np

    # 1D array
    arr = np.array([1, 2, 3, 4, 5, 6])
    print(arr)

🔹 Indexing and Slicing
    print(arr[0])    # First element
    print(arr[-1])   # Last element
    print(arr[0:3])  # Slice: first three elements (the end index is excluded)

🔹 Shape and Reshape
    print(arr.shape)            # (6,)
    print(arr.reshape((2, 3)))  # Reshape into a 2x3 matrix

🔹 Broadcasting
Broadcasting lets NumPy apply an operation to arrays of different shapes by automatically stretching the smaller operand:
    print(arr + 1)  # Adds 1 to every element

🔹 Matrix Operations
    m1 = np.array([[1, 2], [3, 4]])
    m2 = np.array([[5, 6], [7, 8]])
    print(np.dot(m1, m2))  # Matrix multiplication (same as m1 @ m2)

🔹 Statistics
    print("Mean:", np.mean(arr))
    print("Standard Deviation:", np.std(arr))  # Population standard deviation (ddof=0) by default

🔹 Random Numbers
    print(np.random.rand(5))  # 5 random numbers in [0, 1)

🔹 Handling Missing Values
    arr = np.array([1, 2, np.nan, 4])
    print(np.isnan(arr))  # True only at the np.nan position

List vs NumPy Performance

🔹 Why NumPy is Faster
NumPy runs vectorized operations in compiled C code, which makes it much faster than looping over elements in Python.
    import time
    import numpy as np

    n = 10_000_000  # 10 million elements; a 100-million-element list needs several GB of RAM

    # Python list
    ls1 = list(range(n))
    start = time.perf_counter()
    sum(ls1)
    print("List time:", time.perf_counter() - start)

    # NumPy array
    arr = np.arange(n)
    start = time.perf_counter()
    np.sum(arr)
    print("NumPy time:", time.perf_counter() - start)

🧩 NumPy is often 10x to 50x faster than plain lists for numeric operations; the exact speedup depends on the operation, the array size, and the hardware.

Day 02: Pandas Basics

🧠 What is Pandas?
Pandas is a Python library for data analysis and manipulation, built on top of NumPy.
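To make "built on top of NumPy" concrete, here is a minimal sketch showing that the data inside a pandas object can be handed to NumPy directly. The `ages` values and the variable names are illustrative assumptions, not taken from any dataset in this post.

```python
import numpy as np
import pandas as pd

# Illustrative values only
ages = pd.Series([20, 30, 40], name="age")

# The values behind a Series can be exposed as a NumPy ndarray
values = ages.to_numpy()
print(type(values))      # <class 'numpy.ndarray'>

# NumPy functions accept a Series directly
print(np.mean(ages))     # 30.0

# Vectorized arithmetic works the same way as on an ndarray
older = ages + 1
print(older.to_numpy())  # [21 31 41]
```

This is why the NumPy habits from Day 01 (vectorized math, no explicit loops) carry over to pandas columns.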
Pandas provides two main data structures:
Series → 1D labeled array
DataFrame → 2D table (rows + columns)

📦 Installation
    pip install pandas

🔹 Creating a DataFrame
    import pandas as pd

    data = {
        'people': ['p1', 'p2', 'p3'],
        'age': [20, 30, 40],
        'gender': ['M', 'F', 'M'],
        'salary': [1000, 2000, 1500]
    }
    df = pd.DataFrame(data)
    print(df)

🔹 Reading and Writing Files
    # Read CSV / Excel
    titan_df = pd.read_csv("/Workspace/Users/.../Titanic-Dataset.csv")
    titan_df = pd.read_excel("/Workspace/Users/.../Titanic-Dataset.xlsx")

    # Write files (.xlsx support needs an Excel engine such as openpyxl)
    df.to_csv("sample.csv", index=False)
    df.to_excel("sample.xlsx", index=False)

🔹 Accessing Columns and Rows
    print(df["people"])                  # Single column
    print(df["age"].sum())               # Sum of a column
    print(df[df["age"] > 30]["people"])  # Filter rows, then select a column

#Python #DataAnalysis #DataEngineer #AzureDataEngineer #DataAnalytics #DataScience
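The filter-and-select line under "Accessing Columns and Rows" chains two steps: build a boolean mask, then pick a column. A minimal sketch of the same idea written with `.loc`, reusing the sample data from the DataFrame example; the names `mask`, `names`, and `males_over_1200` are illustrative.

```python
import pandas as pd

# Same sample data as the DataFrame example in this post
df = pd.DataFrame({
    'people': ['p1', 'p2', 'p3'],
    'age': [20, 30, 40],
    'gender': ['M', 'F', 'M'],
    'salary': [1000, 2000, 1500],
})

# Step 1: comparing a column to a value yields a boolean Series (the mask)
mask = df["age"] > 30
print(mask.tolist())             # [False, False, True]

# Step 2: .loc selects rows by mask and a column by label in one call
names = df.loc[mask, "people"]
print(names.tolist())            # ['p3']

# Combine conditions with & (and) or | (or); each needs its own parentheses
males_over_1200 = df.loc[(df["gender"] == "M") & (df["salary"] > 1200), "people"]
print(males_over_1200.tolist())  # ['p3']
```

Using a single `.loc[rows, column]` call instead of two chained brackets tends to be the safer habit once you start assigning values to a filtered selection.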
