*Data Handling Basics Part 1: NumPy (Numerical Computing in Python)* 🔢

NumPy is one of the most important libraries for:
- Data science
- Machine learning
- Scientific computing
- Data analytics

It provides fast mathematical operations on arrays.

*1️⃣ Install NumPy*

pip install numpy

*2️⃣ Import NumPy*

import numpy as np

np is the standard alias.

*3️⃣ Create a NumPy Array*

import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr)

Output:
[1 2 3 4]

*4️⃣ NumPy vs Python List*

Python list:

a = [1, 2, 3]
b = [4, 5, 6]
print(a + b)

Output:
[1, 2, 3, 4, 5, 6]

The + operator concatenates plain lists.

NumPy array:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)

Output:
[5 7 9]

NumPy performs element-wise operations.

*5️⃣ Basic Array Operations*

import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr + 10)
print(arr * 2)

Output:
[11 12 13 14]
[2 4 6 8]

*6️⃣ Useful NumPy Functions*

import numpy as np

arr = np.array([1, 2, 3, 4])
print(np.mean(arr))
print(np.sum(arr))
print(np.max(arr))
print(np.min(arr))

Output:
2.5
10
4
1

*7️⃣ Create Special Arrays*

- Zeros array: `np.zeros(5)`
- Ones array: `np.ones(4)`
- Range array: `np.arange(1, 10)`

*8️⃣ 2D Arrays (Matrices)*

import numpy as np

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(arr)

Output:
[[1 2 3]
 [4 5 6]]

Access an element: `print(arr[0, 1])`

Output:
2

*Real Example: Student Marks Analysis*

import numpy as np

marks = np.array([78, 85, 90, 66, 72])
print("Average:", np.mean(marks))
print("Highest:", np.max(marks))
print("Lowest:", np.min(marks))

*Practice Tasks*

1. Create a NumPy array of the numbers 1–10
2. Add 5 to every element
3. Find the mean and sum of an array
4. Create a 3×3 matrix
5. Find the maximum value in an array

*✅ Practice Task Solutions - NumPy Basics*

*Task 1. Create a NumPy array of the numbers 1–10*

import numpy as np

arr = np.arange(1, 11)
print(arr)

Output:
[ 1  2  3  4  5  6  7  8  9 10]

*Task 2. Add 5 to every element*

import numpy as np

arr = np.arange(1, 11)
result = arr + 5
print(result)

Output:
[ 6  7  8  9 10 11 12 13 14 15]

*Task 3. Find the mean and sum of an array*

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print("Sum:", np.sum(arr))
print("Mean:", np.mean(arr))

Output:
Sum: 15
Mean: 3.0

*Task 4. Create a 3×3 matrix*

import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(matrix)

Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

*Task 5. Find the maximum value in an array*

import numpy as np

arr = np.array([12, 45, 7, 89, 34])
print("Maximum:", np.max(arr))

Output:
Maximum: 89

*✅ Key learning*

- np.arange() → create range arrays
- NumPy supports vectorized operations
- np.mean() → average
- np.sum() → total
- np.max() → largest value

*Double Tap ♥️ For More*
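The special-array helpers in section 7 are listed above without their outputs; a minimal sketch, using only the three functions named there, shows what each one actually produces:

```python
import numpy as np

zeros = np.zeros(5)     # five 0.0 values (float64 by default)
ones = np.ones(4)       # four 1.0 values
rng = np.arange(1, 10)  # integers 1 through 9 (the stop value is exclusive)

print(zeros)  # [0. 0. 0. 0. 0.]
print(ones)   # [1. 1. 1. 1.]
print(rng)    # [1 2 3 4 5 6 7 8 9]
```

Note that np.zeros and np.ones return floats while np.arange over integer bounds returns integers, which is why the printed output differs.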
NumPy Basics: Essential Library for Data Science and Machine Learning
More Relevant Posts
📊 Data Science with Python: A Complete Roadmap for Beginners & Professionals

If you're planning to enter Data Science, this roadmap gives you a crystal-clear path to follow using Python. 🐍 Let's break it down step by step. 👇

🧠 1. Core Python Libraries (Your Foundation)
Before anything else, you need to master the essential tools:
- Pandas → data manipulation & analysis
- NumPy → numerical computing
- Matplotlib & Seaborn → data visualization
- Scikit-learn → machine learning
👉 These libraries are the backbone of every data science project.

📥 2. Data Loading (Getting Your Data Ready)
Data comes from multiple sources, and you should know how to handle all of them:
- CSV, Excel, and JSON files
- SQL databases
- Web scraping (Beautiful Soup)
- NoSQL databases (MongoDB)
👉 Real-world data is messy; learning how to collect it is crucial.

🧹 3. Data Preprocessing (Most Important Step!)
This is where raw data becomes useful:
- Handling missing values
- Removing duplicates
- Scaling & normalization
- Feature selection
- Encoding categorical variables
- Outlier detection (Z-score, IQR)
- Handling imbalanced datasets
👉 80% of a data scientist's work happens here.

📊 4. Data Analysis (Understanding the Data)
Now you explore and extract insights:
- Exploratory Data Analysis (EDA)
- Correlation analysis
- Hypothesis testing
- Statistical tests: t-tests, ANOVA, Chi-Square, Z-test, Mann-Whitney, Wilcoxon, Shapiro-Wilk
- PCA (dimensionality reduction)
👉 This step helps you make data-driven decisions.

📈 5. Data Visualization (Storytelling with Data)
Turn numbers into insights:
- Line charts, bar plots, histograms
- Heatmaps, box plots, scatter plots
- Advanced plots: pair plots, violin plots, KDE plots
- Interactive dashboards (Bokeh, Folium)
👉 Good visualization = better communication.

🤖 6. Machine Learning (Making Predictions)
Finally, you build intelligent systems:
- Machine learning fundamentals
- Model training & evaluation
- Deep learning basics
👉 This is where your data starts creating value.
#data #coding #ia #cnn #model #web #python #tools #work #learning
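The outlier-detection step named in the preprocessing section (the IQR rule) fits in a few lines of NumPy; a minimal sketch, with the sample values invented for illustration:

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11, 95])  # 95 is a deliberate outlier

# Interquartile range: anything beyond 1.5 * IQR from the quartiles is flagged
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)  # [95]
```

The 1.5 multiplier is the conventional Tukey fence; widening it to 3.0 flags only extreme outliers.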
An absolute classic collection of Python tools for Data Science by Alex Wang. As for my part, I am using a handful of these on a daily basis even in 2026 - what would you add?

Data Manipulation
- Polars: https://pola.rs
- Modin: https://lnkd.in/d97bYx79
- Pandas: https://pandas.pydata.org
- Vaex: https://vaex.io
- Datatable: https://lnkd.in/dApHaBCT
- CuPy: https://docs.cupy.dev
- NumPy: https://numpy.org

Data Visualization
- Plotly: https://plotly.com/python
- Altair: https://lnkd.in/d8pN88j9
- Matplotlib: https://matplotlib.org
- Seaborn: https://seaborn.pydata.org
- Geoplotlib: https://lnkd.in/d8wtm5CN
- Folium: https://lnkd.in/d-yMZhSf
- Bokeh: https://docs.bokeh.org
- Pygal: http://www.pygal.org

Statistical Analysis
- SciPy: https://scipy.org
- PyMC3: https://docs.pymc.io
- PyStan: https://lnkd.in/dyDq23S6
- Statsmodels: https://lnkd.in/dTAJ-sv9
- Lifelines: https://lnkd.in/drgZ54cj
- Pingouin: https://pingouin-stats.org

Machine Learning
- JAX: https://jax.readthedocs.io
- Keras: https://keras.io
- Theano: https://lnkd.in/d8Ga6xvV
- XGBoost: https://lnkd.in/d7856MDc
- Scikit-learn: https://scikit-learn.org
- TensorFlow: https://tensorflow.org
- PyTorch: https://pytorch.org

Natural Language Processing
- NLTK: https://www.nltk.org
- BERT: https://lnkd.in/d3DsJsyD
- spaCy: https://spacy.io
- TextBlob: https://lnkd.in/dNSdHsjC
- Polyglot: https://lnkd.in/dWbhNJrn
- Gensim: https://lnkd.in/d4bRCJTC
- Pattern: https://lnkd.in/dCbSXzs6

Database Operations
- Dask: https://dask.org
- PySpark: https://lnkd.in/dqrKCvvu
- Ray: https://docs.ray.io
- Koalas: https://lnkd.in/dUwXiSWr
- Kafka: https://kafka.apache.org
- Hadoop: https://hadoop.apache.org

Time Series Analysis
- Sktime: https://lnkd.in/dspnqaEH
- Darts: https://lnkd.in/dD6kbSRw
- AutoTS: https://lnkd.in/dpquYFRd
- Prophet: https://lnkd.in/df8Xt5zG
- Kats: https://lnkd.in/dPyQ8vpT
- TSFresh: https://lnkd.in/dB8JDJF7

Web Scraping
- Beautiful Soup: https://lnkd.in/drxaifkC
- Scrapy: https://scrapy.org
- Octoparse: https://www.octoparse.com
- Selenium: https://www.selenium.dev
KDnuggets just released the "2026 Data Science Starter Kit" to help you prioritize the skills that actually matter in today’s AI-driven landscape. It’s not about learning everything; it’s about mastering the 20% of tools—like Python, SQL, and EDA—that drive 80% of the results. The guide highlights why you should double down on Python for scalability and why you don't need a PhD in math to build a solid statistical foundation. Check out the full breakdown to see which libraries and workflows are non-negotiable for your roadmap this year. Stop over-engineering your learning path and start building. Read more here: https://lnkd.in/gMYGX6ec #DataScience #MachineLearning #Python #CareerAdvice #DataAnalytics #KDnuggets #LearningRoadmap #AI2026 #TechCareer
Pandas is an open-source Python library used for data manipulation and analysis. It provides high-performance data structures and tools for working with structured (tabular) data, making it a cornerstone for data science and machine learning workflows.

While NumPy arrays are powerhouse tools for numerical computation, they struggle with a core reality of data: real-world data is messy. It has missing values, mixed types (strings next to floats!), and requires complex joins or grouping. Enter **pandas** and the **DataFrame**. 🐼

Why pandas is the "Gold Standard" for Flat Files:
1. Heterogeneous data: unlike matrices, DataFrames handle different data types across columns simultaneously.
2. R-style power in Python: as Wes McKinney intended, pandas lets you stay in the Python ecosystem for your entire workflow, from munging to modeling, without switching to domain-specific languages like R.
3. Wrangling at scale: it's "missing-value friendly." Whether you're dealing with weird comments in a CSV or `NaN` values, pandas handles them gracefully during import.

# The 3-Line Power Move:
Importing a flat file is as simple as:

```python
import pandas as pd

# Load the data
data = pd.read_csv('your_file.csv')

# See the first 5 rows instantly
print(data.head())
```

The Big Takeaway: As Hadley Wickham famously noted: "A matrix has rows and columns. A data frame has observations and variables." In the world of Data Science, we aren't just looking at numbers; we're looking at **observations**. Using `pd.read_csv()` isn't just a shortcut; it's best practice for building a robust, reproducible data pipeline.

#DataEngineering #Python #Pandas #DataAnalysis #MachineLearning
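The "missing-value friendly" claim is easy to demonstrate. A minimal sketch, building the frame inline instead of from a file (the names and scores are invented for illustration; `read_csv` would produce the same `NaN` for an empty cell):

```python
import numpy as np
import pandas as pd

# A tiny frame with a deliberate gap, as read_csv yields for an empty CSV cell
data = pd.DataFrame({"name": ["Ada", "Bob", "Cleo"],
                     "score": [91.0, np.nan, 78.0]})

print(data.isna().sum())  # per-column count of missing values

# One common fix: fill gaps with the column mean
filled = data["score"].fillna(data["score"].mean())
print(filled.tolist())  # [91.0, 84.5, 78.0]
```

Note that `mean()` skips `NaN` by default, so the fill value is the mean of the observed scores only.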
NumPy in 2026: Why It Still Sits at the Core of Modern Data Science

In a world increasingly dominated by high-level machine learning frameworks and automated pipelines, it's easy to overlook the foundational tools that make it all possible. One of those tools (quiet, efficient, and incredibly powerful) is NumPy.

After years of building data products, training models, and optimizing pipelines, I can say this confidently: if you truly understand NumPy, you unlock a different level of control, performance, and clarity in your work.

What NumPy Really Is (Beyond the Basics)
Most people learn NumPy as "that Python library for arrays." That's technically correct, but incomplete. NumPy is a high-performance numerical computing engine. At its core is the ndarray, a contiguous block of memory that allows vectorized operations to run at near C-level speed. This is what separates NumPy from plain Python lists.

Why does this matter? Because performance is not just about speed; it's about scalability and feasibility. Operations that would take minutes in pure Python can execute in milliseconds with NumPy.

Vectorization: The Skill That Separates Juniors from Seniors
Early in my career, I used to write loops like this:

result = []
for i in range(len(a)):
    result.append(a[i] + b[i])

Now I write the same thing far more simply:

result = a + b

This is vectorization. Under the hood, NumPy pushes computation down to optimized C routines. The result is:
• Cleaner code
• Faster execution
• Better use of CPU cache and memory

If there's one concept to master in NumPy, it's this.

Broadcasting: Elegant Solutions to Complex Problems
Broadcasting is one of NumPy's most powerful and misunderstood features. It allows operations between arrays of different shapes without explicit reshaping.

Example:

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])
result = a + b

Instead of throwing an error, NumPy "broadcasts" b across each row. For real-world applications, this means:
• Efficient feature scaling
• Batch transformations
• Cleaner mathematical expressions

Memory Efficiency and Why It Matters
In production environments, memory becomes a constraint long before compute does. NumPy gives you control over:
• Data types (float32 vs float64)
• Memory layout
• Views vs copies

Example:

a = np.arange(10)
b = a[2:5]  # This is a view, not a copy

Understanding this distinction can prevent subtle bugs and reduce memory overhead significantly, especially when working with large datasets.

NumPy in the Modern Stack
Even if you primarily use higher-level tools, NumPy is still underneath:
• Pandas uses NumPy arrays internally
• Scikit-learn relies heavily on NumPy operations
• TensorFlow and PyTorch tensors are conceptually similar

When performance issues arise, the bottleneck often traces back to how efficiently NumPy is being used.
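The view-versus-copy distinction described above can be verified directly; a minimal sketch:

```python
import numpy as np

a = np.arange(10)
view = a[2:5]         # basic slicing returns a view onto a's memory
copy = a[2:5].copy()  # an explicit, independent copy

view[0] = 99          # writes through to the original array
print(a[2])           # 99: the view shares memory with a
print(copy[0])        # still 2: the copy is unaffected

print(view.base is a)  # True: a view keeps a reference to its parent array
```

Checking `.base` (or `np.shares_memory(a, view)`) is a quick way to confirm whether two arrays alias the same buffer before mutating one of them.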
Python has quietly become the backbone of the modern data ecosystem. Whether you work in Data Engineering, Analytics, or Machine Learning, there are a few libraries that almost every data professional ends up using sooner or later.

I recently put together a quick cheat sheet of 10 Python libraries that are extremely useful in the data domain.

↳ NumPy: The foundation for numerical computing in Python. Many other libraries are built on top of it.
↳ Pandas: One of the most widely used libraries for data manipulation and analysis using DataFrames.
↳ Matplotlib: A core library for creating visualizations such as line charts, bar charts, and scatter plots.
↳ Seaborn: Built on top of Matplotlib, it makes statistical data visualization much easier and cleaner.
↳ PySpark: Essential for large-scale distributed data processing with Apache Spark.
↳ Scikit-learn: A powerful machine learning library for tasks like classification, regression, clustering, and model evaluation.
↳ Dask: Helps scale Python workloads by enabling parallel computing on large datasets.
↳ Polars: A high-performance DataFrame library designed for speed and efficiency.
↳ Airflow: Widely used for orchestrating and scheduling data pipelines.
↳ Requests: A simple yet powerful library for interacting with APIs and fetching data from external services.

The interesting part is that most real-world data workflows use a combination of these libraries rather than relying on just one. For example: APIs with Requests → data processing with Pandas or PySpark → pipeline orchestration with Airflow → visualization with Matplotlib or Seaborn.

If you're building a career in the data domain, getting comfortable with these tools can make your day-to-day work much smoother.

📌 For Mentorship / 1:1 Call, book here: https://lnkd.in/gjHqeHMq
📌 Looking for a resume with a 90+ ATS score? Download a Recruiter-Approved Resume Template: https://lnkd.in/gepAc5C6
📌 Looking to build your Data Engineering career? I am hosting a Data Engineering Cohort, enroll here: https://lnkd.in/gmY58PSH

#Python #DataEngineering #DataScience #Analytics #BigData
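The Requests → Pandas hand-off in the workflow above takes only a few lines. A hedged sketch where the API response is stood in by an inline payload, so it runs without a network (the field names and values are invented for illustration):

```python
import pandas as pd
# import requests  # in a real pipeline: payload = requests.get(url).json()

# Stand-in for a JSON API response
payload = [
    {"city": "Lagos", "temp_c": 31},
    {"city": "Oslo",  "temp_c": -2},
]

df = pd.DataFrame(payload)   # Requests output → Pandas DataFrame
warm = df[df["temp_c"] > 0]  # a simple processing step
print(warm["city"].tolist())  # ['Lagos']
```

A list of flat JSON objects maps directly onto DataFrame rows, which is why this hand-off is usually a one-liner; nested JSON typically needs `pd.json_normalize` first.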
SQL and SQLite with Python

Data is useless if you can't store it properly. This week, I learned SQL and SQLite with Python, and it changed how I think about handling data in real-world applications.

Before this, I was mostly working with data in memory. Now, I can store, manage, and retrieve data efficiently, just like real Data Science and production systems.

Here's what I explored:
• Creating databases using SQLite
• Storing structured data using SQL tables
• Writing queries to retrieve specific insights
• Updating and deleting records efficiently
• Connecting Python with SQLite for automation
• Managing datasets in a scalable and organized way

What I found most interesting is how Python + SQL creates a powerful combination:
Python → data processing & analysis
SQL → data storage & retrieval
Together, they form the backbone of many Data Science and AI systems.

To reinforce my learning, I created my own structured notes, and I'm sharing them as a PDF in this post. Hopefully, it helps others who are building their Data Science foundation.

Step by step, building towards Data Science & AI.

#DataScience #SQL #SQLite #Python #Database #AI #MachineLearning #LearningInPublic #TechJourney
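The Python + SQLite combination described above needs only the standard-library `sqlite3` module; a minimal sketch (the table and rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Store structured data in a SQL table
cur.execute("CREATE TABLE scores (name TEXT, mark INTEGER)")
cur.executemany("INSERT INTO scores VALUES (?, ?)",
                [("Ada", 91), ("Bob", 67), ("Cleo", 78)])
conn.commit()

# Retrieve a specific insight: everyone above 70, highest first
cur.execute("SELECT name, mark FROM scores WHERE mark > 70 ORDER BY mark DESC")
rows = cur.fetchall()
print(rows)  # [('Ada', 91), ('Cleo', 78)]

conn.close()
```

Swapping `":memory:"` for a filename persists the database to disk; the `?` placeholders are the parameterized-query style that avoids SQL injection.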
📘 Python for PySpark Series – Day 8
⚡ List Comprehension (Efficient Data Transformation)

✨ What is List Comprehension?
List comprehension is a compact way to create and transform lists in a single line of code. Instead of writing multiple lines using loops, we can write clean and readable code.
➡️ It is widely used in data transformation and preprocessing.

⚙️ Why Do We Need List Comprehension?
In data processing:
❓ What if we want to transform a list quickly?
➡️ Using loops can be longer and less readable.
✔ List comprehension makes code short and elegant
✔ Improves readability
✔ Often faster than an equivalent explicit loop

🔹 Basic Syntax
[expression for item in iterable]

Example:
numbers = [1, 2, 3, 4]
result = [x * 2 for x in numbers]
➡️ Output: [2, 4, 6, 8]

🔹 With Condition
We can also add conditions.
numbers = [10, 25, 30, 45]
result = [x for x in numbers if x > 20]
➡️ Output: [25, 30, 45]

🔗 Why List Comprehension Matters in Data Engineering
In real-world data processing, we often:
✔ Transform data
✔ Filter records
✔ Clean datasets
➡️ List comprehension helps to perform these tasks quickly and efficiently.

🏫 Real-Life Analogy (Fast Processing Machine ⚙️)
Imagine a machine that:
📥 Takes raw items
⚙️ Processes them instantly
📤 Gives output in one step
➡️ List comprehension works like a fast processing machine.

🧠 Interview Key Points
✔ List comprehension creates lists in a single line
✔ Improves code readability and performance
✔ Can include conditions (if)
✔ Commonly used for data transformation
✔ Often faster than traditional loops

🧠 Key Takeaway
List comprehension is a powerful and efficient way to transform and filter data, making it highly useful in data engineering and PySpark workflows.

🔖 Hashtags
#python #pyspark #dataengineering #bigdata #listcomprehension #pythonbasics #learningjourney #coding
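One extra form worth knowing alongside the two in the post: an if/else inside the expression transforms every element instead of filtering some out (a small illustrative addition, not from the original post):

```python
# Cap every value at 20: a transform-with-condition, not a filter,
# so the output list has the same length as the input
numbers = [10, 25, 30, 45]
capped = [x if x <= 20 else 20 for x in numbers]
print(capped)  # [10, 20, 20, 20]
```

The rule of thumb: a trailing `if` (after the `for`) drops elements, while `x if cond else y` (before the `for`) rewrites them.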