🚀 Day 7 of #100DaysOfDataEngineering
Topic: Advanced Python - NumPy Basics
Tags: #Python #NumPy #DataEngineering #DataScience

Today marks the start of our journey into Python for numerical computing. Meet NumPy (Numerical Python), the core library that powers data transformations, mathematical operations, and many popular tools like Pandas and Scikit-learn.

💡 What is NumPy?
NumPy provides multi-dimensional arrays (ndarray) and efficient functions to work with them. It is built for speed, allowing you to process large datasets much faster than standard Python lists.

Install NumPy with:
    pip install numpy

Import the library:
    import numpy as np

🧱 Creating Arrays
You can create NumPy arrays directly from Python lists or nested lists:

    import numpy as np

    # 1D array
    arr1 = np.array([10, 20, 30])

    # 2D array
    arr2 = np.array([[5, 10, 15], [20, 25, 30]])

    print(arr1.shape)  # (3,)
    print(arr2.shape)  # (2, 3)
    print(arr1.dtype)  # int64 on most platforms (the default integer type is platform-dependent)

⚙️ Common Attributes

    Attribute / method    Description
    ndarray.shape         Size of the array along each dimension, e.g. (rows, columns) for a 2D array
    ndarray.dtype         Type of data stored (int, float, etc.)
    ndarray.ndim          Number of dimensions
    ndarray.reshape()     Method (not an attribute) that changes the array shape without changing the data

📊 Built-in Functions
NumPy includes several helpful functions for creating arrays quickly:

    # Evenly spaced values (like range); the stop value is excluded
    np.arange(0, 10, 2)    # [0 2 4 6 8]

    # Arrays of zeros and ones
    np.zeros((2, 3))       # 2x3 array of zeros
    np.ones((3, 2))        # 3x2 array of ones

    # Equally spaced numbers; the stop value is included
    np.linspace(0, 10, 5)  # [0. 2.5 5. 7.5 10.]

🎲 Generating Random Numbers
NumPy makes it simple to generate test data for experiments and simulations:

    # 3x4 array of random values, uniform over [0, 1)
    np.random.rand(3, 4)

    # 2x3 array of random values from the standard normal distribution
    np.random.randn(2, 3)

    # 10 random integers from 4 up to (but not including) 40
    np.random.randint(4, 40, 10)

    # 4x4 matrix of random integers from 0 to 49
    np.random.randint(50, size=(4, 4))

✅ Key Takeaway
NumPy is all about speed and simplicity. It lets you handle large datasets and perform calculations efficiently.
These array operations form the foundation of many scalable data pipelines.
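The attribute table in this post lists reshape() without showing it in action. Here is a minimal sketch, assuming a 12-element array built with np.arange (the names flat, grid, tall and samples are illustrative, not from the post). It also uses np.random.default_rng, the Generator API that the NumPy documentation recommends for new code over the legacy np.random.rand family; the seed 42 is an arbitrary choice.

```python
import numpy as np

# Illustrative data: the integers 0..11 as a 1D array.
flat = np.arange(12)

# reshape keeps the same elements and only changes the shape;
# the total element count (12) must stay the same.
grid = flat.reshape(3, 4)
print(flat.ndim, flat.shape)  # 1 (12,)
print(grid.ndim, grid.shape)  # 2 (3, 4)

# Passing -1 asks NumPy to infer that dimension from the others.
tall = flat.reshape(-1, 2)
print(tall.shape)             # (6, 2)

# A seeded generator reproduces the same test data on every run.
rng = np.random.default_rng(42)
samples = rng.integers(4, 40, size=10)  # upper bound 40 is exclusive
print(samples.min() >= 4, samples.max() < 40)  # True True
```

Seeding the generator is a design choice for test data: anyone rerunning the snippet gets the same values, which makes experiments easier to compare.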
Lakshmi Prasad B.’s Post
More Relevant Posts
-
🧮 Dissecting NumPy: Working With Intrinsic NumPy Objects For Array Creation

💪 It feels really exciting to get into the core of NumPy and watch it unlock its true strength in front of me!

🤔 Why are NumPy arrays better than Python lists?
- Fast computation on large datasets
- No explicit loops required
- Easy arithmetic operations
- Lower memory consumption

⚙️ Today I dug a bit deeper into NumPy array creation with intrinsic objects like:
- np.ones / np.zeros: return arrays filled with 1s and 0s
- np.arange(): returns an array holding a sequence of values, unlike Python's range(), which yields plain integers
- np.linspace(): returns evenly spaced numbers between a start and a stop value
- np.reshape(): reshapes a given array without changing its data, i.e. it returns the same elements arranged in a different number of rows and columns (as specified)

But, listen up!
‼️ One critical thing about NumPy arrays is their axes: axis 0 runs down the rows and axis 1 runs across the columns.
‼️ This means we can apply arithmetic operations along either axis, for example summing down each column (axis=0) or across each row (axis=1).

💭 It's been a productive week so far, getting into the world of NumPy and unlocking a new skill on the way to becoming a data scientist!

🫡 Until we meet again, my fellow coders!
-------------------------
☺️ Here are Python (Beginner to Intermediate) GitHub repos for you:
📁 Python Variables: https://lnkd.in/e9rjz-_D
📁 Python Operators: https://lnkd.in/e6hzgHSn
📁 Python Conditionals: https://lnkd.in/egQNGZBF
📁 Python Loops: https://lnkd.in/eezUg_-y
📁 Python Functions: https://lnkd.in/eKdU6nex
📁 Python Lists & Tuples: https://lnkd.in/eZ8KiQNs
📁 Python Dictionaries & Sets: https://lnkd.in/eDmgj7pc
📁 Python OOP: https://lnkd.in/eJFupCiK
📁 Python DSAs: https://lnkd.in/ebR3rjkt
-------------------------
🤓 NumPy (Beginner To Intermediate):
🧮 Arrays: https://lnkd.in/ebghYRYE
-------------------------
⚡ Follow my learning journey:
📎 GitHub: https://lnkd.in/ehu8wX85
🔗 GitLab: https://lnkd.in/eiiQP2gw
💬 Feedback: I'd love your thoughts and tips!
🤝 Collab: If you're also exploring Python, DM me! Let's grow together!
--------------------------
📞 Book A Call With Me: https://lnkd.in/e23BtnR9
--------------------------
#pythonnumpy #NumPy #pythonlibraries #pythonfordatascience #datascience #machinelearning #artificialintelligence
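The axis 0 / axis 1 point in this post is easier to see with numbers. A minimal sketch, assuming a small 2x3 array (the name scores and its values are made up for illustration):

```python
import numpy as np

# Illustrative 2x3 array: 2 rows, 3 columns.
scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

# axis=0 collapses the rows, leaving one result per column.
col_totals = scores.sum(axis=0)
print(col_totals)  # [5 7 9]

# axis=1 collapses the columns, leaving one result per row.
row_totals = scores.sum(axis=1)
print(row_totals)  # [ 6 15]

# The same axis argument works for other reductions such as mean.
row_means = scores.mean(axis=1)
print(row_means)   # [2. 5.]
```

A handy way to remember it: the axis you pass is the one that disappears from the result's shape.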
-
As data volumes grow, efficiency in data processing becomes critical. Our latest blog explores whether Polars could be the next big step beyond Pandas for Python developers and data scientists. Read the full insight here: https://lnkd.in/g38X4uS9 #DataScience #Python #PolarsvsPandas #polarspythonlibrary #pandasinpython #DataMites
-
Feel like you're writing complex loops for simple data cleaning tasks in Python? Ibrahim Salami's debut TDS article shares 7 lesser-known NumPy functions that can help you write cleaner, more efficient code for data analysis.
-
🚀 Day 6 of my 30 Days Python Challenge: Mastering Python starts with mastering lists!

Here's your ready reckoner with real code, outputs, and use cases, just for you!

A Python list is an ordered, dynamic, and mutable collection that can hold any type of data, like numbers, strings, or even other lists.

💡 Why use lists?
- Flexible for all data types
- Store, access, edit, or remove elements easily
- Used everywhere, from automation to data science!

🎯 Let's see real list magic in Python:

    # 1️⃣ Create & check type
    my_list = [1, 2, 3, 4, 5]
    print(type(my_list))   # Output: <class 'list'>

    # 2️⃣ List with multiple data types
    mixed = [10, "Krishna", 2.5, True]
    print(mixed)           # Output: [10, 'Krishna', 2.5, True]

    # 3️⃣ Access by index
    fruits = ['Apple', 'Banana', 'Mango']
    print(fruits[0])       # Output: Apple
    print(fruits[2])       # Output: Mango

    # 4️⃣ Slicing
    numbers = [1, 2, 3, 4, 5]
    print(numbers[1:4])    # Output: [2, 3, 4]

    # 5️⃣ Change value
    numbers = [10, 20, 30, 40]
    numbers[2] = 99
    print(numbers)         # Output: [10, 20, 99, 40]

    # 6️⃣ Add elements (append, insert, extend)
    nums = [1, 2, 3]
    nums.append(4)
    print(nums)            # Output: [1, 2, 3, 4]
    nums.insert(1, 99)
    print(nums)            # Output: [1, 99, 2, 3, 4]
    nums.extend([7, 8])
    print(nums)            # Output: [1, 99, 2, 3, 4, 7, 8]

    # 7️⃣ Remove element (pop)
    colors = ['red', 'green', 'blue']
    removed = colors.pop()
    print(removed)         # Output: blue
    print(colors)          # Output: ['red', 'green']

    # 8️⃣ Nested lists
    matrix = [
        [1, 2, 3],
        [4, 5, 6]
    ]
    print(matrix[1][2])    # Output: 6

    # 9️⃣ List functions (len, max, min)
    data = [7, 12, 4, 9, 21]
    print(len(data))       # Output: 5
    print(max(data))       # Output: 21
    print(min(data))       # Output: 4

    # 🔟 Methods (reverse, copy, count)
    nums = [5, 2, 5, 7]
    nums.reverse()
    print(nums)            # Output: [7, 5, 2, 5]
    copy_nums = nums.copy()
    print(copy_nums)       # Output: [7, 5, 2, 5]
    print(nums.count(5))   # Output: 2

🔥 Real-world uses:
- Data science: holding survey results, experiment values
- Web: lists of products, users
- ML/AI: feature sets, predictions
- Automation: batch rename, organize files and folders

❓ Your turn: What's the most creative way you used a Python list? Share your favorite tip, question, or challenge below 👇

Want more Python details? Comment "YES" & save this post!

#Python #DevOps #LearnPython #CodingTips #Automation #DataScience #AIBasics #Programming #LinkedInLearning

Follow for more actionable DevOps and Python tips with real-world examples!
-
-
Writing a for-loop in Python to process a list of data? You might be adding hours to your script's runtime without even knowing it.

I see this all the time: analysts use loops for data transformations that could be done in a fraction of the time. The bottleneck isn't your computer's speed; it's how you're talking to it.

The secret to faster data processing in Python is vectorization. Instead of processing each element one by one in a Python loop, a vectorized operation is applied to the whole dataset in a single call, so the per-element work runs in optimized, pre-compiled C code under the hood.

Let's take a common task: calculating the square of every number in a list.

The Slow Way (Loop):

    import pandas as pd

    data = pd.Series(range(1, 1000001))
    squared_list = []
    for num in data:
        squared_list.append(num ** 2)

The Fast Way (Vectorized):

    import pandas as pd

    data = pd.Series(range(1, 1000001))
    squared_list = data ** 2

The vectorized approach isn't just cleaner, it's dramatically faster. For a million rows, the loop might take ~150ms, while the vectorized operation can finish in ~2ms. That's a 98.7% reduction in processing time (roughly 75x faster)! Exact timings depend on your machine and library versions.

This principle applies across pandas and NumPy:
- Use df['column'].str.upper() instead of looping with .upper()
- Prefer built-in vectorized methods over df['column'].apply(function); .apply still calls the function once per element, so treat it as a tidier loop rather than a true vectorized operation
- Use NumPy's universal functions (np.log, np.sqrt) on arrays

Adopting a vectorized mindset is a game-changer for efficiency.

Have you ever refactored a slow loop into a vectorized operation? What was the performance boost like? Share your story below!

#Python #DataAnalysis #Pandas #CodingTips #DataScience
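The universal-function tip in this post can be tried with NumPy alone. A minimal sketch, assuming 100,000 values rather than the post's one million to keep the loop quick (the names values, looped and vectorized are illustrative). It only checks that both approaches give the same answer; timings vary by machine, so measure your own with the standard timeit module.

```python
import math

import numpy as np

# Illustrative input: the floats 1.0 .. 100000.0.
values = np.arange(1, 100_001, dtype=np.float64)

# Loop version: one Python-level function call per element.
looped = np.array([math.sqrt(v) for v in values])

# Vectorized version: one ufunc call over the whole array.
vectorized = np.sqrt(values)

# Both approaches produce the same numbers.
print(np.allclose(looped, vectorized))  # True
```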
-
-
I Just Made My Python Data Processing Pipeline 20x Faster - Here's How

After months of working with large-scale time-series data processing, I finally took the time to optimize our analysis pipeline. The results? Absolutely game-changing.

The Problem
Our data processing workflow was handling datasets with hundreds of thousands of data points per analysis run. Processing consistently took several minutes.

The Solution: Vectorization
Most Python developers know about NumPy, but few leverage it to its full potential. I systematically replaced list comprehensions and loops with vectorized operations, and the performance gains were staggering.

Real Results from Production Code
Core Operations Performance:
- Primary Classification Logic: 7.5x faster (89.6ms → 11.9ms)
- Lookup Operations: 27.4x faster (62.7ms → 2.29ms)
- State Transition Analysis: 27.0x faster (26.9ms → 1.0ms)
- Aggregation Functions: 16.0x faster (31.9ms → 2.0ms)

Overall Impact:
Average speedup: ~20x (individual operations ranged from 7.5x to 27.4x)

Before (Slow):

    # List comprehension - processing one element at a time
    result = [process_element(x) for x in large_dataset['values']]

After (Fast):

    # Vectorized operation - processing entire array at once
    result = vectorized_process(large_dataset['values'])

The difference? NumPy operates on entire arrays using optimized C code, while list comprehensions iterate in pure Python, one element at a time.

Key Lessons Learned
- Vectorization > Everything: For numerical operations, vectorized NumPy arrays can be 10-50x faster than Python loops
- Profile First, Optimize Second: Don't guess where the bottlenecks are; measure them with proper profiling tools

Technical Implementation
Core Technique: Replaced iterative Python operations with NumPy's vectorized alternatives:
- np.select() for conditional logic
- np.where() for branching operations
- np.vectorize() for applying functions (a convenience wrapper: the NumPy docs describe it as essentially a for loop, so it tidies code rather than speeding it up)

PERFORMANCE BENCHMARK RESULTS
Test dataset: 100,000 data points
(Operation names have been kept confidential.)
- Operation A: 7.5x faster
- Operation B: 27.4x faster
- Operation C: 27.0x faster
- Operation D: 16.0x faster
Average improvement: ~20x faster

#Python #PerformanceOptimization #DataEngineering #SoftwareEngineering #DataScience #NumPy #CodeOptimization #TechLeadership
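The post names np.where() and np.select() but keeps its real operations confidential, so here is a minimal sketch of how those two calls replace per-element branching. Everything in it is hypothetical: the readings array, the thresholds (0, 50, 100) and the label names are made up for illustration and are not the author's production logic.

```python
import numpy as np

# Hypothetical sensor readings (not from the post).
readings = np.array([-5.0, 12.0, 48.0, 75.0, 101.0])

# np.where: a two-way branch evaluated over the whole array.
flag = np.where(readings > 50.0, "high", "low")
print(flag.tolist())      # ['low', 'low', 'low', 'high', 'high']

# np.select: a multi-way branch. Conditions are checked in order and
# the first one that matches wins; unmatched elements get the default.
conditions = [readings < 0.0, readings < 50.0, readings < 100.0]
labels = ["invalid", "normal", "elevated"]
category = np.select(conditions, labels, default="critical")
print(category.tolist())  # ['invalid', 'normal', 'normal', 'elevated', 'critical']
```

Because the first matching condition wins, the order of the condition list matters, much like the order of if/elif branches.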
-
Looking to manage large datasets more efficiently in Python? Check out these 7 overlooked tricks using Pandas library: chunked loading, downcasting data types, categorical data conversion, Parquet file saving, GroupBy aggregation, query
-
🧱 QUICK TIP #2 – “Numbered Shelf: How the ARRAY Keeps Everything in Place (Python)”

1️⃣ Structure Name
Array (indexed collection)

2️⃣ Goal
Store many elements side by side and access each one by index in constant time.

3️⃣ Everyday Analogy
Think of a numbered shelf (0, 1, 2, 3…). You know exactly where each item is and grab it by position, no searching required.

4️⃣ Common Use Cases
- Math & stats operations
- Signals, images, numeric series
- Data buffers & batch processing
- Compact memory layouts

5️⃣ Technical Advantage
Random access is O(1) by index: read/write is blazing fast when you know the position.

6️⃣ Technical Drawback
Low-level arrays are often fixed-size, and inserting in the middle is costly because the following elements must be shifted.

7️⃣ Python Example (lists used like arrays)

    # Arrays in Python – quick demo ⚡
    # Author: Izairton Oliveira de Vasconcelos
    import time
    import numpy as np

    print("=== ARRAYS IN PYTHON ===")

    # Basic list demo
    prices = [9.90, 12.50, 7.80, 15.00]
    prices[2] = 8.10           # overwrite by index
    prices.insert(1, 10.00)    # insert shifts later elements right
    print("Prices:", prices)

    # NumPy fast math
    a = np.array([1, 2, 3, 4])
    print("a * 10 ->", a * 10)
    print("mean ->", a.mean(), "std ->", a.std())

    # Middle insertion cost
    arr = list(range(100000))
    t0 = time.perf_counter()
    arr.insert(len(arr)//2, -1)
    print(f"Insertion in middle took {(time.perf_counter()-t0)*1000:.3f} ms")

    first_item = prices[0]     # O(1) access by index
    print(prices, first_item)

8️⃣ Efficient Numeric Example (NumPy)

    import numpy as np

    a = np.array([1, 2, 3, 4])
    b = a * 10             # vectorized op
    mean_val = a.mean()    # fast stats
    print(b, mean_val)     # [10 20 30 40] 2.5

9️⃣ When to Use
When you need fast index-based access and batch numeric operations with contiguous memory.

🔟 ✨ The Aha Moment (The Trick in a Nutshell)
“An array is your program’s numbered shelf: every item has a fixed address, so you grab exactly what you need instantly.”
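The "compact memory layout" and "costly middle insertion" points in this tip can be checked directly on a NumPy array. A minimal sketch, assuming a 4-element int64 array (the dtype is set explicitly here so the byte counts are the same on every platform; the names a and b are illustrative):

```python
import numpy as np

# Four 64-bit integers stored side by side in one block of memory.
a = np.arange(4, dtype=np.int64)

print(a.itemsize)               # 8  (bytes per element)
print(a.nbytes)                 # 32 (4 elements * 8 bytes)
print(a.flags["C_CONTIGUOUS"])  # True

# np.insert cannot grow the array in place: it builds a new array
# and copies every element, which is why middle insertion is costly.
b = np.insert(a, 2, -1)
print(b.tolist())               # [0, 1, -1, 2, 3]
print(a.tolist())               # [0, 1, 2, 3]  (original unchanged)
```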
-
-
Efficiently handle large datasets in Python with Pandas! 🐼 Learn 7 tricks to optimize memory usage and processing speed for tabular, text, or time-series data. #Python #DataScience #Programming #Efficiency 🚀
-
🚀 Day 35 of #100DaysOfPython – Working with Date, Time & Calendar 🕒📅

Today's topic helps us retrieve and manipulate the current date, time, and calendar using two of Python's built-in modules: time and calendar.

🕐 1️⃣ Retrieving the Current Time
Python provides the time() and localtime() functions to get the system time.

    import time
    lt = time.localtime(time.time())
    print(lt)

🧩 Output example:
time.struct_time(tm_year=2022, tm_mon=4, tm_mday=14, tm_hour=10, tm_min=30, ...)

🔹 Attributes of time.struct_time:
- tm_year → Current year
- tm_mon → Current month
- tm_mday → Day of month
- tm_hour → Hour
- tm_min → Minute
- tm_sec → Second
- tm_wday → Weekday (Monday = 0)
- tm_yday → Day of year
- tm_isdst → Daylight saving flag

🕓 2️⃣ Formatted Time with asctime()
The asctime() function converts a struct_time into a formatted string.

    import time
    lt = time.asctime(time.localtime(time.time()))
    print(lt)

✅ Example Output:
Thu Apr 14 10:33:59 2022

🧮 3️⃣ Converting String to Time – strptime()
Used to parse strings into time structures.

    import time
    tr = time.strptime("26 jun 14", "%d %b %y")
    print(tr)

📖 Example Output:
time.struct_time(tm_year=2014, tm_mon=6, tm_mday=26, ...)

🧭 4️⃣ Formatting Time – strftime()
Converts time into a specific string format.

    import time
    t = (2014, 6, 26, 17, 3, 38, 1, 48, -1)
    t = time.mktime(t)    # interprets the tuple as local time
    print(time.strftime("%d %m %y %H:%M:%S", time.gmtime(t)))    # gmtime converts to UTC

✅ Output (on a machine set to UTC+5:30; the hour and minute depend on your local time zone):
26 06 14 11:33:38

📆 5️⃣ Python Calendar Module
The calendar module allows working with dates, months, and years.

    import calendar
    calendar.prcal(2023)    # prcal prints the calendar itself and returns None

🧠 Common Calendar Functions:

    Method                        Description
    prcal(year)                   Prints the full calendar for a year
    firstweekday()                Returns the first weekday setting (default Monday = 0)
    isleap(year)                  Checks if a year is a leap year
    monthcalendar(year, month)    Returns a matrix of the weeks in a month
    leapdays(y1, y2)              Counts leap years from y1 up to (but not including) y2
    prmonth(year, month)          Prints a specific month

🗓️ Example:

    import calendar
    print(calendar.isleap(2020))    # True
    print(calendar.monthcalendar(2022, 6))
    calendar.prmonth(2022, 5)

✨ In Short:
- time → retrieves and formats time
- strptime / strftime → convert between strings & time
- calendar → helps print and analyze calendars

💬 Which one do you use most often: datetime, time, or calendar?

#Python #100DaysOfCode #LearningPython #PythonProgramming #DateTime #Calendar
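Results that pass through time.mktime() shift with the machine's time zone, so here is a minimal sketch of a parse-and-format round trip that avoids it. It assumes the input string is meant as UTC and uses calendar.timegm(), the standard-library inverse of time.gmtime(); the input string and variable names are illustrative, and the month abbreviation assumes the default English locale.

```python
import calendar
import time

# Parse a string into a struct_time (illustrative input).
parsed = time.strptime("26 Jun 14 17:03:38", "%d %b %y %H:%M:%S")

# timegm treats the struct_time as UTC, so the result does not
# depend on the machine's local time zone (unlike time.mktime).
epoch = calendar.timegm(parsed)

# Format the same instant back into a string.
formatted = time.strftime("%d %m %y %H:%M:%S", time.gmtime(epoch))
print(formatted)                      # 26 06 14 17:03:38

# Two of the calendar helpers from the table.
print(calendar.isleap(2020))          # True
print(calendar.leapdays(2000, 2021))  # 6 (2000, 2004, ..., 2020)
```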