🚀 Python Learning Journey

Today's focus was understanding Python concepts with real-world application 👇

🔹 Tuple (Immutable)
Cannot be modified after creation
Faster than lists
Methods: count(), index()
Supports slicing, concatenation +, repetition *
Tuple packing → a = 10, 20, 30
💼 Use: Fixed data (invoices, records)

🔹 Dictionary (Key-Value Mapping)
Access via keys (not index)
Add/Update: dict[key] = value
Methods: get(), keys(), values(), items(), update()
Removal: pop(), popitem(), del, clear()
💼 Use: Supplier / business data

🔹 Set (Unique & Unordered)
No duplicates, no indexing
Key ops:
✔ Union | → combine
✔ Intersection & → common
✔ Difference - → pending
✔ Symmetric ^ → mismatch
💼 Use: Remove duplicate customers, data cleaning

🔹 Data Structures
List [] (mutable) | Tuple () (immutable)
Set {} (unique) | Dict {k:v} (mapping)

🔹 Static vs Dynamic Coding
Static → fixed values
Dynamic → user input

🔹 Input & Type Casting
input() → string
Convert: int(), float()
eval() evaluates the input expression (syntax matters ⚠️)
String is iterable → list(input())

🔹 print() & Output
print("Hello", name) (best practice)
Concatenation needs same type
❌ string + int → error
✔ Fix: str() or format()

🔹 String Formatting & Errors
{} auto | {0} manual (don't mix ❌)
Errors learned: TypeError, IndexError, ValueError

🔹 Other Concepts
Multiple operations → tuple (A-B, A*B, A/B)
len(), type()
del → delete variable
\n → new line | r"" → raw string

💼 Business Insight:
Set → remove duplicates
Dict → manage structured data
Tuple → store fixed data
👉 Right data structure = better performance & decisions

Python is not just coding; it's about solving real business problems logically.

#Python #DataAnalytics #BusinessAnalytics #LearningJourney
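Below is a minimal, hypothetical sketch pulling these ideas together in code. The invoice, supplier, and customer values are made up for illustration; only the operations (tuple methods, dict access, set algebra, type casting) come from the recap above.

invoice = ("INV-001", "2024-03-01", 1499.00)          # tuple: a fixed record
print(invoice.count(1499.00), invoice.index("2024-03-01"))

supplier = {"name": "Acme Traders", "city": "Pune"}    # dict: key-value mapping
supplier["rating"] = 4.5                               # add/update by key
print(supplier.get("city"), list(supplier.keys()))

customers_jan = {"Asha", "Ravi", "Meera"}              # sets: unique, unordered
customers_feb = {"Ravi", "John"}
print(customers_jan | customers_feb)                   # union: all customers
print(customers_jan & customers_feb)                   # intersection: repeat customers
print(customers_jan - customers_feb)                   # difference: Jan-only customers
print(customers_jan ^ customers_feb)                   # symmetric difference: mismatch

# input() always returns a string, so cast before doing arithmetic:
# quantity = int(input("Quantity: "))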
Python Learning Journey: Tuples, Dictionaries, Sets & More
More Relevant Posts
I Built a Custom Auto-EDA Engine 🚀

Most data scientists spend 60% of their time just doing basic EDA. I got tired of writing the same df.describe(), sns.heatmap(), and plt.show() lines for every single project. It felt like manual labor, not data science. So I decided to automate it. 🛠️

I built a Smart Auto-EDA Profiler using Python, Pandas, and Plotly. Instead of spending an hour building charts, I now run one function and get a professional, interactive HTML report in seconds.

What makes this "Smart"? Beyond just plotting data, I programmed it to "think" like an analyst:
✅ Automatic Alerts: It flags constant columns, high cardinality, and missing values instantly.
✅ Interactive Visuals: Powered by Plotly, so I can zoom into outliers and hover for exact values.
✅ Statistical Intelligence: It calculates correlations and distribution skewness on the fly.
✅ Portable Reports: Everything is bundled into a single HTML file, perfect for sharing with stakeholders who don't have Python installed.

The goal wasn't just to save time; it was to ensure I never miss a data quality issue again.

The Tech Stack: 🐍 Python | 🐼 Pandas | 📊 Plotly | 📝 Jinja2

Automation is the bridge between a "good" analyst and a "great" one. Why do the same task twice when you can build a tool to do it forever?

Check out the screenshots below to see the report in action! 👇 I will be sharing the full report tomorrow. Stay tuned for a detailed breakdown.

#DataScience #Python #Automation #DataAnalytics #Efficiency #Pandas #Programming #MachineLearning
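The post doesn't include the profiler's code, so here is a rough sketch of how such a report generator could look with pandas and Plotly. The function name and alert thresholds are assumptions, and the author's real version uses Jinja2 templates rather than the raw string assembly shown here.

import pandas as pd
import plotly.express as px

def build_report(df: pd.DataFrame, out_path: str = "eda_report.html") -> None:
    """Rough auto-EDA sketch: summary stats, simple alerts, interactive histograms."""
    sections = []

    # Summary statistics and per-column missing-value counts
    sections.append(df.describe(include="all").to_html())
    sections.append(df.isnull().sum().to_frame("missing").to_html())

    # Analyst-style alerts: constant columns and high-cardinality text columns
    for col in df.columns:
        nunique = df[col].nunique(dropna=True)
        if nunique <= 1:
            sections.append(f"<p>Alert: column '{col}' is constant.</p>")
        elif df[col].dtype == object and nunique > 0.9 * len(df):
            sections.append(f"<p>Alert: column '{col}' has high cardinality ({nunique} unique values).</p>")

    # One interactive histogram per numeric column (zoom and hover come free with Plotly)
    for col in df.select_dtypes("number").columns:
        fig = px.histogram(df, x=col, title=f"Distribution of {col}")
        sections.append(fig.to_html(full_html=False, include_plotlyjs="cdn"))

    # Bundle everything into a single, portable HTML file
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("<html><body>" + "\n".join(sections) + "</body></html>")

# build_report(pd.read_csv("any_dataset.csv"))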
🚀 Python can remove hours of repetitive Excel work; here's a great example:

I recently came across this article on KDnuggets, which breaks down practical Python scripts for automating Excel tasks:
👉 "5 Useful Python Scripts to Automate Boring Excel Tasks" https://lnkd.in/gEMrBZ2u
🔗 GitHub repo: useful-python-excel-scripts https://lnkd.in/gbS9NAcX

What I like about it is that it focuses on real, everyday Excel problems analysts deal with.

💡 Here's what each script helps you automate:

📁 1. Merge multiple Excel/CSV files
Instead of manually copying and pasting data from different files, this script automatically reads all files in a folder and combines them into one dataset. Ideal for monthly reporting or consolidating exports.

🧹 2. Clean messy data
Handles common issues like extra spaces, inconsistent formatting, and missing values, and standardises column structures. This is often one of the most time-consuming parts of Excel work.

🔍 3. Detect duplicates
Finds duplicate or near-duplicate rows in datasets, helping improve data quality. Especially useful for customer lists or transactional data.

✂️ 4. Split large datasets
Splits one large Excel file into multiple smaller files based on rules (e.g. region, category, or date). Very useful when distributing reports to different stakeholders.

📊 5. Automate basic reporting outputs
Generates structured summaries (pivot-style outputs) and simple charts, reducing repetitive monthly reporting work.

💭 My takeaway: these aren't complex machine learning solutions; they're simple but powerful automation tools that remove repetitive Excel effort. For analysts, that means:
✔️ Less manual work
✔️ More consistency
✔️ More time for insights, not preparation

💬 Curious: which of these tasks do you spend the most time on?

#Python #Excel #Automation #DataAnalytics #PowerBI #Productivity #Finance #BI
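For a flavor of what script 1 might look like, here is a small sketch (not the repo's actual code) that merges every CSV/XLSX export found in one folder, assuming the files share the same columns. The folder and output names are placeholders.

import glob
import pandas as pd

def merge_folder(folder: str, out_file: str = "merged.xlsx") -> pd.DataFrame:
    frames = []
    for path in glob.glob(f"{folder}/*.csv") + glob.glob(f"{folder}/*.xlsx"):
        # Pick the right reader based on the file extension
        df = pd.read_csv(path) if path.endswith(".csv") else pd.read_excel(path)
        df["source_file"] = path          # keep track of where each row came from
        frames.append(df)
    merged = pd.concat(frames, ignore_index=True)
    merged.to_excel(out_file, index=False)
    return merged

# merged = merge_folder("monthly_exports")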
🚀 From Multiple ZIP Files to One Clean Excel Sheet, Powered by Python

Handling bulk data can quickly become messy, especially when it's spread across multiple ZIP files containing CSVs. Recently, I worked on a solution to automate this entire process using Python, transforming scattered data into a single, structured Excel file.

Here's how the workflow looks:

🔹 Step 1: Extract ZIP Files
Each ZIP file is programmatically opened and its contents extracted. This removes the need for manual unzipping and ensures consistency across files.

🔹 Step 2: Read CSV Data
Inside each ZIP, CSV files are loaded using Python libraries like pandas. This allows fast and efficient handling of thousands of rows.

🔹 Step 3: Convert to Excel Format
Each CSV is converted into an Excel sheet, preserving structure while making it easier to analyze and share.

🔹 Step 4: Merge into One File
All individual Excel sheets are combined into a single master workbook, either as separate sheets or a unified dataset.

🔹 Step 5: Automation & Speed
With loops and optimized functions, the entire pipeline runs automatically, even for large datasets (tens of thousands of rows per file).

💡 Key Benefits:
Saves hours of manual work
Eliminates human error
Scales easily for large datasets
Produces clean, analysis-ready Excel files

📊 In my recent run, each file processed ~28,000+ rows, and multiple ZIPs were seamlessly merged into one consolidated Excel output. This kind of automation is a game-changer for data analysts, eCommerce managers, and anyone dealing with bulk exports.

🔧 Tech Stack Used: Python, Pandas, zipfile, OpenPyXL / XlsxWriter

If you're working with repetitive data tasks, automation like this isn't just helpful; it's essential.

💬 Curious how this can be adapted to your workflow? Let's discuss!

#Python #DataAutomation #Excel #DataProcessing #Automation #Pandas #TechWorkflow
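The author's script isn't shown, but a minimal sketch of the ZIP-to-Excel pipeline described above could look like this. The folder layout, function name, and provenance column are assumptions.

import glob
import zipfile
import pandas as pd

def zips_to_excel(zip_folder: str, out_file: str = "master.xlsx") -> None:
    """Read every CSV inside every ZIP in a folder and write one consolidated workbook."""
    frames = []
    for zip_path in glob.glob(f"{zip_folder}/*.zip"):
        with zipfile.ZipFile(zip_path) as zf:            # Step 1: open the archive in place
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    with zf.open(name) as f:             # Step 2: read each CSV without unzipping to disk
                        df = pd.read_csv(f)
                    df["source"] = f"{zip_path}:{name}"  # keep provenance for auditing
                    frames.append(df)
    # Steps 3-4: combine everything and write a single Excel output (openpyxl is the default xlsx engine)
    pd.concat(frames, ignore_index=True).to_excel(out_file, index=False)

# zips_to_excel("exports")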
Python Fundamentals for Data Analysis – Quick Recap

🔎 Strong data analysis starts with clear programming fundamentals.
🔎 Python provides a flexible and efficient foundation for handling data, automating tasks, and building analytical workflows.
🔎 This recap focuses on three core concepts: variables, loops, and functions.

💡 Variables – Storing and Managing Data
Variables are used to store different types of data such as numbers, text, and lists. They act as containers that make data manipulation easier.

Example:
name = "Varsha"
sales = 5000
growth_rate = 0.12

💡 Loops – Automating Repetitive Tasks
Loops help in iterating through data structures and performing repeated operations efficiently.

For loop example:
sales_list = [1000, 2000, 3000]
for sale in sales_list:
    print(sale)

While loop example:
count = 1
while count <= 3:
    print(count)
    count += 1

💡 Functions – Writing Reusable Logic
Functions allow analysts to organize code into reusable blocks, improving readability and efficiency.

Example:
def calculate_total(a, b):
    return a + b

result = calculate_total(1000, 2000)
print(result)

💡 Why These Concepts Matter in Data Analysis
• Helps in handling and transforming data efficiently
• Enables automation of repetitive analytical tasks
• Improves code readability and structure
• Builds a strong foundation for advanced analytics

📢 Key Insight
Mastering Python fundamentals enables analysts to write clean, efficient code and build scalable solutions for real-world data problems.

#Python #DataAnalytics #DataAnalysis #Programming #LearnPython #AspiringDataAnalyst
🚀 Automating Data Workflows with Python & Pandas

I've been diving deeper into Python for data analysis, and I just built a script that automates a common (and often tedious) task: cleaning CSV data and converting it into multiple formats for different stakeholders.

🛠️ The Problem:
CSV files often come with "messy" formatting, like stray spaces after commas, that can break standard data pipelines. Plus, different teams need the same data in different formats (web devs want JSON, managers want Excel, and data engineers want CSV).

💡 The Solution:
Using pandas and os, I created a script that:
Cleans on the fly: used skipinitialspace=True to automatically trim whitespace issues that usually cause KeyErrors.
Performs vectorized math: calculated total sales across the entire dataset in a single line of code.
Automates file management: dynamically creates output directories and exports the results into JSON, Excel, and CSV simultaneously.

📦 Key Tools Used:
Pandas: for high-performance data manipulation.
OS module: for robust file path handling.
Openpyxl: to bridge the gap between Python and Excel.

It's a simple script, but it's a foundational step toward building more complex, automated data pipelines! Check out the logic below: 👇

import pandas as pd
import os

# Read & clean: skipinitialspace=True is a lifesaver for messy CSVs!
df = pd.read_csv('data/sales.csv', skipinitialspace=True)

# Transform: vectorized calculation for 'total'
df['total'] = df['quantity'] * df['price']

# Automate: exporting to 3 different formats at once
os.makedirs('output', exist_ok=True)
df.to_json('output/sales_data.json', orient='records', indent=2)
df.to_excel('output/sales_data.xlsx', index=False)
df.to_csv('output/sales_with_totals.csv', index=False)

#Python #DataAnalysis #Pandas #Automation #CodingJourney #DataScience
🚀 Day 342 of solving 365 medium questions on LeetCode! 🔥

Today's challenge: "3653. XOR After Range Multiplication Queries I"

✅ Problem:
You are given an integer array nums and a list of queries. Each query provides a starting index l, an ending index r, a step size k, and a multiplier v. For each query, you must multiply the elements in the range from l to r by v (modulo 10^9 + 7), stepping by k each time. Return the final bitwise XOR of all elements in the array after all queries are processed.

✅ Approach (Array Simulation):
Since this is the first version of the problem ("Queries I"), the constraints allow a direct simulation approach.
Apply queries: I iterate through each query, unpacking the variables l, r, k, and v. I use a nested loop with Python's built-in range(l, r + 1, k) to handle the specific step logic.
Modulo math: for each target index i in that hopped sequence, I multiply the current value nums[i] by v and immediately apply the modulo self.MOD (10^9 + 7) to keep intermediate values from growing enormous across subsequent queries.
The XOR sum: once all queries are processed and the array is finalized, I initialize a res = 0 variable. A final, simple pass through the nums array applies the bitwise XOR operator (^=) to accumulate and return the final answer.

✅ Key Insight:
Python's range function with a step argument makes array-hopping logic beautifully concise. Instead of writing a messy while loop to manually track and increment the index by k, a single for loop naturally handles the boundaries and the exact hops in one clean, highly readable line.

✅ Complexity:
Time: O(Q · N/K + N), where Q is the number of queries, N is the length of the array, and K is the step size. In the worst case, we iterate over segmented portions of the array for each query, followed by one final O(N) pass to compute the XOR sum.
Space: O(1). We modify the given nums array strictly in place and only use a single integer variable (res) for the final calculation, requiring no extra auxiliary data structures.

🔍 Python solution attached!
🔥 Flexing my coding skills until recruiters notice!

#LeetCode365 #Simulation #BitManipulation #Arrays #Python #ProblemSolving #DSA #Coding #SoftwareEngineering
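The attached solution image isn't included in the text, so the sketch below simply follows the simulation approach described above. The class and method signature are assumed from the problem title, not copied from LeetCode.

class Solution:
    MOD = 10**9 + 7

    # Method name is assumed from the problem title; adjust to the exact LeetCode signature.
    def xorAfterQueries(self, nums: list[int], queries: list[list[int]]) -> int:
        for l, r, k, v in queries:
            for i in range(l, r + 1, k):           # range() handles the step-k hops cleanly
                nums[i] = nums[i] * v % self.MOD   # multiply under the 1e9+7 modulus
        res = 0
        for x in nums:
            res ^= x                               # final bitwise XOR of the whole array
        return res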
✅ Built a Python data cleaning script that handles common data issues in one run.

Every time I started working on a new dataset, I was doing the same things over and over again: fixing column names, removing duplicates, dealing with nulls, converting dates, cleaning symbols like '$' or '%'. So I built a script that handles all of it in one run.

• Here's what it does:
→ Cleans column headers: lowercase, underscores, no extra spaces
→ Detects and removes duplicate columns and rows
→ Strips '$' and '%' symbols from numeric columns automatically
→ Converts date columns from strings to datetime
→ Handles null values based on data type
→ Winsorizes outliers using the IQR method
→ Standardizes inconsistent text values

• I tested it on a marketing dataset (~2,000 rows), and here's what it did:
→ Headers cleaned: Campaign_ID, Clicks → campaign_id, clicks
→ 1 duplicate column removed
→ $ symbols stripped from the spend column
→ start_date and end_date converted to datetime
→ 19 duplicate rows dropped
→ 300 null values handled: channel (100), conversions (200)
→ 195 outliers winsorized: clicks (11), spend (74), conversions (110)

• What I learned while building this:
→ Breaking each cleaning step into its own function made it much easier to build, test, and debug
→ Logging every step is the only way to actually know what your script did to your data
→ Making it work on any dataset, not just one specific file, was the hardest part

⚠️ Note: Works best with structured CSV files. Edge cases and limitations are mentioned on GitHub.

Full script + documentation: 🔗 https://lnkd.in/dUAE7JH5

#Python #DataAnalytics #DataAnalyst #Pandas #DataCleaning
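The full script lives at the GitHub link above. As a taste, here is a hedged sketch of two of the listed steps, header cleaning and currency/percent symbol stripping, written independently of the original code; the example column name is hypothetical.

import pandas as pd

def clean_headers(df: pd.DataFrame) -> pd.DataFrame:
    # "Campaign_ID", " Clicks " -> "campaign_id", "clicks"
    df.columns = (df.columns.str.strip()
                            .str.lower()
                            .str.replace(r"\s+", "_", regex=True))
    return df

def strip_symbols(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    # "$1,200" or "12%" -> 1200.0 / 12.0, so the columns become numeric
    for col in columns:
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(r"[$,%]", "", regex=True),
            errors="coerce",
        )
    return df

# df = strip_symbols(clean_headers(pd.read_csv("marketing.csv")), ["spend"])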
🧠 I Replaced a 2-Hour Weekly Excel Task with a 6-Line Python Script

Every Monday, I used to:
👉 open multiple Excel files
👉 copy-paste data into a master sheet
👉 clean and reformat columns
👉 generate a summary report

It took about 2 hours. Every. Single. Week.

Here's the script that now does it in 8 seconds:

import pandas as pd, glob
files = glob.glob('data/*.xlsx')
df = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
summary = df.groupby('product')['claim_amount'].agg(['sum', 'mean', 'count'])
summary.to_excel('weekly_summary.xlsx', index=True)

Just 6 lines. What changed wasn't just speed:
✅ no manual errors
✅ consistent formatting
✅ repeatable process
✅ more time for actual analysis

The tools already exist. The barrier is usually just not knowing where to start.

👉 What's one repetitive task in your workflow you wish you could automate?

#ActuaryWhoCodes #PythonForActuaries #Automation #ActuarialScience #Productivity #DataAnalytics
“Data cleaning is where real data science begins.”

Today I spent time working on a real-world CSV dataset using Pandas in Python, and it turned out to be a great reminder that data rarely comes in a "ready-to-use" format.

At first glance, everything looked fine after loading it with read_csv(). But as I started exploring the dataset more deeply using functions like info(), describe(), and isnull().sum(), a different story emerged:
• Missing values across multiple columns
• Inconsistent data formats
• Some columns that added little to no analytical value
• A few unexpected duplicates

Instead of rushing into model building, I focused on understanding and preparing the data:
• Dropped irrelevant columns using drop()
• Handled missing values (both removal and basic imputation)
• Checked for duplicate records and removed them
• Standardized column formats where needed
• Took time to actually understand what each feature represents

One key realization from this exercise: good models don't come from complex algorithms alone; they come from clean, meaningful, and well-prepared data. It's easy to get excited about machine learning models, but the real impact lies in the quality of the data you feed them. Data cleaning may not be the most glamorous part of the workflow, but it's definitely one of the most critical.

Grateful for the guidance and support from teacher Mohit Payasi sir throughout this learning process; having the right direction makes a huge difference when building strong fundamentals. 🙏🏻🌟

Strong foundations today lead to better, more reliable models tomorrow.

Would love to learn from others: what are your must-do steps when working with messy, real-world datasets?

#DataScience #Python #Pandas #DataCleaning #MachineLearning #DataAnalytics #LearningJourney #Programming
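For readers following along, here is a compact sketch of the explore-then-prepare pass described above. The file name and the column names ('unused_col', 'amount', 'date') are placeholders, not from the actual dataset.

import pandas as pd

df = pd.read_csv("raw_data.csv")   # placeholder file name

# Explore first: structure, summary stats, and missing values per column
df.info()
print(df.describe())
print(df.isnull().sum())

# Then prepare:
df = df.drop(columns=["unused_col"], errors="ignore")      # drop low-value columns
df = df.drop_duplicates()                                   # remove duplicate records
df["amount"] = df["amount"].fillna(df["amount"].median())   # basic imputation for a numeric column
df["date"] = pd.to_datetime(df["date"], errors="coerce")    # standardize a date column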
Stop wasting 4 hours on EDA. Do it in 4 lines of code. ⏳

Exploratory Data Analysis (EDA) is the most critical step in any data project, but let's be honest: writing the same df.describe(), plt.scatter(), and sns.heatmap() code over and over is a soul-crushing time sink. In the industry, we use AutoEDA libraries to get 80% of the insights with 2% of the effort. 🚀

Here are my top 3 picks for your toolkit:
1️⃣ ydata-profiling (formerly Pandas Profiling): the "gold standard." It generates a massive, interactive HTML report covering correlations, missing values, and detailed stats for every column.
2️⃣ Sweetviz: the "comparison king." Perfect for spotting data drift. If you need to see exactly how your train set differs from your test set, this is the tool.
3️⃣ AutoViz: the "speed demon." It automatically identifies the most important features and selects the best charts (scatter, box, violin) for you. It's incredibly fast, even on larger datasets.

The Reality Check: ⚠️
Are these used for real-time streaming data? Usually, no. They are "batch" tools meant for the initial discovery phase or sanity-checking a new data dump. For live monitoring, you're better off with Grafana or Great Expectations. But for your next CSV or SQL export? Don't start from scratch. Automate the boring stuff so you can focus on the actual strategy.

Which one is your go-to? Or are you still team Matplotlib/Seaborn for everything? 👇

#DataScience #Python #MachineLearning #Analytics #Efficiency #CodingTips
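Getting started really does take only a few lines. A quick sketch of the first two tools; the file name and the train/test split are illustrative, and exact APIs may differ slightly across library versions.

import pandas as pd

df = pd.read_csv("customers.csv")       # placeholder dataset

# ydata-profiling: one interactive HTML report for the whole DataFrame
from ydata_profiling import ProfileReport
ProfileReport(df, title="Customers EDA").to_file("customers_report.html")

# Sweetviz: compare two datasets (e.g. train vs. test) to spot drift
import sweetviz as sv
train, test = df.iloc[:800], df.iloc[800:]
sv.compare([train, "Train"], [test, "Test"]).show_html("train_vs_test.html")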