🚀 Strings & String Methods in Python #Day31

If variables are containers, strings are how Python stores and handles text data. Names, emails, passwords, customer data, file paths, web scraping, data cleaning — strings are everywhere.

🔹 What is a String?
A string is a sequence of characters enclosed in quotes.

name = "Harry"
city = 'Delhi'

Both single and double quotes work the same. Strings can contain:
✅ Letters
✅ Numbers (as text)
✅ Symbols
✅ Spaces

"Python"
"12345"
"Hello @2026"

🔹 Multiline Strings
Use triple quotes for text spanning multiple lines:

message = """This is a
multi line
string"""

Useful for documentation, SQL queries, or long messages.

🔹 String Indexing
Each character has a position (index).

text = "Python"
# P  y  t  h  o  n
# 0  1  2  3  4  5

print(text[0])  # P
print(text[3])  # h

⚡ Indexing starts from 0. Python also supports negative indexing:

text[-1]  # n
text[-2]  # o

Very useful when working from the end of a string.

✂️ String Slicing
Slicing extracts a portion of a string.

text[0:3]  # Pyt
text[2:]   # thon
text[:4]   # Pyth

Negative slicing:

text[-3:]  # hon

Powerful and widely used in data manipulation.

🔹 len() Function
Find the length of a string:

len("Python")  # 6

Even spaces are counted.

🛠 Common String Methods

1. lower() and upper()
"PYTHON".lower()  # 'python'
"python".upper()  # 'PYTHON'
Useful for standardizing text.

2. strip()
Removes leading and trailing spaces:
" hello ".strip()  # 'hello'
Great for cleaning raw data.

3. replace()
"Hello World".replace("World", "Python")  # 'Hello Python'

4. split()
Turns a string into a list:
"apple,banana,orange".split(",")  # ['apple', 'banana', 'orange']
Used heavily in data parsing.

5. join()
The opposite of split():
",".join(["apple", "banana", "orange"])  # 'apple,banana,orange'

6. find()
Finds the position of a substring:
"Hello World".find("World")  # 6
Returns the index, or -1 if not found.

7. startswith() and endswith()
email.endswith(".com")
email.startswith("test")
Very useful in validation.

🔍 Checking String Content
isalpha(), isdigit(), isalnum()

"Python".isalpha()     # True
"123".isdigit()        # True
"Python123".isalnum()  # True

Useful for validation logic.

🔄 Strings Are Immutable
Important concept:

text = "Python"
text[0] = "J"  # ❌ TypeError

Strings cannot be modified in place. Any change creates a new string.

💡 Why Strings Matter in Data Analytics
Strings are everywhere in analytics:
📌 Cleaning messy datasets
📌 Working with CSV files
📌 Parsing emails & text
📌 Filtering data
📌 Web scraping
📌 Text analysis

Mastering strings makes data cleaning much easier. Python strings may look simple, but they're one of the most powerful tools in programming.

#Python #PythonProgramming #DataAnalytics #PowerBI #Excel #MicrosoftPowerBI #MicrosoftExcel #DataAnalysis #DataAnalysts #CodeWithHarry #DataVisualization #DataCollection #DataCleaning
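To see several of these methods working together, here is a minimal sketch; the sample emails are made up purely for illustration:

# Standardizing messy user input by chaining string methods
raw_emails = ["  Alice@GMAIL.com ", "BOB@gmail.COM", " carol@Gmail.com"]

cleaned = [e.strip().lower() for e in raw_emails]
print(cleaned)  # ['alice@gmail.com', 'bob@gmail.com', 'carol@gmail.com']

for e in cleaned:
    print(e, e.endswith("@gmail.com"))  # simple validation check, prints True for each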
-
Day 4 of My Data Analyst Journey – Data Cleaning in Python

Today, I practiced data cleaning techniques using Python, focusing on handling real-world messy text data.

Problem Statement:
I had a dataset of customer feedback containing:
• Extra spaces
• Mixed casing (UPPER/lower)
• Punctuation (., !, ?)

Objective: Clean and standardize the feedback text for better analysis.

What I implemented:
- Removed punctuation using .replace()
- Converted text to lowercase
- Removed leading & trailing spaces using .strip()
- Handled lists inside a dictionary

Python Code:

feedback_data = {
    'S_No': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Name': ['Ravi', 'Meera', 'Sam', 'Anu', 'Raj', 'Divya', 'Arjun', 'Kiran', 'Leela', 'Nisha'],
    'Feedback': [
        ' Very GOOD Service!!!',
        'poor support, not happy ',
        'GREAT experience! will come again.',
        'okay okay...',
        ' not BAD',
        'Excellent care, excellent staff!',
        'good food and good ambience!',
        'Poor response and poor handling of issue',
        'Satisfied. But could be better.',
        'Good support... quick service.'
    ],
    'Rating': [5, 2, 5, 3, 2, 5, 4, 1, 3, 4]
}

punctuation = ".,!?"
cleaned_feedbackdata = {}

for key, value in feedback_data.items():
    if isinstance(value, list):
        new_list = []
        for item in value:
            if isinstance(item, str):
                item = item.strip().lower()
                for p in punctuation:
                    item = item.replace(p, "")
            new_list.append(item)
        cleaned_feedbackdata[key] = new_list
    else:
        cleaned_feedbackdata[key] = value

print(cleaned_feedbackdata)

Outcome: Cleaned and structured feedback data ready for analysis like sentiment detection, keyword extraction, and insights generation.

Key Learning: Data cleaning is one of the most important steps in data analysis—clean data = better insights!

#Python #DataCleaning #DataAnalytics #LearningJourney #BeginnerToPro #CodingPractice #100DaysOfCode
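For anyone curious how the same cleanup looks in pandas, here is a short sketch; it assumes the feedback_data dictionary from the post above is already defined:

import pandas as pd

df = pd.DataFrame(feedback_data)

# Vectorized cleaning: strip spaces, lowercase, drop the same punctuation set
df["Feedback"] = (
    df["Feedback"]
    .str.strip()
    .str.lower()
    .str.replace(r"[.,!?]", "", regex=True)
)
print(df.head())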
-
Lists are used everywhere: apps, APIs, databases, analytics.

Day 3 — Lists (Python Arrays for Real Data Handling)

1. Concept (Real-World Understanding)
A list is a collection of multiple values stored in a single variable. Think of it like a container that holds multiple items in order.

fruits = ["apple", "banana", "mango"]

Key Properties:
- Ordered → items keep their position
- Mutable → you can modify them
- Allows duplicates
- Can store different data types

data = ["Rahul", 25, True, 99.5]

Real-Life Analogy
A list is like a shopping cart: you can add items, remove items, check items, and update items.

2. Coding Examples (Real World)

Example 1: Accessing Elements

fruits = ["apple", "banana", "mango"]
print(fruits[0])   # apple
print(fruits[-1])  # mango

Example 2: Modifying a List

fruits = ["apple", "banana", "mango"]
fruits[1] = "orange"
print(fruits)

Output: ['apple', 'orange', 'mango']

Example 3: Adding Items

cart = ["laptop", "mouse"]
cart.append("keyboard")
print(cart)

Example 4: Removing Items

cart = ["laptop", "mouse", "keyboard"]
cart.remove("mouse")
print(cart)

Example 5: Looping Through a List (VERY IMPORTANT)

items = ["pen", "book", "bag"]
for item in items:
    print(item)

This is used in almost every real project.

3. Important List Operations

1. Length

numbers = [10, 20, 30]
print(len(numbers))  # 3

2. Check Item Exists

fruits = ["apple", "banana"]
print("apple" in fruits)  # True

3. Extend List

a = [1, 2]
b = [3, 4]
a.extend(b)
print(a)

Output: [1, 2, 3, 4]

4. Pop (Remove by Index)

nums = [10, 20, 30]
nums.pop(1)
print(nums)  # [10, 30]

4. Practice Problems

Problem 1: Create a list of 5 numbers and print the first and last elements.

Problem 2: Given numbers = [10, 20, 30, 40], add 50 at the end and 5 at the beginning.

Problem 3: Remove the duplicate values from [1, 2, 2, 3, 4, 4].

Problem 4: Loop through a list and print only the even numbers.

5. Mini Challenge (Real World)
Build a Shopping Cart System starting from cart = [], with operations to add items, remove an item, and show the cart.

Example Output: Your cart contains: ['laptop', 'mouse']

Bonus Challenge: Calculate the total price of prices = [100, 200, 300] → Total = 600. (One possible solution sketch follows this post.)

6. Common Beginner Mistakes

Mistake 1: Confusing append vs extend

a = [1, 2]
a.append([3, 4])
print(a)

Output: [1, 2, [3, 4]]  (the whole list is added as a single element; use extend to merge)

Mistake 2: Removing items from a list while looping over it

for n in nums:
    nums.remove(n)  # Wrong

Leads to unexpected behavior.

7. Takeaway From This Concept
- Lists store multiple values in one place
- Lists are mutable (can change)
- You can add (append), remove (remove, pop), and loop (for)
- Lists are used in: APIs, databases, user inputs, data processing

#Day3 #Python
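One possible solution sketch for the mini challenge and the bonus; the item names are arbitrary:

# Shopping cart operations
cart = []
cart.append("laptop")
cart.append("mouse")
cart.append("keyboard")
cart.remove("keyboard")
print("Your cart contains:", cart)  # Your cart contains: ['laptop', 'mouse']

# Bonus: total price
prices = [100, 200, 300]
print("Total =", sum(prices))       # Total = 600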
-
Day 2 of Learning Python – And I Just Built My First Real Data Audit System 📊🐍

Today I didn't just "learn Python"… I used it to analyze structured company-style audit data and built a Mistake Scoring System that automatically evaluates performance. And honestly, it felt like stepping into real business intelligence work.

💡 What I built today:
Using Pandas, I processed an audit dataset and generated insights like:
📌 Total deals per responsible person
📌 Pipeline distribution per team member
📌 Mistake scoring based on missing actions (follow-ups, updates, documents)
📌 Final performance summary ranking everyone by errors

⚙️ The idea behind the system:
Instead of manually checking performance, I created a logic-based scoring system where:
- Missing documents = +1 error
- No follow-up = +1 error
- No comment update = +1 error
- Unresolved status = +3 heavy penalty

This turns raw data into actionable performance insights.

💻 Code I used:

import pandas as pd

file_path = r"insert your Excel data file path here"
df = pd.read_excel(file_path)

# CLEAN DATA
df.columns = df.columns.str.strip()
df = df.fillna("No")

# MISTAKE SCORE SYSTEM
df["Mistake Score"] = 0
df.loc[df["Document/RF Request"] == "No", "Mistake Score"] += 1
df.loc[df["Comment Updates"] == "No", "Mistake Score"] += 1
df.loc[df["Follow up"] == "No", "Mistake Score"] += 1
df.loc[df["Status"].str.lower() == "unresolved", "Mistake Score"] += 3

# ANALYSIS
print(df["Responsible"].value_counts())
print(df.groupby(["Responsible", "Pipeline"]).size())

mistakes = df.groupby("Responsible")["Mistake Score"].sum().sort_values(ascending=False)
print(mistakes)

summary = df.groupby("Responsible").agg(
    Total_Deals=("Responsible", "count"),
    Total_Mistakes=("Mistake Score", "sum")
)
print(summary.sort_values("Total_Mistakes", ascending=False))

Note: the r before the file path makes it a raw string, which helps Python read Windows paths correctly without treating backslashes as escape characters. Also, make sure your Excel file is saved in the same folder as your Python script, or provide the correct full file path.

🚀 Key takeaway:
Even simple Python + Excel data can be transformed into a decision-making system that highlights performance gaps instantly.

Day 2 of learning — and I'm already seeing how powerful data can be in real business environments. Can't wait to build dashboards and automate even more next 🔥

#Python #DataAnalysis #Pandas #LearningInPublic #DataScience #Automation #BusinessIntelligence #CareerGrowth
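To try the scoring logic without an Excel file, here is a self-contained sketch with made-up sample rows; the column names match the script above, and the names and values are invented for illustration:

import pandas as pd

df = pd.DataFrame({
    "Responsible": ["Asha", "Asha", "Omar"],
    "Pipeline": ["Sales", "Sales", "Support"],
    "Document/RF Request": ["Yes", "No", "No"],
    "Comment Updates": ["No", "No", "Yes"],
    "Follow up": ["Yes", "No", "No"],
    "Status": ["Resolved", "Unresolved", "Resolved"],
})

df["Mistake Score"] = 0
for col in ["Document/RF Request", "Comment Updates", "Follow up"]:
    df.loc[df[col] == "No", "Mistake Score"] += 1
df.loc[df["Status"].str.lower() == "unresolved", "Mistake Score"] += 3

print(df.groupby("Responsible")["Mistake Score"].sum())  # Asha 7, Omar 2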
-
📊 Detecting & Treating Outliers in Python - The Data Points That Can Mislead You

You've cleaned missing values. Your dataset looks fine. But there's one more hidden problem most beginners miss: outliers. And sometimes, just one outlier can completely distort your analysis.

🔹 Why Do Outliers Matter?
Because they can quietly break your results:
❌ Skew averages
❌ Mislead insights
❌ Affect visualizations
❌ Reduce model accuracy
👉 One extreme value = one wrong conclusion

What is an Outlier?
An outlier is a data point that is significantly different from the rest of the data. It can be extremely high or extremely low. Either way, it does not represent the typical pattern.

Examples from real data:
- An employee with a salary of ₹500 in a company where the average salary is ₹60,000
- A customer who ordered 9,000 units when everyone else ordered between 5 and 50
- An age value of 150 in a health dataset

These are not just unusual — they are dangerous to your analysis if left untreated.

Step 1 — Detect Outliers Visually
Always start by looking at the data.

import seaborn as sns
import matplotlib.pyplot as plt

# Box plot to spot outliers visually
sns.boxplot(x=df['salary'])
plt.show()

A box plot immediately shows you which values fall far outside the normal range. Any dot beyond the whiskers — that is your outlier.

Step 2 — Detect Outliers Using the IQR Method
The IQR (Interquartile Range) method is the most reliable way to detect outliers mathematically.

Q1 = df['salary'].quantile(0.25)
Q3 = df['salary'].quantile(0.75)
IQR = Q3 - Q1

lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

# Find outliers
outliers = df[(df['salary'] < lower) | (df['salary'] > upper)]
print(outliers)

Anything below the lower limit or above the upper limit is flagged as an outlier.

Step 3 — Treat the Outliers
Now you have three choices depending on your situation.

1. Remove them — when the outlier is clearly an error.

df = df[(df['salary'] >= lower) & (df['salary'] <= upper)]

2. Cap them — replace extreme values with the boundary limit.

df['salary'] = df['salary'].clip(lower=lower, upper=upper)

3. Replace with the median — when you want to keep the row but fix the value.

median = df['salary'].median()
df['salary'] = df['salary'].apply(
    lambda x: median if x < lower or x > upper else x
)

How to Decide Which Method to Use

Situation                      → Best Approach
Value is a data entry error    → Remove it
Value is extreme but possible  → Cap it
You cannot afford to lose rows → Replace with median

Here is the truth no one tells beginners: outliers are not always mistakes. Sometimes they are the most interesting part of your data — the customer who spends the most, the employee who performs the best, the product that sells far beyond expectations.

Your job is not to blindly remove them. Your job is to understand them first — then decide. That is what separates a careful analyst from a careless one. 💡

#DataAnalytics #Python #DataCleaning #Outliers #DataAnalyst #LearningData
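Here is a self-contained sketch of the detect-and-cap flow with made-up salary numbers, so it runs without a real dataset:

import pandas as pd

# One obvious outlier at the end
df = pd.DataFrame({"salary": [55000, 58000, 59000, 60000, 61000, 500000]})

Q1, Q3 = df["salary"].quantile([0.25, 0.75])
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

print(df[(df["salary"] < lower) | (df["salary"] > upper)])  # flags the 500000 row
df["salary"] = df["salary"].clip(lower=lower, upper=upper)  # cap it, keep the row
print(df)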
-
Day 12/30 - Nested Data Structures in Python

Today everything clicked. Lists, dicts, tuples - they don't live separately. Real data nests them together.

What is Nesting?
Nesting means placing one data structure inside another. A list can contain dictionaries. A dictionary can contain lists. A dictionary can even contain other dictionaries. This is how Python represents complex, real-world data - the same structure used in JSON APIs, databases, and config files.

Four Common Nesting Patterns
- List inside dict → a dictionary key holds a list as its value, e.g. a student's list of scores
- Dict inside list → a list contains multiple dictionaries, e.g. a list of student records
- Dict inside dict → a key holds another dictionary, e.g. a user with a nested address object
- List inside list → a list contains other lists, e.g. rows and columns in a grid or table

How to Access Nested Data
You access nested data by chaining brackets, one for each level you go deeper:

data["student"]["scores"][0]  # open the dict, go to the "scores" key, grab index 0

Rule: count the levels of nesting, then use that many brackets to reach the value.

Looping Through Nested Structures
When your data is a list of dictionaries, use a for loop to go through each dictionary, then use bracket notation to pull out values. This is the most common real-world pattern - reading records from an API or database. (A small sketch of this pattern follows the post.)

Code Example 1: List Inside a Dict

student = {
    "name": "Obiageli",
    "scores": [88, 92, 75, 95],
    "passed": True
}

print(student["scores"])      # [88, 92, 75, 95]
print(student["scores"][0])   # 88
print(student["scores"][-1])  # 95

Key Learnings
☑ Nesting = placing one data structure inside another
☑ Access nested data by chaining brackets, one bracket per level
☑ A list of dictionaries is the most common pattern - it's how API and database data looks
☑ Use a for loop to go through a list of dicts and pull values from each record
☑ Nested structures are the foundation of JSON - master this and real-world data won't feel foreign

My Takeaway
Nested data structures are where all the previous days connect. Lists, tuples, sets, dictionaries - they don't live in isolation. Real data combines all of them. Today I started seeing data the way Python sees it.

#30DaysOfPython #Python #LearnToCode #CodingJourney #WomenInTech
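A minimal sketch of that looping pattern; the records are made up:

# A list of dictionaries - the shape most API/database results come in
students = [
    {"name": "Obiageli", "scores": [88, 92, 75, 95]},
    {"name": "Amara", "scores": [70, 81, 90, 66]},
]

for record in students:   # one dict per record
    average = sum(record["scores"]) / len(record["scores"])
    print(record["name"], "->", round(average, 1))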
-
My Python script ran for 3 hours. Then crashed. No error message. Nothing.

I had no idea what went wrong. I had no idea which step failed. I had no idea how to fix it. That was me — 2 years into my data engineering journey.

Here's what I wish someone told me earlier 👇

When you write a Python ETL script, 3 things will go wrong:
1) The API or database will disconnect randomly
2) One step will be extremely slow — but you won't know which one
3) When it crashes, you'll have zero information about why

These are not beginner problems. They happen to every data engineer. Every single day.

The fix? Python decorators.

Think of a decorator like a wrapper you put around your function. The function does its job — but the wrapper adds extra superpowers. Like gift wrapping: the gift inside doesn't change, but now it's protected, labelled, and trackable.

There are 3 decorators every data engineer should know:
→ @retry — if something fails, try again automatically (e.g. 3 times, 5-second gap)
→ @timer — tells you exactly how long each step took to run
→ @log_execution — writes a diary of every step: started, completed, or failed

Before decorators, my pipeline was a black box. After decorators, I know exactly what ran, how long it took, and where it broke.

Real example from my work:
I was loading data from an API into Azure Data Lake every night. Some nights the API would time out at 2 AM. The whole pipeline would crash. Data missing. Reports wrong.

After adding @retry:
→ API times out → waits 5 seconds → tries again → succeeds
→ Nobody wakes up. Nobody sends angry Slack messages.

That one change saved hours of manual re-runs every week.

You don't need to write decorators from scratch. Python has a library called 'tenacity' — a one-line install:

pip install tenacity

That's it. Import it. Use @retry. Done.

I'm still learning Python deeply myself. But this was the moment I stopped writing fragile scripts and started writing pipelines that could survive the real world.

Are you using any error handling in your Python pipelines? Drop your approach in the comments — I'd love to learn from you too 👇

#Python #DataEngineering #ETL #DataEngineer #PythonProgramming #DataPipeline #Azure #Snowflake #TechTips #OpenToWork #DataCommunity #100DaysOfPython #HiringDataEngineers
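A minimal sketch of the retry and timer patterns; fetch_page and its failure behavior are hypothetical, and the tenacity settings mirror the 3-tries / 5-second gap described above:

import time
import random
import functools
from tenacity import retry, stop_after_attempt, wait_fixed

def timer(func):
    # Print how long the wrapped function took (on success)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.2f}s")
        return result
    return wrapper

@retry(stop=stop_after_attempt(3), wait=wait_fixed(5))  # 3 tries, 5 seconds apart
@timer
def fetch_page():
    # Hypothetical flaky step: imagine an API call that sometimes times out
    if random.random() < 0.5:
        raise TimeoutError("API timed out")
    return {"rows": 1000}

print(fetch_page())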
-
PySpark code is a classic implementation of a Reliable Streaming Pipeline

⚙️ Phase 1: The Continuous Engine
This part of the code tells Databricks to keep the "engine" running 24/7.

(spark.readStream.table("source_append_table")
    .filter("(status IS NULL) AND (record_type = 'file_type')")
    .writeStream
    .foreachBatch(load_all_and_route_errors)   # Calls the logic below
    .option("checkpointLocation", "/mnt/delta/checkpoints/dual_target_load")
    .trigger(processingTime='10 seconds')      # ✅ Makes it run continually
    .start()
)

🛠️ Phase 2: The Validation & Routing Function
This is the internal logic (load_all_and_route_errors) that runs every time new data is detected.

1. Persisting Data (The Memory Guard) 💾 🧠

microBatchDf.persist()

Action: Saves the incoming data in RAM.
Why: Since we are writing to two tables (Main and Error), we don't want Spark to do the work twice. Caching it here makes the job twice as fast.

2. The Validation Engine (The Inspector) ⚖️ 🔍

errors = F.array_remove(F.array(
    F.when(F.col("order_id").isNull(), "Missing order_id"),
    F.when(F.col("price") < 0, "Negative price")
), None)

Action: Captures WHY it failed. It creates a list of errors for every row.
Note: Unlike simple filters, this ensures you have an audit trail of reasons for every bad record.

3. Flagging the Data 🏷️ 🚩

validated_df = (microBatchDf
    .withColumn("validation_status",
                F.when(F.size(errors) > 0, "Invalid").otherwise("Valid"))
)

Action: Tags every single row as either Valid or Invalid based on the results of the Validation Engine.

🍴 Phase 3: The Fork in the Road (Dual Write)

Path A: The Clean Production Table ✅ 🏦

only_valid_records = validated_df.filter("validation_status = 'Valid'")
(only_valid_records.write
    .format("delta")
    .mode("append")
    .saveAsTable("main_target_table"))

Strategy: Only rows with zero errors move forward. This keeps your business dashboards clean and trustworthy.

Path B: The Quarantine/Error Table 🚨 🚧

invalid_records = validated_df.filter("validation_status = 'Invalid'")
if not invalid_records.isEmpty():
    (invalid_records.write
        .format("delta")
        .mode("append")
        .saveAsTable("error_records_table"))

Strategy: Redirects bad data to a separate log. Because we captured the reasons, engineers can immediately see that "Row X failed because of a negative price."

🧹 Phase 4: Final Cleanup 🧼

microBatchDf.unpersist()

Action: Clears the memory block.
Why: In a continually running job, if you forget this, your cluster memory will fill up over time and eventually crash (OOM error).

💡 Summary of "Continuous" Best Practices
Use Job Clusters: In Databricks, run this as a "Continuous Job" type so Databricks automatically restarts it if the cloud provider has a hiccup.

Final tip: since you are now running this continually, ensure your cluster is sized correctly for a 24/7 workload!
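For readers wondering how the phases assemble, here is a hedged skeleton of what load_all_and_route_errors might look like as one function; the (DataFrame, batchId) signature is foreachBatch's convention, and the try/finally around unpersist() is a small robustness tweak of my own, not part of the original post:

from pyspark.sql import functions as F

def load_all_and_route_errors(microBatchDf, batchId):
    microBatchDf.persist()  # Phase 2.1: cache once, since we write twice
    try:
        errors = F.array_remove(F.array(
            F.when(F.col("order_id").isNull(), "Missing order_id"),
            F.when(F.col("price") < 0, "Negative price")
        ), None)

        validated_df = microBatchDf.withColumn(
            "validation_status",
            F.when(F.size(errors) > 0, "Invalid").otherwise("Valid")
        )

        # Phase 3: dual write
        (validated_df.filter("validation_status = 'Valid'")
            .write.format("delta").mode("append")
            .saveAsTable("main_target_table"))

        invalid_records = validated_df.filter("validation_status = 'Invalid'")
        if not invalid_records.isEmpty():
            (invalid_records.write.format("delta").mode("append")
                .saveAsTable("error_records_table"))
    finally:
        microBatchDf.unpersist()  # Phase 4: always release the cached batch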
-
𝗖𝗮𝗻 𝗦𝗤𝗟 𝗱𝗼 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀?

We usually do feature analysis in Python, but what if we cannot load millions of rows into Python? Can we do it with SQL?

To figure this out, I took the problem of customer churn and tried to understand why customers are leaving and what we can do about it. I studied the behavior of churned customers across the groups of each feature. For example, does a high number of support calls lead to churning?

To study customer behavior, I calculated churn rates across the groups of each feature using AVG() in SQL. I used churn rate because it allows comparison irrespective of group size.

For numerical features like payment delay, I first divided the feature into groups using GROUP BY, identifying sudden differences in churn rate between values. That gave me the thresholds of behavioral change, and I labeled the groups using a CASE conditional statement. For categorical features, the churn rate can be calculated directly.

To decide which features are important, I used these criteria:
1. The churn rate difference must be significant for at least one group compared to the others. This suggests that beyond this threshold lies the breaking point of customer behavior.
2. The pattern should be stable, to avoid random noise.
3. Group sizes should be comparable.

Example: Issue Level (Support Calls)

+------------------+------------------+
| Issue Level      | Churn Rate       |
+------------------+------------------+
| Low              | 0.10             |
| Medium           | 0.25             |
| High             | 0.80             |
+------------------+------------------+

The churn rate stays stable across low and medium but increases sharply at the high issue level. Customers waited patiently while support calls stayed at the medium issue level or below; once the threshold is crossed, 80% of customers leave. That means one should respond to support calls before they reach the high issue level; otherwise, the customer will leave.

In the customer churn dataset, the features are: Age, Gender, Tenure, Usage Frequency, Support Calls, Payment Delay, Subscription Type, Contract Length, Total Spend, Last Interaction, and Churn.

For a more detailed analysis, check out the GitHub repo (Notebooks/SQL_Analysis folder): https://lnkd.in/gUx9vgyE

#SQL #FeatureAnalysis #CustomerChurn #DataAnalytics #DataScience #SQLAnalytics #ChurnAnalysis #DataEngineering #BehavioralAnalysis #AnalyticsEngineering #BigData #DataCommunity
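A sketch of the kind of query described above; the table and column names (customers, support_calls, churn) are assumptions for illustration, with churn stored as 0/1 so AVG() yields the churn rate:

SELECT
    CASE
        WHEN support_calls <= 3 THEN 'Low'
        WHEN support_calls <= 6 THEN 'Medium'
        ELSE 'High'
    END AS issue_level,
    COUNT(*)   AS customers_in_group,  -- check that group sizes are comparable
    AVG(churn) AS churn_rate
FROM customers
GROUP BY 1                             -- positional grouping; repeat the CASE if your dialect requires it
ORDER BY churn_rate;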
-
How can video data be transformed into structured data suitable for analysis?

Transforming video into structured data for analysis with Snowflake and Python. There are several approaches depending on what you want to extract:

1️⃣ METADATA EXTRACTION
Duration, resolution, FPS, codec, file size
Libraries: ffmpeg-python, moviepy, opencv-python

2️⃣ FRAME EXTRACTION (IMAGE DATA)
Extract frames as images at intervals, then convert them to pixel arrays (NumPy) for analysis.
Libraries: OpenCV (cv2), ffmpeg

import cv2

cap = cv2.VideoCapture('video.mp4')
while cap.isOpened():
    ret, frame = cap.read()   # frame is a NumPy array
    if not ret:               # stop when the video ends
        break
    # Process frame...
cap.release()

3️⃣ OBJECT/SCENE DETECTION
Detect and count objects per frame (people, vehicles, products)
Libraries: YOLO, TensorFlow, PyTorch, AWS Rekognition, Google Vision API

4️⃣ AUDIO/SPEECH TO TEXT
Extract the audio track → transcribe to text → analyze
Libraries: whisper (OpenAI), speech_recognition, Google Speech-to-Text

5️⃣ OPTICAL CHARACTER RECOGNITION (OCR)
Extract on-screen text (dashboards, slides, signage)
Libraries: pytesseract, EasyOCR, PaddleOCR

6️⃣ MOTION/ACTIVITY ANALYSIS
Optical flow, motion heatmaps, activity recognition
Libraries: OpenCV, MediaPipe, MMAction2

7️⃣ FACIAL/EMOTION ANALYSIS
Detect faces, recognize emotions, track gaze
Libraries: DeepFace, dlib, MediaPipe

8️⃣ STRUCTURED DATA OUTPUT
All the above techniques produce structured data (CSV, JSON, tables) that can be loaded into Snowflake for analysis:

Frame/Timestamp | Objects Detected | Text Found | Speech Transcript | Emotion
00:01:05        | 3 people, 1 car  | "EXIT"     | "Turn left here"  | Happy

In a Snowflake Context
You can combine this with Snowflake by:
- Pre-processing video externally (Python) → extract structured data
- Loading the extracted data into Snowflake tables
- Using Cortex AI functions like AI_CLASSIFY, AI_EXTRACT, AI_SUMMARIZE on the extracted text/transcript data
- Using AI_PARSE_DOCUMENT if you convert frames to images/PDFs for document-style extraction

The key insight: video itself isn't directly queryable — you must first transform it into structured/semi-structured data (text, numbers, labels) using the techniques above, then analyze that data.

#DataEngineer #ETL #DataAnalysis
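A small sketch connecting frame extraction (technique 2) to structured output (technique 8): sample roughly one frame per second and write rows a warehouse can ingest. The CSV layout and the brightness metric are my own illustration, not a standard:

import csv
import cv2

cap = cv2.VideoCapture("video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing

with open("frames.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp_sec", "mean_brightness"])  # structured header
    idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if idx % int(fps) == 0:           # roughly one frame per second
            writer.writerow([round(idx / fps, 2), round(float(frame.mean()), 2)])
        idx += 1

cap.release()   # frames.csv is now ready to load into a Snowflake table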
-
My Data Science Journey — Python Tuple, Set, Dictionary & the Collections Library

Today's focus was on Python's core data structures — Tuples, Sets, and Dictionaries — along with the powerful collections module that enhances their functionality for real-world use cases.

𝐖𝐡𝐚𝐭 𝐈 𝐋𝐞𝐚𝐫𝐧𝐞𝐝:

Tuple
– Ordered, immutable, allows duplicates
– Single-element tuples require a trailing comma → ("cat",)
– Supports packing and unpacking → x, y = 10, 30
– Cannot be modified after creation (TypeError by design)
– Faster than lists in certain operations
– Used in scenarios like geographic coordinates and fixed records
– Can be used as dictionary keys (unlike lists)

Set
– Unordered, mutable, stores unique elements only
– No indexing or slicing support
– An empty set must be created using set() ({} creates a dict)
– .remove() raises KeyError if the element is not found
– .discard() removes safely without error
– Supports operations like union, intersection, difference, symmetric_difference
– Methods like issubset(), issuperset(), isdisjoint() help in set comparisons
– frozenset provides an immutable version of a set
– Offers O(1) average time complexity for membership checks

Dictionary
– Key-value pair structure; ordered, mutable, and keys must be unique
– Built on hash tables for fast lookups
– user["key"] → raises KeyError if missing
– user.get("key", default) → safe access with fallback
– Methods: keys(), values(), items() for iteration
– pop(), popitem(), update(), clear(), del for modifications
– Widely used in real-world data like APIs and JSON responses
– Common pattern: a list of dictionaries for structured datasets

Collections Library
– namedtuple → tuple with named fields for better readability
– deque → efficient queue with O(1) operations on both ends
– ChainMap → combines multiple dictionaries without merging copies
– OrderedDict → maintains order with additional utilities like move_to_end()
– UserDict, UserList, UserString → useful for customizing built-in behaviors with validation and extensions

Performance Insight (average lookup)
– List → O(n)
– Tuple → O(n)
– Set → O(1)
– Dictionary → O(1)

𝐊𝐞𝐲 𝐈𝐧𝐬𝐢𝐠𝐡𝐭: Understanding when to use each data structure — and how collections enhances them — is crucial for writing efficient, scalable, and clean Python code.

Read the full breakdown with examples on Medium 👇
https://lnkd.in/gvv5ZBDM

#DataScienceJourney #Python #Tuple #Set #Dictionary #Collections #Programming #DataStructures
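A quick sketch of three of those collections helpers in action:

from collections import namedtuple, deque, ChainMap

# namedtuple: a tuple with named fields for readability
Point = namedtuple("Point", ["x", "y"])
p = Point(10, 30)
print(p.x, p.y)            # 10 30

# deque: O(1) appends and pops on both ends
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.pop()
print(queue)               # deque([0, 1, 2])

# ChainMap: layered lookups without merging the dicts
defaults = {"theme": "light", "lang": "en"}
user = {"theme": "dark"}
settings = ChainMap(user, defaults)
print(settings["theme"], settings["lang"])  # dark en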