🧠 Group Anagrams: The "Fingerprint" Strategy

In this problem, I moved beyond the standard sorting approach (O(n · m log m)) to a more efficient Frequency Array strategy (O(n · m)).

Memory Management: I learned how Python handles memory during loops. By declaring count = [0] * 26 inside the outer loop, I'm giving each word a fresh "sheet of paper" to record its letter frequencies. Once that word is processed and "locked" as a tuple (to serve as a dictionary key), Python's garbage collector steps in to clean up the old list.

The Data Science Connection: This frequency array isn't just a coding trick; it's the foundation of One-Hot Encoding and Bag of Words in Data Science. It's how we turn raw text into numerical vectors that AI models can actually understand.

🔍 Longest Common Prefix: The Power of Vertical Scanning

Instead of checking one word at a time, I focused on Vertical Scanning: checking the first letter of every word, then the second, and so on.

Complexity: Achieved O(S) time complexity, where S is the total number of characters. By using the shortest word as my base, I avoided wasted cycles and IndexError traps.

Pythonic Elegance: I explored the zip(*strs) strategy. It's amazing how Python can "unpack" a list and group characters by their index in a single line.

The Sorting Shortcut: A clever logic leap: if you sort the list, you only need to compare the first and last strings. If they share a prefix, everything in between must share it too.

The takeaway? Code isn't just about getting the right answer; it's about knowing how your data sits in RAM and how to make every operation count. Onto the next one! 🐍💻

#DataScience #Python #SoftwareEngineering #Neetcode #ProblemSolving #TechLearning

"6 down, 244 to go. The dashboard might show 6/250, but the real progress is in the 'Medium' difficulty milestone I hit today and the logic I've mastered behind the scenes."
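Here is a minimal sketch of the two approaches described above (my own rough recreation rather than the exact submitted code; it assumes lowercase English words for the 26-slot fingerprint):

from collections import defaultdict

def group_anagrams(strs):
    # Map a 26-slot letter-frequency "fingerprint" (as a tuple) to its group of words.
    groups = defaultdict(list)
    for word in strs:
        count = [0] * 26                      # fresh frequency array for each word
        for ch in word:
            count[ord(ch) - ord('a')] += 1
        groups[tuple(count)].append(word)     # tuple is hashable, so it works as a dict key
    return list(groups.values())

def longest_common_prefix(strs):
    # Vertical scan: zip(*strs) yields the i-th character of every word at once
    # and naturally stops at the shortest word, so no IndexError is possible.
    prefix = []
    for chars in zip(*strs):
        if len(set(chars)) == 1:
            prefix.append(chars[0])
        else:
            break
    return "".join(prefix)

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
print(longest_common_prefix(["flower", "flow", "flight"]))   # "fl"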
More Relevant Posts
Day 2: Mastering the Architecture of Data – Python Data Structures! 🏗️ for Gen AI Revision

After laying the foundation yesterday, Day 2 was all about the building blocks. In Gen AI development, how you store and manipulate data (tokens, embeddings, prompts) defines the efficiency of your model. Today was a deep dive into Python Data Structures. It's not just about knowing list or dict; it's about knowing why and where to use them for memory efficiency and speed.

🧠 What I Mastered Today:

Strings & Immutability: Deep dive into slicing, advanced formatting (f-strings), and understanding why strings are immutable, a key concept when handling large text datasets for LLMs.
Lists & Tuples: Beyond basic indexing. Focused on list comprehensions for clean code and on tuples for data integrity (immutable sequences).
Sets for Performance: Leveraging hash-based lookups for unique element extraction and mathematical set operations (union/intersection), crucial for data preprocessing.
Dictionaries (The Powerhouse): Building efficient word frequency counters and nested structures. Understanding O(1) average-case complexity for fast data retrieval.

I didn't just read theory; I solved 15+ mini-problems ranging from character frequency analysis to complex list flattening, all without external libraries, to keep the logic raw and sharp.

💻 GitHub Progress: I've pushed the practice.py file with all 15+ solved challenges to my repo: day02_data_structures/
🔗 https://lnkd.in/gikzc-K8

The journey to an MNC as a Gen AI dev is about consistency. Two days down, 88 to go. 🚀

#Python #DataStructures #GenAI #GenerativeAI #100DaysOfCode #AIDevelopment #TechJourney #MNCGoal #RevisionSeries #BackendDevelopment
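For illustration, here is a tiny sketch of the kind of exercise described above (a word-frequency counter, a recursive list flattener, and a set intersection) using only built-ins; the actual solved challenges are in the repo linked above:

def word_frequency(text):
    # Dictionary as a counter: O(1) average-case lookups and updates.
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def flatten(nested):
    # Recursively flatten arbitrarily nested lists into a single flat list.
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

print(word_frequency("the cat sat on the mat"))   # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
print(flatten([1, [2, [3, 4]], 5]))               # [1, 2, 3, 4, 5]
print(set("banana") & set("bandana"))             # set intersection: {'a', 'b', 'n'}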
𝐒𝐭𝐨𝐩 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐌𝐨𝐝𝐞𝐥𝐬 𝐔𝐧𝐭𝐢𝐥 𝐘𝐨𝐮 𝐃𝐨 𝐓𝐡𝐢𝐬 𝐅𝐢𝐫𝐬𝐭.

Your ML results don't start with algorithms - they start with clean, model-ready data. 🚀
Here's a simple 𝗗𝗮𝘁𝗮 𝗣𝗿𝗲-𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 checklist you can follow every time 👇

𝟭) 𝗜𝗺𝗽𝗼𝗿𝘁 𝘁𝗵𝗲 𝗟𝗶𝗯𝗿𝗮𝗿𝗶𝗲𝘀 📚
Bring in the basics: ✅ NumPy | ✅ Pandas | ✅ (Optional) Matplotlib/Seaborn | ✅ Scikit-learn

𝟮) 𝗜𝗺𝗽𝗼𝗿𝘁 𝘁𝗵𝗲 𝗗𝗮𝘁𝗮𝘀𝗲𝘁 🗂️
Load your data and do quick checks: 🔍 shape, column types, sample rows, basic stats

𝟯) 𝗛𝗮𝗻𝗱𝗹𝗲 𝗠𝗶𝘀𝘀𝗶𝗻𝗴 𝗗𝗮𝘁𝗮 🧩 (𝗜𝗺𝗽𝘂𝘁𝗲𝗿)
Missing values can silently hurt accuracy. Fix them with:
📌 Mean/Median (numerical)
📌 Mode (categorical)

𝟰) 𝗘𝗻𝗰𝗼𝗱𝗲 𝗖𝗮𝘁𝗲𝗴𝗼𝗿𝗶𝗰𝗮𝗹 𝗗𝗮𝘁𝗮 🔤➡️🔢
Models need numbers, not text.
✅ 𝗜𝗻𝗱𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝘁 𝗩𝗮𝗿𝗶𝗮𝗯𝗹𝗲𝘀 (𝗫): 𝗢𝗻𝗲-𝗛𝗼𝘁 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 🧱 Example: City → City_NY, City_LA, City_SF
✅ 𝗗𝗲𝗽𝗲𝗻𝗱𝗲𝗻𝘁 𝗩𝗮𝗿𝗶𝗮𝗯𝗹𝗲 (𝘆): 𝗟𝗮𝗯𝗲𝗹 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 🎯 Example: Yes/No → 1/0

𝟱) 𝗦𝗽𝗹𝗶𝘁 𝗧𝗿𝗮𝗶𝗻 𝘃𝘀 𝗧𝗲𝘀𝘁 ✂️
Common split: 𝟴𝟬/𝟮𝟬 or 𝟳𝟬/𝟯𝟬
🎯 Train = learn patterns | Test = validate performance

𝟲) 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 ⚖️
Helps models learn fairly when features have different ranges.
📍 Standardization (Z-score)
📍 Normalization (Min-Max)
🔥 Especially important for: 𝗞𝗡𝗡, 𝗦𝗩𝗠, 𝗞-𝗠𝗲𝗮𝗻𝘀, 𝗟𝗼𝗴𝗶𝘀𝘁𝗶𝗰 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻

#MachineLearning #DataScience #FeatureEngineering #DataPreprocessing #Python
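To make the checklist concrete, here is a minimal scikit-learn sketch of steps 3 to 6. The DataFrame, its column names (Age, City, Purchased), and all values are made up purely for illustration:

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Tiny made-up dataset
df = pd.DataFrame({
    "Age": [25, 32, None, 41],
    "City": ["NY", "LA", "SF", "NY"],
    "Purchased": ["Yes", "No", "Yes", "No"],
})

# 3) Impute missing numerical values with the mean
df["Age"] = SimpleImputer(strategy="mean").fit_transform(df[["Age"]]).ravel()

# 4) One-hot encode the categorical feature, label-encode the target
X = pd.get_dummies(df[["Age", "City"]], columns=["City"])   # City -> City_LA, City_NY, City_SF
y = LabelEncoder().fit_transform(df["Purchased"])            # Yes/No -> 1/0

# 5) Train/test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 6) Scale features: fit on train only, then transform both (avoids leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)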
🌐 Most people work with datasets… But where does the data actually come from?

One of the most interesting things I explored recently was web scraping: collecting data directly from websites instead of relying on pre-built datasets.

💡 What I realized: Real-world data is rarely clean or readily available. Before any analysis or AI model, the first step is often:
→ Extracting the data
→ Structuring it properly
→ Handling inconsistencies

🔧 In this project, I worked on:
• Extracting data from web pages
• Parsing and cleaning raw HTML content
• Converting unstructured data into a usable format
• Preparing data for analysis

💡 Key takeaway: Data collection itself is a major part of the pipeline, and sometimes more challenging than the analysis. This gave me a better understanding of how data pipelines actually begin.

I've shared the project here: 👉 https://lnkd.in/eRzXNgsZ

Curious to hear: 💬 Have you ever worked on collecting your own dataset instead of using ready-made data?

#WebScraping #Python #DataEngineering #DataCollection #DataScience #BuildInPublic
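As a rough illustration of that extract-parse-structure flow, here is a minimal requests + BeautifulSoup sketch. The URL and the CSS selectors are placeholders, not the actual site or code from the project:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://example.com/books"          # placeholder URL
response = requests.get(url, timeout=10)
response.raise_for_status()                # fail fast on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

rows = []
for item in soup.select(".product"):       # hypothetical CSS class
    title = item.select_one("h3")
    price = item.select_one(".price")      # hypothetical CSS class
    rows.append({
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,   # still raw text, needs cleaning
    })

df = pd.DataFrame(rows)                    # unstructured HTML -> tabular data for analysis
print(df.head())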
🚀 Day 26/100 — Mastering NumPy for Data Analysis 🧠📊

Today I explored NumPy, the foundation of numerical computing in Python and a must-know for data analysts.

📊 What I learned today:
🔹 NumPy Arrays → Faster than Python lists
🔹 Array Operations → Mathematical computations
🔹 Indexing & Slicing → Access specific data
🔹 Broadcasting → Perform operations efficiently
🔹 Basic Statistics → mean, median, standard deviation

💻 Skills I practiced:
✔ Creating arrays using np.array()
✔ Performing vectorized operations
✔ Reshaping arrays
✔ Applying statistical functions

📌 Example Code:

import numpy as np

# Create array
arr = np.array([10, 20, 30, 40, 50])

# Basic operations
print(arr * 2)

# Mean value
print(np.mean(arr))

# Reshape
matrix = arr.reshape(5, 1)
print(matrix)

📊 Key Learnings:
💡 NumPy is faster and more efficient than lists
💡 Vectorization = No need for loops
💡 Used as a base for Pandas, ML, and AI

🔥 Example Insight: 👉 "Calculated average sales and transformed the dataset efficiently using NumPy arrays"

🚀 Why this matters: NumPy is used in:
✔ Data preprocessing
✔ Machine Learning models
✔ Scientific computing

🔥 Pro Tip: 👉 Learn these next (a quick sketch of them is below):
np.linspace()
np.random (e.g., np.random.rand(), np.random.randint())
np.where()
➡️ Frequently used in real-world projects

📊 Tools Used: Python | NumPy

✅ Day 26 complete. 👉 Quick question: Do you find NumPy easier than Pandas or more confusing?

#Day26 #100DaysOfData #Python #NumPy #DataAnalysis #MachineLearning #LearningInPublic #CareerGrowth #JobReady #SingaporeJobs
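A quick, illustrative sketch of those three "learn next" items (the numbers are made up):

import numpy as np

# np.linspace: 5 evenly spaced values between 0 and 1 (inclusive)
steps = np.linspace(0, 1, 5)              # [0.   0.25 0.5  0.75 1.  ]

# np.random: reproducible random sampling via a seeded generator
rng = np.random.default_rng(seed=42)
sales = rng.integers(100, 500, size=7)    # e.g., one week of fake sales figures

# np.where: vectorized if/else, no Python loop needed
labels = np.where(sales > 300, "high", "low")

print(steps)
print(sales)
print(labels)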
🚀 Feature Scaling & Transformation — With Real Example + Code

Most people jump to models… but ignore feature scaling, which can literally make or break performance.

💡 Real-World Example: Building a House Price Prediction Model 🏡
Features:
- Size = 2000 sq.ft
- Rooms = 3
👉 Without scaling → the model gives more importance to size ❌
👉 With scaling → fair contribution from both ✅

🔥 Types of Scaling
📌 Min-Max Scaling (0–1 range)
📌 Standardization (mean = 0, std = 1)
📌 Robust Scaling (handles outliers)
📌 Normalization (unit vector scaling)

💻 Quick Python Code (Scikit-Learn)

from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = [[2000, 3], [1500, 2], [1800, 4]]

# Min-Max Scaling
minmax = MinMaxScaler()
scaled_minmax = minmax.fit_transform(data)

# Standard Scaling
standard = StandardScaler()
scaled_standard = standard.fit_transform(data)

print("MinMax:\n", scaled_minmax)
print("Standard:\n", scaled_standard)

🔧 Feature Transformation
✔️ Log Transform → handles skewed data (e.g., salary)
✔️ Encoding → converts categories into numbers

⚠️ Pro Tip: Always scale after the train-test split to avoid data leakage (a short sketch of this is below).

✨ Final Thought: Better data > Better model.

#DataScience #MachineLearning #FeatureEngineering #Python #AI #Learning
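And a minimal sketch of that pro tip: fit the scaler on the training set only, then reuse it on the test set (toy numbers, illustrative only):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = [[2000, 3], [1500, 2], [1800, 4], [2200, 5], [1600, 2]]   # size, rooms
y = [500000, 320000, 410000, 560000, 340000]                  # made-up prices

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)         # reuse those stats; test data never influences them

print(X_train_scaled)
print(X_test_scaled)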
Announcing Cvxium, a new Python package for writing fast convex optimization solvers! (I pronounce "Cvxium" as "Calcium", but you do you.)

Taking Stephen Boyd's two-course sequence on convex optimization was the highlight of my time at Stanford. Professor Boyd taught us that off-the-shelf solvers can be a good starting point, but to really get a good solver, you should tailor it to the problem at hand. Over my time as a data scientist, I've written the same basic custom Interior Point Method over and over. When I created PyRake last year (a library for calibrating and applying weighting estimators), I took the opportunity to distill everything I had learned about writing fast solvers into a general framework. This week I spun that framework into its own package, and Cvxium was born.

Cvxium is a framework, not a library. A framework forces you to do things a certain way, while a library does something for you. A library is a gift, and a framework is an obligation. Cvxium makes you write out the special structure of your problem, and that burden means most people are not interested in this approach. But Claude will happily do this for you! Cvxium includes a USAGE.md file that teaches AI agents (or humans) how to translate their problem into a fast solver.

Cvxium is available on PyPI; install it as you would any other package (e.g. with pip or uv). I'll post the GitHub repo in the comments.
📊 NumPy Cheat Sheet – Must Know for Data Science

If you're learning Python for Data Science / Machine Learning, mastering NumPy is non-negotiable. Here's a quick revision guide 👇

🔍 Core Concepts:

🧱 Array Creation
• np.array()
• np.arange()
• np.linspace()
• np.zeros() / np.ones()

🔄 Array Operations
• Reshape & Flatten
• Indexing & Slicing
• Concatenation & Splitting

📐 Mathematical Operations
• np.mean()
• np.sum()
• np.std()
• Dot Product (np.dot())

⚡ Broadcasting & Vectorization
• Perform operations without loops
• Faster computation 🚀

🎲 Random Module
• np.random.rand()
• np.random.randint()
• np.random.normal()

📊 Linear Algebra
• Matrix Multiplication
• Determinant & Inverse
• Eigenvalues & Eigenvectors

💡 Key Takeaways:
✔ NumPy = Backbone of ML & Data Science
✔ Vectorization improves performance drastically
✔ Essential for libraries like Pandas, Scikit-learn, TensorFlow

🎯 Perfect for interview prep + quick revision

#NumPy #Python #DataScience #MachineLearning #AI #Coding #LearnPython #Tech
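A tiny sampler touching a few of the items above (values chosen purely for illustration):

import numpy as np

a = np.arange(1, 7).reshape(2, 3)      # array creation + reshape: [[1 2 3] [4 5 6]]
print(a + 10)                          # broadcasting: scalar added to every element
print(np.linspace(0, 1, 3))            # 3 evenly spaced values: [0.  0.5 1. ]
print(a.mean(), a.std())               # aggregate statistics

m = np.array([[2.0, 1.0], [1.0, 3.0]])
print(m @ np.linalg.inv(m))            # matrix multiplication by the inverse ≈ identity
print(np.linalg.det(m))                # determinant (5.0 here)
w, v = np.linalg.eig(m)                # eigenvalues and eigenvectors
print(w)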
Ever run a Python script and get a frustrating "file not found" error? 😤 This simple snippet can save you hours 👇

import os

# Check if we're in the right place
print("Current directory: ", os.getcwd())

# Check if our data file exists
data_path = "data/sales.csv"

if os.path.exists(data_path):
    print(f"Found {data_path}")
else:
    print(f"X Cannot find {data_path}")
    print("Make sure you're running from the sales-analysis folder!")

💡 What's happening here?

🔹 os.getcwd()
Prints your current working directory — this tells you where your script is running from. Many errors happen because you're in the wrong folder.

🔹 data_path = "data/sales.csv"
Defines the relative path to your dataset.

🔹 os.path.exists(data_path)
Checks if the file actually exists before trying to use it.

🔹 Conditional check (if / else)
Gives clear feedback:
✔ Found the file
❌ Or tells you it's missing

🚀 Why this matters
Prevents runtime errors
Helps debug file path issues quickly
Makes your scripts more reliable
Essential habit for data analysis projects 📊

Whether you're working on data science, automation, or AI — always verify your file paths before processing data. Small habit. Big impact.

#Python #Programming #DataScience #AI #CodingTips #Debugging
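As a complementary habit (not part of the original snippet), a pathlib-based variant can anchor the path to the script's own location, so the check works regardless of which folder you launch it from; the file names here are illustrative:

from pathlib import Path

# Resolve the path relative to this script file, not the current working directory
script_dir = Path(__file__).resolve().parent
data_path = script_dir / "data" / "sales.csv"

if data_path.exists():
    print(f"Found {data_path}")
else:
    print(f"Cannot find {data_path}")
    print(f"Script is running from: {Path.cwd()}")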
📊 Beyond the Bell Curve: Handling "Messy" Data in Python

As data scientists, we often dream of perfect, Gaussian (normal) distributions. But in the real world, especially with variables like car prices or housing data, the data is rarely "normal." I recently worked through a project involving left-skewed and non-parametric data. Here's a breakdown of how I handled it using Python:

1️⃣ Identifying the Shape
Before running any tests, I used Matplotlib to visualize the distribution. A high bin count (150) helped reveal a significant left skew, where the mean was being pulled down by a long tail of lower-priced entries.

import matplotlib.pyplot as plt
plt.hist(prices, bins=150)
plt.show()

2️⃣ The Transformation Strategy
When data is left-skewed, standard parametric tests (like t-tests) can become biased. To pull that "tail" back toward the center and achieve symmetry, I explored square (x²) and cube (x³) transformations. By stretching the right side of the distribution more than the left, these mathematical shifts can often "normalize" the data, allowing for more powerful statistical modeling.

3️⃣ When to Stay Non-Parametric
If the data is truly non-parametric (multimodal or containing extreme gaps), forcing a transformation isn't the answer. In those cases, I pivot to rank-based tests like:
✅ Mann-Whitney U (instead of a t-test)
✅ Kruskal-Wallis (instead of ANOVA)
✅ Spearman's rank (instead of Pearson correlation)

The takeaway: Don't just import your library and hit "run." Understanding the geometry of your data is the difference between a biased model and an accurate insight. 💡

#DataScience #Python #Statistics #MachineLearning #Pandas #DataAnalytics #DataIntegrity
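A small illustrative sketch of steps 2 and 3 using synthetic left-skewed data with NumPy and SciPy (the numbers are made up and not from the project):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic left-skewed "prices": a Beta(5, 1.5) sample scaled up (bulk high, long lower tail)
prices = 50000 * rng.beta(5, 1.5, size=1000)

print("Skew before:", stats.skew(prices))               # negative value => left skew
print("Skew after squaring:", stats.skew(prices ** 2))  # power > 1 stretches the right side, pulling skew toward 0

# Rank-based comparison of two groups when normality can't be assumed
group_a = prices[:500]
group_b = prices[500:] * 1.05
u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print("Mann-Whitney U:", u_stat, "p =", p_value)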
🚀 Learn with Soumava | Series 01: Mastering the Foundation of AI with NumPy

📊 Beyond the Loop: Why NumPy is a Game-Changer for ETL & AI

As an ETL professional transitioning deeper into AI and Data Science, I've realized that the biggest "productivity unlock" isn't just knowing Python; it's mastering NumPy. In traditional testing, we often rely on row-by-row logic. However, in the world of high-volume data and AI, efficiency is everything. Using NumPy's vectorized operations, we can process millions of data points 50x to 100x faster than standard Python lists.

I've put together a hands-on Google Colab notebook that covers the essentials:
🔹 The "Axis" Secret: How to calculate means and sums across rows vs. columns (axis 0 vs. axis 1).
🔹 Boolean Masking: Filtering millions of rows of data without a single if statement.
🔹 Broadcasting: Performing complex math across different array shapes automatically.
🔹 Statistical Aggregates: Using std, median, and mean to detect data drift and outliers.

A quick sketch of these four ideas follows below. Check out the full walkthrough in the attached document! What's your go-to NumPy trick for data validation? Let's discuss in the comments.

#Python #NumPy #DataEngineering #ETLTesting #AI #DataScience #MachineLearning #TechLearning
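A minimal sketch of those four ideas on a toy 2D array (made-up numbers, not the Colab notebook itself):

import numpy as np

# Toy "table": 6 rows (records) x 3 columns (numeric fields)
data = np.array([
    [10.0, 200.0, 3.2],
    [12.0, 210.0, 2.9],
    [11.0, 205.0, 3.1],
    [13.0, 208.0, 3.0],
    [12.0, 950.0, 3.1],   # suspicious value in the middle column
    [11.0, 202.0, 2.8],
])

# 1) Axis: 0 aggregates down the rows (per column), 1 across the columns (per row)
print(data.mean(axis=0))          # column means
print(data.sum(axis=1))           # row totals

# 2) Boolean masking: filter rows without an if statement
mask = data[:, 1] > 500
print(data[mask])                 # only the suspicious row

# 3) Broadcasting: center every column by subtracting its own mean
centered = data - data.mean(axis=0)
print(centered)

# 4) Aggregates for outlier detection: flag cells more than 2 standard deviations from the column mean
z_scores = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.abs(z_scores) > 2)       # only the 950.0 entry gets flagged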