🚀 DSA Challenge – Day 93
Problem: Find Median from Data Stream 📊⚙️

This was an exciting deep dive into data structures and real-time computation: maintaining the median efficiently while numbers keep arriving from a stream.

🧠 Problem Summary:
Design a class MedianFinder that can:
✅ Add numbers dynamically from a data stream.
✅ Return the median at any point in time.
If the total count is even → median = mean of the two middle values. If odd → median = the middle value.

⚙️ My Approach:
1️⃣ Use two heaps to split the stream: a max heap (maxHeap) for the smaller half of the numbers and a min heap (minHeap) for the larger half.
2️⃣ When a new number arrives, push it into maxHeap (values are negated to simulate max-heap behavior on a min heap), then rebalance so the two heap sizes never differ by more than 1.
3️⃣ Maintain the heap invariant: the maximum of maxHeap ≤ the minimum of minHeap.
4️⃣ The median is the top of the larger heap (odd count) or the average of the two tops (even count).

📈 Complexity:
Time: O(log n) per insertion, including heap rebalancing.
Space: O(n) to store all elements across the two heaps.

✨ Key Takeaway:
This problem highlights how heaps turn a seemingly complex real-time median calculation into a smooth, efficient process. A great example of data structure synergy in action! ⚡

🔖 #DSA #100DaysOfCode #LeetCode #ProblemSolving #Heaps #PriorityQueue #DataStructures #Algorithms #Median #Python #CodingChallenge #InterviewPrep #EfficientCode #TechCommunity #LearningByBuilding #CodeEveryday
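A minimal Python sketch of the two-heap approach described above (method names follow the LeetCode problem's interface; the smaller half is stored negated because heapq only provides a min heap):

```python
import heapq

class MedianFinder:
    def __init__(self):
        self.max_heap = []  # smaller half, stored negated (heapq is a min heap)
        self.min_heap = []  # larger half

    def addNum(self, num: int) -> None:
        # Push into the smaller half, then move its maximum over to the larger half.
        heapq.heappush(self.max_heap, -num)
        heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
        # Rebalance so the smaller half is never outnumbered.
        if len(self.min_heap) > len(self.max_heap):
            heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))

    def findMedian(self) -> float:
        if len(self.max_heap) > len(self.min_heap):
            return float(-self.max_heap[0])  # odd count: top of the larger heap
        return (-self.max_heap[0] + self.min_heap[0]) / 2  # even: average of tops
```

For example, after addNum(1) and addNum(2), findMedian() returns 1.5; after a further addNum(3), it returns 2.0.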
More Relevant Posts
Data Structure and Algorithm: Array 👩🏾‍💻

I've been using arrays for a while, but now I'm actually starting to understand how they work in memory and why their time complexity makes sense.

An array isn't just a bunch of items stored randomly. It's a contiguous block of memory where all the elements sit side by side. Because of that, the computer knows exactly where each element is stored, which is why accessing elements is really fast. For example, to get the 5th element, the computer doesn't need to go through everything one by one; it just calculates the exact position from the memory address. That's why accessing an element is O(1), which means constant time. But inserting or deleting something in between is slower, O(n), because other elements may need to shift.

There are mainly two types of arrays:
1. One-dimensional array
2. Multi-dimensional array

A one-dimensional array is like a straight line of elements. Think of it as a simple list like [10, 20, 30, 40]. Each element has an index (0, 1, 2, 3), which makes accessing any element easy and fast.

A multi-dimensional array, on the other hand, has more than one level, like a table (2D) or a cube (3D). A two-dimensional array feels like rows and columns in a spreadsheet. A three-dimensional array is like stacking multiple tables on top of each other; imagine a cube of data.

One thing that really stood out to me is that arrays are static in size, which means once you create them, you can't easily change their size. This is also why Python lists are more flexible: they're built on top of arrays but can grow or shrink dynamically.

Understanding how time and space complexity works made me realize how powerful arrays actually are:
Accessing an element → O(1)
Searching → O(n)
Insertion or deletion → O(n)
Traversing all elements → O(n)

I attached an image with examples of the different types of arrays below.

That's all for now, bye ☺️❤️

#TechJourney #PythonLearning #TechCommunity #Array #DataStructure #DSA #Python #Programming #Algorithm
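A quick Python sketch of these complexities in action (the address formula in the comment is how C-style arrays work under the hood; the list here is just for illustration):

```python
# Conceptually, arrays compute an element's address instead of searching:
#   address(i) = base_address + i * element_size   -> O(1) access
arr = [10, 20, 30, 40]

print(arr[2])      # O(1): jump straight to index 2 -> 30
print(25 in arr)   # O(n): may scan every element  -> False
arr.insert(1, 15)  # O(n): elements after index 1 shift right
arr.remove(30)     # O(n): search for the value, then shift elements left
print(arr)         # [10, 15, 20, 40]
```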
🚀 DSA Challenge – Day 94
Problem: Design LRU (Least Recently Used) Cache ⚡📦

This was one of those classic data structure design problems that truly tests your understanding of hash maps and linked lists working together seamlessly for O(1) performance!

🧠 Problem Summary:
Implement a data structure that behaves like an LRU cache, supporting:
✅ get(key) → retrieve a value in O(1).
✅ put(key, value) → insert or update a value in O(1).
✅ Automatic eviction of the least recently used key when capacity is exceeded.

⚙️ My Approach:
To achieve O(1) operations, I combined two powerful structures:
1️⃣ Hash map (keyMap) → for constant-time key lookups.
2️⃣ Doubly linked list → to maintain the order of usage (most recent at the front).

🔹 When a key is accessed or updated: move it to the front (most recent).
🔹 When inserting a new key: if the cache is full, remove the least recently used node (from the tail), then insert the new key-value pair at the front.

The linked list allows O(1) addition and removal, and the hash map ensures O(1) lookup and update.

📈 Complexity:
Time: O(1) → for both get and put.
Space: O(capacity) → for the hash map and list nodes.

✨ Key Takeaway:
This problem elegantly demonstrates how data structures complement each other: the linked list maintains order, and the hash map ensures constant-time access. A perfect synergy of logic and structure! ⚙️💡

🔖 #DSA #100DaysOfCode #LeetCode #ProblemSolving #LRUCache #DataStructures #HashMap #LinkedList #Python #Algorithms #SystemDesign #CodingChallenge #EfficientCode #InterviewPrep #TechCommunity #CodeEveryday #LearningByBuilding
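A minimal Python sketch of the hash map plus doubly linked list design (class and method names follow the LeetCode LRUCache interface; keyMap is rendered as key_map, and dummy head/tail nodes avoid edge-case handling):

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.key_map = {}                    # key -> node, O(1) lookup
        self.head = Node(0, 0)               # dummy head: most recent side
        self.tail = Node(0, 0)               # dummy tail: least recent side
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _add_front(self, node):
        node.next, node.prev = self.head.next, self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        if key not in self.key_map:
            return -1
        node = self.key_map[key]
        self._remove(node)                   # accessed: move to front
        self._add_front(node)
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.key_map:
            self._remove(self.key_map[key])  # drop the stale node
        node = Node(key, value)
        self.key_map[key] = node
        self._add_front(node)
        if len(self.key_map) > self.capacity:
            lru = self.tail.prev             # evict least recently used
            self._remove(lru)
            del self.key_map[lru.key]
```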
🚀 Days 41–56 of #100DaysOfCoding: Strengthening Problem-Solving with Arrays & NumPy

Over the past 16 days, I've focused on building a solid foundation in data structures and numerical computing. This phase has been about improving efficiency, mastering algorithmic thinking, and understanding data manipulation at a deeper level.

🔹 Core Array Problems Solved:
1️⃣ Find Min/Max Elements – implemented an O(n) linear-time approach without sorting, optimizing both time and space.
2️⃣ In-Place Array Reversal – applied the two-pointer technique to reverse arrays efficiently (O(1) space).
3️⃣ Element Frequency Counter – designed a function to count occurrences of target elements in linear time.
4️⃣ Second Largest Element – solved using two tracking variables in a single traversal.
5️⃣ Move Zeros to End – implemented a stable version that maintains element order; currently refining it with a two-pointer optimization (see the sketch after this post).

🔹 NumPy Fundamentals:
Explored essential NumPy operations for data analysis, including mean, median, standard deviation, and variance, plus array slicing, broadcasting, and vectorized computations. These are fundamental tools for upcoming machine learning and data science projects.

🔹 Key Learnings:
✅ Optimizing both time and space complexity is critical.
✅ In-place algorithms significantly improve memory efficiency.
✅ Clean, simple solutions often outperform over-engineered ones.

Next steps: optimizing current implementations and diving deeper into advanced data structures and algorithms.

GitHub Repository: https://lnkd.in/gsucUW-F

What's your favorite array-related problem or concept? I'd love to hear your thoughts in the comments 👇

#Python #NumPy #DataStructures #Algorithms #ProblemSolving #CodingChallenge #LearningInPublic #TechJourney #100DaysOfCode #yohancodes #selflearning
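Two of the techniques mentioned above, sketched in Python (one plausible shape for the two-pointer move-zeros refinement, not necessarily the repo's version):

```python
def reverse_in_place(arr):
    """Two-pointer reversal: O(n) time, O(1) extra space."""
    left, right = 0, len(arr) - 1
    while left < right:
        arr[left], arr[right] = arr[right], arr[left]
        left += 1
        right -= 1

def move_zeros_to_end(arr):
    """Stable move-zeros with a write pointer: O(n) time, O(1) space."""
    write = 0
    for read in range(len(arr)):
        if arr[read] != 0:
            # Swap each non-zero forward; relative order is preserved.
            arr[write], arr[read] = arr[read], arr[write]
            write += 1

nums = [0, 1, 0, 3, 12]
move_zeros_to_end(nums)
print(nums)  # [1, 3, 12, 0, 0]
```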
All our work so far has been on a single piece of data. This is a bottleneck. Today, we scale.

#ZeroToFullStackAI Day 8/135: The First Data Structure (The List).

We've established our foundation (primitives, logic, error handling) on singular variables. To build real applications, we must work with collections of data: thousands of prices, millions of user IDs, or a sequence of sensor readings. Today, we build our first and most fundamental data structure: the Python List.

A List is not just a container; it has three specific properties:
It's a Collection: it holds multiple items in a single variable.
It's Ordered: every item has a specific position (index), which means we can access any item by its number.
It's Mutable: it is "changeable." We can add, remove, and modify items after the list has been created.

This is the shift from price to prices. We've built our data container. But a container is useless without an engine to process what's inside. Tomorrow, we build that engine: the for Loop.

#Python #DataScience #SoftwareEngineering #AI #Developer #DataStructures
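A tiny sketch of the three properties in Python (the prices example continues the post's price → prices shift; values are illustrative):

```python
# Collection: many items in one variable.
prices = [19.99, 5.49, 3.25]

# Ordered: each item has an index, so access by position is direct.
print(prices[0])      # 19.99

# Mutable: add, change, and remove items after creation.
prices.append(12.00)  # add a new item at the end
prices[1] = 5.99      # modify an existing item
prices.remove(3.25)   # remove an item by value
print(prices)         # [19.99, 5.99, 12.0]
```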
How Adding sort=False Made My Pandas Code 3x Faster

Just wrapped up the second phase of optimizing our data pipeline. After last week's vectorization work (20x speedup), I found another bottleneck hiding in plain sight.

The Problem: Pandas groupby operations were spending 60% of their time sorting results that we never needed sorted.

The Fix: one parameter.

# Before (slow)
df.groupby('cycle')['value'].min()

# After (fast)
df.groupby('cycle', sort=False)['value'].min()

Results:
GroupBy operations: 2-3x faster
Delta calculations: 4.3x faster
Overall aggregation: 2-4x faster
Combined with vectorization: 60x total speedup from baseline!

Key Takeaways:
Default ≠ Optimal: Pandas sorts group keys by default; most use cases don't need it.
Use .values for math: df['a'].values - df['b'].values is 2-5x faster than df['a'] - df['b'].
Profile first: without profiling, I'd never have suspected sorting was the bottleneck.
Small changes can have a huge impact: 15 lines of code, a 2-4x speedup, faster iteration, earlier insights.

Currently exploring Numba and Polars for the next phase. What's your favorite one-line performance boost?

#Python #Pandas #NumPy #Performance #DataEngineering
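For anyone who wants to try both tricks, here is a self-contained sketch (the column names mirror the snippets above but the data is synthetic; actual speedups depend on your data and group counts):

```python
import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({
    "cycle": np.random.randint(0, 1_000, size=n),
    "value": np.random.rand(n),
    "a": np.random.rand(n),
    "b": np.random.rand(n),
})

# Skip sorting the group keys when output order doesn't matter.
mins = df.groupby("cycle", sort=False)["value"].min()

# Drop to raw NumPy arrays for plain arithmetic: skips index alignment.
delta = df["a"].values - df["b"].values
```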
Mean imputation is a straightforward method for handling missing values in numerical data, but it can significantly distort the relationships between variables. By replacing missing values with the mean of the observed data, this approach artificially reduces variability and weakens correlations, leading to misleading results in analysis.

Why Does Mean Imputation Distort Correlations?
❌ No variability in imputed values: mean imputation assigns the same value to all missing entries, failing to reflect the natural variability of the data.
❌ Weakened relationships: the imputed values introduce artificial uniformity that diminishes or masks the strength of correlations between variables.
❌ Biased downstream analyses: statistical tests and predictive models relying on the data's correlation structure may produce inaccurate or unreliable results.

A Visual Example:
The attached image demonstrates how mean imputation can disrupt correlations between variables. The black points represent the original observed values, showing the natural relationship between variables X1 and X2. The red and green points represent imputed values for X1 and X2, respectively, placed at their mean values. This disrupts the overall pattern, artificially aligning the data along the mean and weakening the true correlation between X1 and X2.

A Better Approach:
To preserve relationships between variables, predictive mean matching is a superior alternative. This method selects observed values closest to the predicted value for a missing entry, maintaining variability and the natural correlation structure. When combined with multiple imputation, it also accounts for uncertainty, ensuring more robust and reliable results for downstream analyses.

For a detailed explanation of mean imputation, its drawbacks, and better alternatives, check out my full tutorial here: https://lnkd.in/d2vfiSmf

Sign up for my free email newsletter to stay informed about data science, statistics, Python, and R. More info: http://eepurl.com/gH6myT

#RStats #Data #datasciencetraining #Python #StatisticalAnalysis #DataAnalytics
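A quick way to see this attenuation yourself in Python (simulated data, not taken from the tutorial; the correlation after mean imputation comes out noticeably weaker than on the complete cases):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Two correlated variables.
x1 = rng.normal(size=1_000)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=1_000)
df = pd.DataFrame({"x1": x1, "x2": x2})

# Set 30% of x1 to missing at random.
mask = rng.random(len(df)) < 0.3
df.loc[mask, "x1"] = np.nan

# pandas .corr() skips missing pairs, so this uses complete cases.
print("Correlation, complete cases:", round(df["x1"].corr(df["x2"]), 2))

# Mean imputation: every missing x1 gets the same value.
imputed = df.fillna({"x1": df["x1"].mean()})
print("Correlation after mean imputation:",
      round(imputed["x1"].corr(imputed["x2"]), 2))
```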
Day 13 – Turning Messy Data into Meaningful Insights 🧹📊

Today was all about cleaning: not my room, but my dataset 😆

I dove into data cleaning and preparation using Pandas, one of the most crucial (yet often underrated) parts of any data analysis workflow. It's the stage where raw, chaotic data finally starts to make sense.

I learned how to detect and handle missing values, drop duplicates, fix inconsistent types, and even rename columns for better readability. It's amazing how much clarity comes from just cleaning things up; suddenly, trends and patterns begin to appear.

I'm still working in Google Colab, and the more I explore, the more I realize how powerful it is for quickly experimenting with and visualizing data transformations. Every line of code today reminded me that good insights always start with good data. 🧠

#Day13 #Python #Pandas #DataCleaning #DataPreparation #DataAnalytics #LearningJourney #AIChallenge
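The steps mentioned above map to a handful of Pandas calls. A minimal sketch (the file and column names here are hypothetical stand-ins for whatever dataset you're cleaning):

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")  # hypothetical messy dataset

print(df.isna().sum())            # detect missing values per column
df = df.drop_duplicates()         # drop exact duplicate rows
df["price"] = pd.to_numeric(df["price"], errors="coerce")  # fix types
df = df.rename(columns={"qty": "quantity"})                # readable names
df = df.dropna(subset=["price"])  # handle remaining missing values
```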
When building predictive models, overfitting is a common challenge. Shrinkage methods, such as Ridge Regression, Lasso, and Elastic Net, help address this by adding a penalty term to the objective function during training, which discourages large coefficients. This results in more robust models that generalize better to new data.

✔️ Ridge Regression shrinks coefficients by penalizing their squared values, making it great when all features matter.
✔️ Lasso forces some coefficients to zero, effectively performing feature selection; ideal when only a subset of features is important.
✔️ Elastic Net combines the strengths of Ridge and Lasso, providing a balance between regularization and feature selection; especially useful when features are correlated.

However, there are some challenges to consider:
❌ Loss of interpretability: excessive shrinkage can make it difficult to interpret the model coefficients, as important predictors may have their effects reduced.
❌ Tuning required: these methods require careful tuning of hyperparameters (like λ and α) to find the right balance between bias and variance. Poor tuning can lead to either underfitting or overfitting.
❌ Not suitable for all situations: in some cases, simpler models like OLS (Ordinary Least Squares) might perform just as well or even better, especially when the sample size is large and multicollinearity isn't an issue.

🔹 In R: use the glmnet package to apply Ridge, Lasso, and Elastic Net.
🔹 In Python: leverage the sklearn.linear_model module for all three shrinkage methods.

Want to dive deeper into these methods and learn how to apply them? Join my online course on Statistical Methods in R, where we explore this and other key techniques in further detail. Take a look here for more details: https://lnkd.in/d-UAgcYf

#datascience #pythonforbeginners #analysis #package
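On the Python side, here is a minimal sklearn.linear_model sketch that fits all three methods with built-in cross-validated tuning of the penalty strength (synthetic data for illustration; the CV estimators handle the λ/α search that the post warns about):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, noise=10, random_state=0)

for name, model in [
    ("Ridge",       RidgeCV(alphas=[0.1, 1.0, 10.0])),
    ("Lasso",       LassoCV(cv=5)),
    ("Elastic Net", ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5)),
]:
    # Standardize first: shrinkage penalties are sensitive to feature scale.
    pipe = make_pipeline(StandardScaler(), model).fit(X, y)
    print(name, "R^2:", round(pipe.score(X, y), 3))
```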