🚀 𝗗𝗮𝘆 𝟭𝟰/𝟯𝟬: 𝗧𝗵𝗲 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝗶𝗰 '𝗖𝗵𝗲𝗮𝘁 𝗖𝗼𝗱𝗲' (𝗧𝗶𝗺𝘀𝗼𝗿𝘁)

Two weeks down! Halfway through my #30DaysOfCode challenge. ⚡

We’ve seen the "Turtles" (O(n²)), the "Rockets" (O(n log n)), and the "Math Masters" (O(n)). But when you call .sort() in Python, Java (on objects), or Swift, which one does the runtime actually pick? The answer: none of them on its own. It uses a hybrid sort called Timsort.

💡 𝗪𝗵𝘆 𝗰𝗼𝗺𝗯𝗶𝗻𝗲 𝗮𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝘀? There is no "perfect" algorithm:
- Insertion Sort (O(n²)): lightning fast for tiny datasets (< 64 items) and adaptive (finishes in O(n) if the data is already sorted).
- Merge Sort (O(n log n)): a beast for massive data, but heavy on memory and overkill for small tasks.

The Cheat Code: Dynamic Selection 🧠
Timsort is the ultimate pragmatist. It analyzes your data at runtime:
1. Identify "runs": it scans the array for naturally sorted chunks.
2. Sort small: if a chunk is small, it uses Insertion Sort for instant, low-overhead results.
3. Merge big: it then uses Merge Sort to "zip" these sorted runs together into one final, stable O(n log n) result.

✅ 𝗪𝗵𝗮𝘁 𝗜 𝘁𝗮𝗰𝗸𝗹𝗲𝗱 𝘁𝗼𝗱𝗮𝘆:
- Synergy analysis: why Merge Sort’s stability and Insertion Sort’s speed on small data are the "Dream Team."
- Adaptive power: how Timsort approaches O(n) linear speed on real-world, partially sorted data.
- Stability: why preserving the order of duplicate items is mandatory for production-grade software.

🤖 𝗧𝗵𝗲 𝗔𝗜 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻: this "adaptive synthesis" is key to LLMs. A coherent response depends on maintaining sequential context. Just as Timsort preserves order, AI must preserve the relationship between words to make sense.

⚡ 𝗣𝗿𝗼𝗴𝗿𝗲𝘀𝘀: 𝟭𝟰/𝟯𝟬
The engines are mastered. Tomorrow, we move from how we process data to where we store it: Data Structures!

𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻: Timsort is robust but needs extra memory (O(n) space). Can you name an adaptive hybrid sort that is in-place? (Hint: Go 1.19 uses it!) 👇

#30DaysOfCode #Algorithms #Timsort #HybridSorting #BigO #SoftwareEngineering #GoLang #Java #PHP #Day14 #BackendDevelopment
Timsort: The Adaptive Hybrid Sort
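
As a rough illustration of the hybrid idea above, here is a toy Python sketch. It is not CPython's actual implementation (real Timsort also computes an adaptive minrun, detects descending runs, keeps a run stack with merge invariants, and uses galloping mode); it only shows the division of labour: insertion sort for small runs, a stable merge to combine them.

# Simplified, illustrative "mini Timsort": insertion sort for small runs,
# then bottom-up stable merging of those runs.
RUN = 32  # assumed fixed run size; real Timsort computes a minrun of 32-64

def insertion_sort(a, lo, hi):
    # Sort a[lo..hi] in place; fast for small or nearly sorted slices.
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def merge(left, right):
    # Stable merge: on ties, take from 'left' first to preserve input order.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def mini_timsort(a):
    n = len(a)
    # 1) Sort each small chunk ("run") with insertion sort.
    for lo in range(0, n, RUN):
        insertion_sort(a, lo, min(lo + RUN - 1, n - 1))
    # 2) Merge runs pairwise until one sorted array remains.
    size = RUN
    while size < n:
        for lo in range(0, n, 2 * size):
            mid, hi = lo + size, min(lo + 2 * size, n)
            if mid < hi:
                a[lo:hi] = merge(a[lo:mid], a[mid:hi])
        size *= 2
    return a

print(mini_timsort([5, 2, 9, 1, 7, 3, 8, 4, 6, 0]))  # [0, 1, 2, ..., 9]

Even this toy version shows why the combination works: insertion sort handles the small, cache-friendly chunks, and the stable merge stitches them into the final O(n log n) result.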
Stop fighting the Borrow Checker: Rust Iterators & Closures 🦀

We’ve all been there: you’re writing what should be a simple data transformation in Rust, and suddenly the compiler starts yelling about Iter, Item, and Sized types. If you’re coming from Python or JS, Rust’s functional patterns feel familiar—until they don't.

Here are the 3 most common pitfalls I see developers hit when combining Vectors and Closures, and how to fix them.

1. The "Iterator is not a Vector" Type Trap
The Mistake: let logic: Vec<i32> = my_vec.iter().map(|x| x * 2);
The Reality: in Rust, an Iterator is a lazy description of work, not the data itself. .map() doesn't actually do anything until you consume it.
The Fix: append .collect() to "solidify" those transformations back into a Vector.

2. .iter() vs .into_iter() (The Ownership Ghost)
This is the one that trips up everyone.
- Use .iter() if you want to keep your original Vector alive. It yields references (&T).
- Use .into_iter() if you’re done with the original Vector. It consumes the collection and yields owned values (T).
Pro-Tip: if you use .into_iter() and then try to println!("{:?}", my_vec) later, the compiler will (rightfully) tell you the value has moved. Rust is protecting you from a "use-after-free" bug before it even happens.

3. The "Hidden" Dereferencing in Closures
When you use .iter(), your closure parameter (let’s call it |m|) is actually a reference.
Why does m * 3 work if m is a reference? Because for Copy primitives like i32, the standard library provides operator impls for references, so the value is simply copied out for you. But if you’re working with complex Structs, you’ll need to explicitly handle the reference, or use move closures to capture the environment.

The Golden Rule for Rustaceans:
1. Vector = The Box.
2. Iterator = The Conveyor Belt.
3. Closure = The Robot modifying items on the belt.
4. Collect = The New Box at the end.

Rust isn't being difficult; it's being precise. Once you respect the ownership of the data on the "conveyor belt," the language becomes a superpower rather than a struggle.

#RustLang #Programming #SoftwareEngineering #CodingTips #SystemsProgramming
🚀 Day 29/30 – DSA Challenge
📌 LeetCode Problem – Shortest Distance to Target String in a Circular Array

📝 Problem Statement
You are given:
- a circular array of strings words[]
- a target string target
- a starting index startIndex
Return the minimum distance required to reach any occurrence of target. 👉 You can move left or right in a circular manner.

📌 Example
Input: words = ["hello","i","am","leetcode","hello"], target = "hello", startIndex = 1
Output: 1

💡 Key Insight
Since the array is circular, the distance to index i is:
min(|i - startIndex|, n - |i - startIndex|)
👉 Either go directly, or wrap around.

🔥 Optimal Approach
🧠 Idea: loop through all indices, check where words[i] == target, calculate the circular distance, keep the minimum.

🚀 Algorithm
1️⃣ Initialize minDist = ∞
2️⃣ For each index i: if a match is found, compute dist = Math.min(Math.abs(i - startIndex), n - Math.abs(i - startIndex));
3️⃣ Update the minimum
4️⃣ If no match → return -1

✅ Java Code (Optimal O(n))

class Solution {
    public int closestTarget(String[] words, String target, int startIndex) {
        int n = words.length;
        int minDist = Integer.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            if (words[i].equals(target)) {
                int diff = Math.abs(i - startIndex);
                int dist = Math.min(diff, n - diff);
                minDist = Math.min(minDist, dist);
            }
        }
        return minDist == Integer.MAX_VALUE ? -1 : minDist;
    }
}

⏱ Complexity
Time Complexity: O(n)
Space Complexity: O(1)

📚 Key Learnings – Day 29
✔ Circular array problems need special distance handling
✔ Always consider wrap-around cases
✔ Math-based optimization simplifies logic
✔ String comparison using .equals() is important

Circular thinking. Simple math. Clean solution. Day 29 completed. Consistency continues 💪🔥

#30DaysOfCode #DSA #Java #InterviewPreparation #ProblemSolving #CodingJourney #Arrays #LeetCode
Most developers use algorithms every day. Very few can explain why one is faster than another.

That is where Big O Notation comes in. It is not just interview prep. It is how you think about performance before it becomes a problem in production.

Here is your complete Big O cheat sheet, broken down simply:

TIME COMPLEXITIES

1. O(1) — Constant
Same execution time regardless of input size. HashMap.get(), array index access. This is the holy grail of performance. Always aim for it when possible.

2. O(log n) — Logarithmic
Halves the problem with every step. Binary search, balanced BST lookup. 1 million items? Just 20 steps. Incredibly powerful at scale.

3. O(n) — Linear
Execution time grows proportionally with input. Iterating an array, linear search. Acceptable for most use cases and very readable code.

4. O(n log n) — Linearithmic
The best possible complexity for comparison-based sorting. Merge sort, quicksort average case, Arrays.sort() in Java. This is the sorting sweet spot.

5. O(n²) — Quadratic
Nested loops over the same data. Bubble sort, comparing every pair. Works fine for small datasets but kills performance above 10K items. Avoid at scale.

6. O(2ⁿ) — Exponential
Doubles with every additional input. Recursive Fibonacci without memoization, brute-force subsets. Practically unusable beyond n=30. A billion operations waiting to happen.

SPACE AND PATTERNS

7. Space Complexity
Not just about speed. How much memory does your algorithm consume? HashMap uses O(n) space. Merge sort needs O(n) for temp arrays. An in-place sort uses O(1). Both time and space matter.

8. Recognise Big O in Code
- for(i) = O(n)
- for(i) for(j) = O(n²)
- while(lo < hi), mid = (lo+hi)/2 = O(log n)
- map.get(k) = O(1)
Train your eyes to spot complexity before you write the code, not after (see the short sketch right after this post).

9. Amortised Analysis
ArrayList.add() is O(1) amortised, even though it occasionally resizes at O(n). The rare expensive operation is spread across many cheap ones. Do not judge an operation by its worst single case.

10. Best vs Worst vs Average Case
Quicksort is O(n log n) average but O(n²) worst case. Big O usually refers to worst case. Knowing the difference is what separates junior developers from senior engineers in technical interviews.

Understanding Big O is not about memorising charts. It is about making better decisions every single time you write code.

Save this post. You will need it. Which complexity trips you up the most? Let me know in the comments.

#BigO #Algorithms #SoftwareEngineering #CodingInterview #DataStructures #Programming #BackendDevelopment #TechCareer #CleanCode #ComputerScience
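
As a rough companion to point 8, here is a small Python sketch (function names are illustrative, not from any particular codebase) of what each pattern looks like in code:

# Illustrative only: each function's runtime grows as labelled with len(items).

def first_item(items):            # O(1): one index access, independent of size
    return items[0]

def contains(items, target):      # O(n): a single pass over the data
    for x in items:
        if x == target:
            return True
    return False

def has_duplicate_pair(items):    # O(n²): nested loops over the same data
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def binary_search(sorted_items, target):  # O(log n): halve the range each step
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def count_words(words):           # O(n) time, O(n) space: dict lookups are O(1) on average
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts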
In real-world Machine Learning and Data Science workflows, handling JSON data is a fundamental skill. JSON (JavaScript Object Notation) is a widely used data format because it is lightweight, human-readable, and supported across almost all programming languages. It is commonly used for data exchange between APIs, servers, and web applications.

---

🔹 Working with Local JSON Files
JSON data stored locally can be directly loaded into a DataFrame using Pandas: pd.read_json("train.json")

---

🔹 Fetching JSON Data from APIs
Data can also be fetched from external sources using URLs: pd.read_json(url). APIs typically return data in JSON format, making it easy to parse and analyze.

---

🔹 Handling Nested JSON Data
In many real-world scenarios, JSON data is nested. To transform it into a structured tabular format, we use pd.json_normalize().

---

🔹 Key Takeaways
• JSON is a universal and API-friendly data format
• Pandas simplifies reading JSON from both files and URLs
• Nested JSON requires normalization for proper analysis
• Always explore and understand the data after loading

---

Understanding how to work with JSON efficiently is an essential step in building robust data pipelines and ML systems.

#MachineLearning #DataScience #Python #Pandas #AI #LearningInPublic #DeepLearning #DataScientist
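
A small sketch of those three calls; the file name, URL, and nested structure are made up for illustration (the remote calls are commented out so the snippet runs offline):

import pandas as pd

# 1) Local file (assumed to exist): each top-level record becomes a row.
# df_local = pd.read_json("train.json")

# 2) Remote endpoint (hypothetical URL): pandas fetches and parses the JSON.
# df_api = pd.read_json("https://example.com/api/records.json")

# 3) Nested JSON: json_normalize flattens inner objects into dotted columns.
records = [
    {"id": 1, "user": {"name": "Asha", "city": "Pune"}, "score": 91},
    {"id": 2, "user": {"name": "Ravi", "city": "Delhi"}, "score": 84},
]
df_nested = pd.json_normalize(records)
print(df_nested.columns.tolist())  # columns include 'id', 'score', 'user.name', 'user.city'
print(df_nested.head())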
We need to start caring about data packaging again.

I migrated Rahu’s Python AST from a pointer-heavy recursive structure to an arena-backed one, and it improved both analysis and lookup much more than I expected. Rahu is a Python language server I’m building from scratch in Go.

The old AST used separate structs, pointers, and slices to model recursive trees. That made it easy to work with, but it also meant many small allocations, pointer chasing, and poor cache locality in hot paths.

The new AST is stored as a flat arena: compact nodes in a contiguous slice, stable NodeIDs, sibling-linked children, and side tables for names, strings, and numbers.

A good example is attribute access. In the old AST, obj.field was an Attribute node pointing to both the base expression and a separate Name node. In the new one, it’s just a NodeAttribute plus child IDs into the same array. Traversal involves indexed access instead of following heap pointers.

The result:
- AnalysisSmall: ~84 µs → ~55 µs
- AnalysisMedium: ~183 µs → ~117 µs
- AnalysisLarge: ~2.15 ms → ~1.85 ms
- DefinitionLookup: ~205 ns → ~30 ns
- HoverLookup: ~207 ns → ~34 ns
- DefinitionLookupAll: ~12.2 µs → ~1.36 µs

The geomean across the benchmark set dropped by about 45%. Some construction-heavy paths worsened slightly, which is expected: the arena model added bookkeeping and shifted work into indexing and side tables. The edit-time analysis path improved, and lookup improved significantly, which matters more for the actual LSP experience.

The main takeaway for me was simple: data layout matters. I didn’t change the language features. I changed AST storage and traversal, and that had a large effect on end-to-end performance.
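
Rahu's actual Go implementation isn't shown in the post; purely as an illustration of the arena idea it describes (nodes in one flat array, children referenced by integer IDs, identifier text in a side table), here is a toy sketch in Python. The node kinds and field names are hypothetical, and a Python sketch cannot show the cache-locality win, only the indexing structure.

from dataclasses import dataclass, field
from typing import List

# Hypothetical node kinds; not Rahu's actual encoding.
NODE_NAME, NODE_ATTRIBUTE = 0, 1

@dataclass
class Node:
    kind: int
    text_index: int = -1                                # index into a names side table
    children: List[int] = field(default_factory=list)   # NodeIDs, not object references

@dataclass
class Arena:
    nodes: List[Node] = field(default_factory=list)
    names: List[str] = field(default_factory=list)      # side table for identifier text

    def add(self, node: Node) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1                       # stable NodeID = index in the arena

# Build the AST for `obj.field` as flat records instead of a pointer tree.
arena = Arena()
arena.names.extend(["obj", "field"])
base = arena.add(Node(NODE_NAME, text_index=0))
attr = arena.add(Node(NODE_ATTRIBUTE, text_index=1, children=[base]))

# Traversal is indexed access into one array, not chasing nested objects.
node = arena.nodes[attr]
print(arena.names[node.text_index],
      [arena.names[arena.nodes[c].text_index] for c in node.children])  # field ['obj']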
#dataengineering

A data ingestion problem: 20–30K files. Download. Chunk. Vectorize. Every few hours. Forever.

Imagine this. It's 2am. Your pipeline is running — barely. A single Python script, looping through files one by one. Download. Chunk. Vectorize. Wait. Repeat. 25,000 files taking hours. And by the time it finishes, it's almost time to start again. Sound familiar?

Here's how you'd think through it — and why the "obvious" answers are often wrong. Let's walk through the available options.

Option 1 — Serial (single-threaded)
Does one thing, finishes, moves on. Simple to write and debug. If it fails, you know exactly where. But for 25K files? You're waiting all night. Fine for a weekend prototype. A disaster in production.

Option 2 — Async / Concurrent
Send 100 requests before the first one comes back. A right step to take. Python's asyncio lets us fire off dozens of downloads simultaneously. I/O-bound work — waiting for HTTP responses — is where async shines. Runtime dropped dramatically. But we're still on one machine, one CPU core. Vectorization is CPU-heavy. Async won't help there.

Option 3 — Multi-threaded
Put every core to work. ThreadPoolExecutor or multiprocessing let us use all CPU cores for the chunking and embedding work. Combined with async for downloads, this was a real upgrade. But Python's GIL limits true CPU parallelism in threads — you need multiprocessing to escape it. Still a single machine. Still a single point of failure.

Option 4 — Apache Spark
Distribute the job across a cluster. Spark is extraordinary — when you need it. Petabytes? Millions of files? Yes. 25K files every few hours? You're spending more time on cluster management than the actual work. Spark has high overhead. Don't bring a rocket ship to a road trip.

Option 5 — Highly Available Distributed Service
A queue. Workers. Retries. Observability. Always on. This is where we landed for production. A task queue (Celery, RQ, or a cloud-native option like Cloud Tasks) pulls jobs off a queue. Workers process independently. Failed jobs retry automatically. New files? Push to queue. Workers scale up. It's more complex to set up than the first 3 options — but it's the only option that handles real-world messiness: flaky APIs, partial failures, midnight spikes.

The lesson? Each step wasn't an upgrade in prestige. It was an upgrade in the problem being solved.
- Serial → Async: you're I/O-bound.
- Async → Multi-process: you're CPU-bound.
- Single node → Distributed: you need fault tolerance.
- Spark → HA service: you need continuous operation, not just scale.

Know which problem you actually have before you reach for the hammer.
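
For what Options 2 and 3 look like combined, here is a rough sketch: async for the I/O-bound downloads, a process pool for the CPU-bound chunking and vectorizing. The URLs, the aiohttp dependency, and the chunk_and_vectorize function are placeholders, not a real pipeline.

import asyncio
from concurrent.futures import ProcessPoolExecutor

import aiohttp  # assumed available; any async HTTP client works


def chunk_and_vectorize(text: str) -> list:
    # Placeholder for the CPU-heavy step (tokenize, chunk, embed).
    return [len(part) for part in text.split("\n\n")]


async def download(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.text()


async def process_all(urls: list) -> list:
    loop = asyncio.get_running_loop()
    results = []
    # A process pool sidesteps the GIL for the CPU-bound stage.
    with ProcessPoolExecutor() as pool:
        async with aiohttp.ClientSession() as session:
            sem = asyncio.Semaphore(100)  # cap concurrent downloads

            async def handle(url: str):
                async with sem:
                    text = await download(session, url)  # I/O-bound: async
                vectors = await loop.run_in_executor(pool, chunk_and_vectorize, text)  # CPU-bound: processes
                results.append((url, vectors))

            await asyncio.gather(*(handle(u) for u in urls))
    return results


if __name__ == "__main__":
    urls = [f"https://example.com/files/{i}.txt" for i in range(100)]  # placeholder URLs: swap in real ones
    asyncio.run(process_all(urls))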
I just learned that Web Scraping is 10% coding and 90% problem-solving.

I’ve been building a Data Collection repository to house different techniques for my data science projects, and today’s session was a massive eye-opener.

Beyond the Script - Building a Resilient Data Acquisition Pipeline

Web scraping is more than just fetching a URL; it’s about building a pipeline that can handle the unpredictability of the web. Today, I reached a major milestone in my Data Collection Techniques repository. Instead of a basic "one-and-done" script, I implemented a robust Local-First ETL architecture to aggregate 200+ records across multiple pages.

The Logic Breakdown
- Persistent Ingestion Layer: I designed the system to stage raw HTML snapshots locally before processing. This "snapshot" approach allows for offline development, reduces redundant network load, and ensures I have a verifiable source of truth.
- Strict Endpoint Validation: To ensure high data fidelity, I implemented logic to validate every server response. By verifying HTTP status codes and schema consistency at the point of ingestion, I prevented corrupt or "silent failure" data from ever entering my pipeline.
- Multi-Source Aggregation: I built a dynamic loop that traverses my local storage, programmatically extracting and cleaning data from 10+ distinct sources into a single, high-fidelity Pandas DataFrame.

And the result is: what started as fragmented HTML is now a sanitized, analysis-ready dataset. Data isn’t just found; it’s engineered.

Check out the architecture and the code here - https://lnkd.in/dwkezQpE

#DataEngineering #WebScraping #Python #Pandas #DataScience #ETL #SoftwareEngineering #AI #AIEngineer
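
The actual code is behind the link above; purely as a generic sketch of the local-first pattern described here (fetch, snapshot the raw HTML to disk, validate the status code, then parse the snapshots into one DataFrame), with placeholder URLs and a hypothetical CSS selector:

import pathlib

import pandas as pd
import requests
from bs4 import BeautifulSoup  # assumed parser; any HTML parser works

SNAPSHOT_DIR = pathlib.Path("snapshots")
SNAPSHOT_DIR.mkdir(exist_ok=True)


def ingest(url: str, name: str) -> pathlib.Path:
    """Download one page and stage the raw HTML locally (the 'snapshot')."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()                      # strict endpoint validation
    path = SNAPSHOT_DIR / f"{name}.html"
    path.write_text(resp.text, encoding="utf-8")
    return path


def parse_snapshot(path: pathlib.Path) -> list:
    """Extract records from a staged snapshot; the selector is a placeholder."""
    soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
    return [{"source": path.stem, "text": row.get_text(strip=True)}
            for row in soup.select(".record")]   # hypothetical CSS class


# Multi-source aggregation: every snapshot on disk feeds one DataFrame.
records = []
for snapshot in SNAPSHOT_DIR.glob("*.html"):
    records.extend(parse_snapshot(snapshot))
df = pd.DataFrame(records)
print(df.shape)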
After cleaning and preparing the dataset, today I made the chatbot talk to a real database.

No CSV reading anymore. No in-memory DataFrame queries. The data is now stored in SQLite and accessed using real SQL.

What I did today:
• Established a connection between Python and SQLite
• Converted the cleaned Pandas DataFrame into a SQL table using to_sql()
• Designed the table structure directly from the dataset
• Ensured data is permanently stored and queryable
• Closed the connection properly to avoid database locks

Now the system architecture looks like this:
User Question → Rule Logic → SQL Query → SQLite Database → Answer

This is where the project stops being a script… and starts becoming a real data system.

Why this step matters: because AI systems don’t answer from files. They answer from structured, queryable data sources. The chatbot is now able to answer questions directly from the database, not from Python memory.

Next step: use if / elif logic to map user questions directly to SQL queries and make the chatbot answer real questions from the database. Screenshots from Jupyter Notebook will be shared in the final project.

#Python #SQL #SQLite #DataEngineering #AI #MachineLearning
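
A minimal sketch of that flow, with a made-up table and columns rather than the project's actual schema:

import sqlite3

import pandas as pd

# A small cleaned DataFrame stands in for the real dataset.
df = pd.DataFrame({
    "product": ["laptop", "phone", "tablet"],
    "price": [1200, 800, 400],
})

conn = sqlite3.connect("chatbot.db")

# Persist the DataFrame as a SQL table (replace it if it already exists).
df.to_sql("products", conn, if_exists="replace", index=False)

# The "chatbot" now answers from SQL, not from Python memory.
cursor = conn.execute("SELECT product, price FROM products WHERE price < ?", (1000,))
for product, price in cursor.fetchall():
    print(f"{product}: {price}")

conn.close()  # close properly to avoid database locks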
Data Structures and Algorithms (DSA) are often seen as the "gatekeepers" of top-tier engineering roles, but in 2026 they are more than just interview hurdles: they are the foundation of efficient, scalable system design. Whether you are working with Python, Go, or TypeScript, understanding the underlying patterns of how data is stored and manipulated is what separates a coder from a true problem solver.

If you are looking to master DSA this year, here is a structured roadmap to follow:

1. Pick a Language & Master its "Internals"
Don’t switch languages midway. Whether it’s Python for its readability or Golang for its performance, understand how it handles memory, lists, and pointers. Knowing how your language manages things under the hood makes implementing DSA much easier.

2. The Foundations: Linear Data Structures
Start with the basics until they become second nature:
- Arrays & Strings: master sliding windows and two-pointer techniques (see the short sketch after this post).
- Linked Lists: understand memory allocation and node manipulation.
- Stacks & Queues: learn about LIFO/FIFO patterns and their real-world uses in task scheduling or undo features.

3. The Power of Logic: Core Algorithms
Once you can store data, you need to process it:
- Sorting & Searching: go beyond basic loops. Understand the efficiency of Merge Sort, Quick Sort, and the power of Binary Search.
- Recursion: master the art of breaking complex problems into smaller, identical sub-problems. This is the gateway to advanced topics.

4. Advanced Non-Linear Structures 🌳
This is where most high-level system design happens:
- Trees & Graphs: understand BFS (Breadth-First Search) and DFS (Depth-First Search). These are essential for everything from social media networks to mapping software.
- Heaps & Hashing: learn how Hash Maps achieve O(1) average time complexity, the backbone of high-performance lookups.

5. The "Senior" Level: Optimization Patterns ⚡
- Dynamic Programming (DP): learn to identify "overlapping subproblems" and "optimal substructure." It’s about trading space for time (memoization).
- Greedy Algorithms: making the best local choice at each step to find a global optimum.
- Backtracking: essential for solving constraint-based problems like Sudoku or pathfinding.

6. The "Why" over the "What" (Complexity Analysis)
A Senior Engineer doesn't just write a solution; they analyze it. You must master Big O Notation. If you can’t explain why your O(n log n) solution is better than an O(n²) one for a specific dataset, the code doesn't matter.

#DSA #DataStructures #Algorithms #ProblemSolving #SoftwareEngineering #CodingRoadmap #Python #Golang #SystemDesign #TechInterview #ComputerScience #SeniorEngineer #LearningJourney
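
As a small example of the two-pointer pattern from point 2 (a classic exercise: does any pair in a sorted array sum to a target?), a short Python sketch:

def has_pair_with_sum(sorted_nums, target):
    # Two pointers: one at each end of a sorted array, moving inward.
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        s = sorted_nums[lo] + sorted_nums[hi]
        if s == target:
            return True
        elif s < target:
            lo += 1   # need a bigger sum: advance the left pointer
        else:
            hi -= 1   # need a smaller sum: pull the right pointer back
    return False

print(has_pair_with_sum([1, 2, 4, 7, 11, 15], 15))  # True (4 + 11)
print(has_pair_with_sum([1, 2, 4, 7, 11, 15], 10))  # False

Each pointer moves at most n times, so the scan is O(n) on an already sorted array, instead of the O(n²) brute force over every pair.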
𝗣𝘆𝘁𝗵𝗼𝗻 𝗦𝗲𝗿𝗶𝗲𝘀 𝗗𝗮𝘆 𝟱: 𝘀𝗵𝗮𝗹𝗹𝗼𝘄 𝗰𝗼𝗽𝘆 𝗯𝗿𝗼𝗸𝗲 𝗺𝘆 𝗽𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗰𝗮𝗰𝗵𝗲 𝘄𝗶𝘁𝗵𝗼𝘂𝘁 𝘁𝗼𝘂𝗰𝗵𝗶𝗻𝗴 𝘁𝗵𝗲 𝗰𝗼𝗱𝗲

We shipped a small change. Nothing risky. A list was copied, updated, returned. Minutes later, users started seeing each other’s data. Logs were clean. No crashes. Just silent corruption. Restart fixed it… until traffic came back.

This happens because Python doesn’t copy data by default. It copies references. So your “new” list still points to the same inner objects. You change one place, everything else changes too. You think you isolated state. You didn’t. You just created another pointer to the same memory.

𝗘𝘅𝗮𝗺𝗽𝗹𝗲:

import copy

a = [[1], [2]]

b = a.copy()          # shallow copy: new outer list, but it shares the inner lists
b[0].append(99)       # mutates the inner list that a[0] also points to

c = copy.deepcopy(a)  # deep copy: fully independent inner lists
c[1].append(77)       # only c changes

print(a)  # [[1, 99], [2]]      <- a was corrupted by the shallow copy
print(b)  # [[1, 99], [2]]
print(c)  # [[1, 99], [2, 77]]  <- the deep copy stays isolated

In backend systems, this hits caching hard. You pull from cache, tweak a nested field, return the response. Now the cache is mutated. The next request reads corrupted data. No exception, no warning. Just wrong data spreading.

𝗛𝗮𝗿𝗱 𝘁𝗿𝘂𝘁𝗵: if you don’t understand how memory references work, you’re guessing. And guessing in production means you will eventually ship data corruption.

𝗡𝗲𝘅𝘁 𝗧𝗼𝗽𝗶𝗰: '𝗽𝘆𝘁𝗵𝗼𝗻 𝗺𝗲𝗺𝗼𝗿𝘆 𝗺𝗼𝗱𝗲𝗹'
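
The post doesn't show its cache code, but the fix it points at looks roughly like this (the cache name and structure below are hypothetical): never hand callers the cached object itself.

import copy

# Hypothetical in-process cache of nested user data.
_cache = {"user:42": {"name": "Asha", "prefs": {"theme": "dark"}}}

def get_user_unsafe(key):
    return _cache[key]                   # returns the cached object itself: callers can corrupt it

def get_user_safe(key):
    return copy.deepcopy(_cache[key])    # callers get an independent copy

record = get_user_safe("user:42")
record["prefs"]["theme"] = "light"       # local tweak only

print(_cache["user:42"]["prefs"]["theme"])  # dark  <- the cache stays intact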