Just shipped a tiny regex engine in Python from scratch. 🎯

It supports literals, . wildcards, and the * operator using a recursive backtracking matcher, plus a full test suite and docs so you can see exactly how it works under the hood.

I also wrapped it in some creative worldbuilding around “Pattern Keys” — forbidden codes that unlock hidden meaning in text — because learning hits different when you mix algorithms with imagination.

This project was a deep dive into how regex engines actually work, how backtracking behaves in practice, and how to design clean, readable code instead of “magic.”

What’s one algorithm you think every engineer should implement at least once? 👇

https://buff.ly/zY2K4yZ

#Python #Regex #Algorithms #Coding #SoftwareEngineering #CreativeCoding #Backtracking
Python Regex Engine with Recursive Backtracking
-
✍️ Compress - Compressor - Compression - Python:

'..The trick is to rebuild the compressor every time a new labelled document is received. Thankfully, instantiating a ZstdCompressor with a ZstdDict is very fast – tens of microseconds in my experiments. This makes it affordable to rebuild the compressor very frequently.

Here are the steps to take to turn this into a learning algorithm:
- For each class, maintain a buffer of text that belongs to that class.
- When a new labelled document is received, append it to the buffer of its class.
- Rebuild the compressor for that class with the updated buffer.
- To classify a new document, find the compressor that produces the smallest compressed output for that document.

There are several parameters that can be tuned to balance between throughput and correctness:
- Window size: the maximum number of bytes to keep in the buffer for each class. A smaller window means less data to compress, which means faster compressor rebuilding and compression. But it also means less data to learn from, which ...' - Extract

#skdscans #python #compressor #ZstdCompressor #github #maxhalford #classification https://lnkd.in/d-sDt-PK
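The learning loop described in the extract can be sketched with the standard library alone. This is a rough approximation, not the article's code: it swaps zlib's preset-dictionary support (the `zdict` parameter) in for Zstd's ZstdDict, since `zstandard` is a third-party package, and the class name and window size are illustrative:

```python
import zlib

class CompressionClassifier:
    """Classify a document by which class's buffer compresses it best."""

    def __init__(self, window_size=4096):
        self.window_size = window_size   # max bytes kept per class
        self.buffers = {}                # class label -> bytes buffer

    def learn(self, label, document):
        # Append the new labelled document, then trim to the window.
        buf = self.buffers.get(label, b"") + document
        self.buffers[label] = buf[-self.window_size:]

    def _cost(self, buffer, document):
        # Bytes needed to compress the document when the class buffer
        # is used as a preset dictionary (zlib's analogue of a ZstdDict).
        comp = zlib.compressobj(level=6, zdict=buffer)
        return len(comp.compress(document) + comp.flush())

    def classify(self, document):
        # Pick the class whose dictionary yields the smallest output.
        return min(self.buffers,
                   key=lambda lbl: self._cost(self.buffers[lbl], document))
```

Because the compressor object is rebuilt from the buffer on every call, "learning" is just appending bytes, which matches the article's point that rebuilding is cheap enough to do per document.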
-
Python Deep Dive | Understanding *args Like a Pro

One small symbol… big impact.

When we use *args in a Python function, where are the values actually stored? Many beginners guess list. Some even think it’s a generic collection. But the correct answer is: tuple ✅

💡 Why tuple?

When you define a function like this:

def my_function(*args):
    print(type(args))

And call it like this:

my_function(1, 2, 3)

The output will be:

<class 'tuple'>

Python automatically collects all positional arguments into a tuple.

🔍 Why did Python choose tuple instead of list?

Because tuples are:
• Immutable (cannot be modified)
• Memory efficient
• Slightly faster than lists to create and iterate
• Safer for internal function handling

Since *args is meant to collect values, not modify them, immutability makes perfect design sense.

🚀 Bonus Insight

• *args → stores data in a tuple
• **kwargs → stores data in a dictionary

Understanding this difference is essential for writing clean, scalable, and flexible functions — especially in larger AI or backend systems. Small details like this separate someone who “writes Python” from someone who truly understands Python.

#Python #Programming #AI #DataScience #CodingJourney #SoftwareEngineering #30DayChallenge
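A quick self-contained check of the claim above, covering both *args and **kwargs in one call (the function name is just for illustration):

```python
def demo(*args, **kwargs):
    # Positional arguments land in a tuple, keyword arguments in a dict.
    return type(args).__name__, type(kwargs).__name__

print(demo(1, 2, a=3))  # ('tuple', 'dict')
```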
-
💬 Discussion Question

In Python, if we have a list called "lst1", there are multiple ways to sort the data. Two of the most common approaches are:
1️⃣ lst1.sort()
2️⃣ sorted(lst1)

But what is the difference between them, and when should we use each one?

🔹 lst1.sort()
- It is a list method.
- It sorts the elements in-place, meaning it modifies the original list itself.
- It does not return a new list (it returns None).

Example:

lst1 = [4, 2, 7, 1]
lst1.sort()
print(lst1)

Output: [1, 2, 4, 7]

🔹 sorted(lst1)
- It is a built-in Python function.
- It returns a new sorted list without modifying the original one.

Example:

lst1 = [4, 2, 7, 1]
new_list = sorted(lst1)
print(lst1)
print(new_list)

Output:
[4, 2, 7, 1]
[1, 2, 4, 7]

📌 When to use each one?
✔ Use sort() when you want to sort the original list and don’t need the previous order.
✔ Use sorted() when you want to keep the original data unchanged and create a new sorted version.

💡 Another advantage of sorted() is that it works with any iterable: lists, tuples, sets, and dictionaries (for a dict it sorts the keys).

#Python #Programming #DataAnalytics #MachineLearning #Coding #30DayChallenge #AI #Instant
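One way to see the any-iterable point in action (the sample data is made up): sorted() accepts things that have no .sort() method at all, and it always hands back a list.

```python
data = (3, 1, 2)                 # a tuple has no .sort() method
print(sorted(data))              # [1, 2, 3] -- sorted() always returns a list

scores = {"bob": 2, "amy": 1}
print(sorted(scores))            # ['amy', 'bob'] -- iterating a dict yields its keys

print(sorted({"x", "a", "m"}))   # ['a', 'm', 'x'] -- works on sets too
```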
-
Unlock a new way to write AI-powered code: define inputs, return type, and post-conditions; let AI implement and self-correct. Focus on validation and correctness to cut boilerplate. Try AI Functions for receipts parsing. 🔎💡 #Python #AIFunctions #DevTools #CodeQuality #AI
-
I recently conducted a benchmark comparing Python 3.13 and 3.14 on the same CPU-heavy task, initially out of curiosity. The results were surprising; the performance difference was significant and has changed my perspective on parallelism in Python.

While optimizing a CPU-bound data pipeline, my usual approach was to use ProcessPoolExecutor. Although it effectively handles tasks, the OS-level process spawn cost can accumulate quickly. Python 3.14 introduced a new option: InterpreterPoolExecutor. This allows for multiple isolated Python interpreters within the same process, eliminating GIL conflicts.

I benchmarked the performance of Python 3.13 versus 3.14 as follows:

─────────────────────────────────────────
📊 1. HEAVY CPU TASKS (8 tasks, 4 workers)
🔴 Threads: 2.519s (GIL serializes everything)
🟠 Processes: 1.222s (parallel, but costly to spawn)
🟢 Subinterpreters: 1.130s (parallel and lighter)
─────────────────────────────────────────
⚡ 2. STARTUP COST (50 tiny tasks, where it really shows)
🟠 Processes: 0.271s
🟢 Subinterpreters: 0.128s (about 2x faster to start)

📈 3. SCALING (1 → 8 workers)
🔴 Threads: flatlined at ~1.9s (no real scaling benefit)
🟢 Subinterpreters: 2.16s → 0.91s (close to linear scaling)
─────────────────────────────────────────

The key takeaway is that we can achieve process-level parallelism with thread-like startup speed, without GIL contention or extra process memory overhead, all within the standard library.

Are you still using ProcessPoolExecutor for CPU-bound work? I am genuinely interested in whether subinterpreters could be a practical improvement in your stack.

#Python #Python314 #SoftwareEngineering #Performance #Concurrency #BackendDevelopment #DataEngineering
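A minimal harness along these lines, not the author's actual benchmark: the task and sizes are illustrative, and since InterpreterPoolExecutor only exists on Python 3.14+, the import is guarded so the same script runs on 3.13.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

try:
    # New in Python 3.14: one isolated interpreter (own GIL) per worker.
    from concurrent.futures import InterpreterPoolExecutor
except ImportError:
    InterpreterPoolExecutor = None

def cpu_task(n):
    # Pure-Python CPU-bound work: sum of squares below n.
    total = 0
    for i in range(n):
        total += i * i
    return total

def bench(executor_cls, n_tasks=8, workers=4, size=200_000):
    # Time n_tasks identical CPU-bound jobs on the given executor.
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(cpu_task, [size] * n_tasks))
    return time.perf_counter() - start, results
```

Compare `bench(ThreadPoolExecutor)` against `bench(ProcessPoolExecutor)` and, on 3.14+, `bench(InterpreterPoolExecutor)`; absolute numbers will differ from the post's, but the relative pattern should be visible.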
-
Multipart (Python library), ReDoS, CVE-2026-28356 (High)

The `parse_options_header()` function in `multipart.py` versions up to `1.3.0` contains a regular expression with an ambiguous alternation. When parsing a maliciously crafted `Content-Type` header (specifically the options section), the regex engine attempts to validate the string against multiple paths. Due to the nested repetition and alternation in the pattern, providing a carefully crafted input with many backslashes and quotes causes exponential backtracking ...
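The failure mode generalizes beyond this library. Here is a deliberately tiny illustration using Python's own backtracking re engine; this toy pattern is NOT the CVE's pattern, just the same class of ambiguity, and the input is kept small on purpose because each extra character roughly doubles the work:

```python
import re
import time

# (a|a)+ gives the engine two identical ways to match every 'a';
# the trailing 'b' guarantees failure, forcing ALL paths to be tried.
evil = re.compile(r"^(a|a)+$")

payload = "a" * 18 + "b"
start = time.perf_counter()
assert evil.match(payload) is None   # rejected only after ~2^18 attempts
print(f"rejected in {time.perf_counter() - start:.4f}s")
```

With an unambiguous pattern like `^a+$` the same input is rejected in linear time, which is why ReDoS fixes usually rewrite the pattern rather than cap input length.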
-
Shallow Copy vs Deep Copy in Python 🐍

Understanding copying in Python is very important when working with lists and objects.

✅ Shallow Copy
- Creates a new object
- But nested objects are still shared
- Changes to nested elements will affect both copies

Example:

import copy
a = [[1, 2], [3, 4]]
b = copy.copy(a)
b[0][0] = 100
print(a)  # Changed because the nested list is shared

✅ Deep Copy
- Creates a completely independent copy
- Nested objects are also copied
- Changes will NOT affect the original object

Example:

import copy
a = [[1, 2], [3, 4]]
b = copy.deepcopy(a)
b[0][0] = 100
print(a)  # Original list unchanged

📌 Key Difference:
- copy.copy() → Shallow Copy
- copy.deepcopy() → Deep Copy

Understanding this concept helps avoid unexpected bugs in real-world projects.

#Python #Programming #Coding #Learning #SoftwareDevelopment
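Both behaviors, side by side in one runnable snippet (the lists are sample data):

```python
import copy

a = [[1, 2], [3, 4]]
shallow = copy.copy(a)       # new outer list, but shared inner lists
deep = copy.deepcopy(a)      # fully independent clone

shallow[0][0] = 100          # leaks into a: the inner list is shared
deep[1][0] = 999             # does not touch a

print(a)         # [[100, 2], [3, 4]]
print(shallow)   # [[100, 2], [3, 4]]
print(deep)      # [[1, 2], [999, 4]]
```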
-
Stop Using sort() Just to Check Order in Python

I still see this pattern in production code:

def is_sorted_bad(lst):
    return lst == sorted(lst)

It works… but it’s inefficient. Python’s built-in sort (Timsort) runs in O(n log n) and creates a new list — just to answer a yes/no question. If your list has 1 million elements, that’s ~20 million comparisons and a full memory copy. All that… when you only need to check adjacent pairs.

The Right Way (O(n) + Early Exit)

def is_sorted(lst):
    return all(x <= y for x, y in zip(lst, lst[1:]))

✔ O(n) time
✔ Stops at first violation
✔ No unnecessary sorting
✔ Clean & Pythonic

And if you're on Python 3.10+, itertools.pairwise() is even cleaner.

What This Article Covers

In my latest blog post, I break down:
• 5 efficient O(n) methods
• Non-decreasing vs strictly increasing
• Descending checks
• Edge cases (None, mixed types, duplicates)
• When sort() is actually the right choice
• NumPy & Pandas native solutions
• Performance comparison table

Plus real benchmarks and interview-ready explanations. If you're writing performance-sensitive Python — especially in data pipelines — this is worth knowing.

Full article here: https://lnkd.in/giChyRRc

Curious — how many times have you seen lst == sorted(lst) in real projects?
-
🧮 Expressions in Python

In Python, an expression is a combination of values, variables, operators, and function calls that evaluates to a result. Simply put 👉 If Python can calculate it and return a value, it’s an expression.

---

🔹 Types of Expressions in Python

1️⃣ Arithmetic Expressions
Used to perform mathematical operations.

a = 10
b = 5
result = a + b  # 15

Operators: + - * / % ** //

---

2️⃣ Relational (Comparison) Expressions
Compare two values and return True or False.

x = 10
y = 20
print(x < y)  # True

Operators: == != > < >= <=

---

3️⃣ Logical Expressions
Combine multiple conditions.

age = 25
print(age > 18 and age < 60)  # True

Operators: and or not

---

4️⃣ Bitwise Expressions
Perform operations at the binary level.

a = 5  # 0101
b = 3  # 0011
print(a & b)  # 1

Operators: & | ^ ~ << >>

---

5️⃣ Assignment Expressions
Note: plain assignment with = (and shortcuts like +=, -=) is actually a statement in Python, not an expression. The true assignment expression is the walrus operator := (Python 3.8+), which assigns a value and also evaluates to it.

x = 5
x += 3          # 8 (a statement)
y = (x := 10)   # walrus: assigns x and evaluates to 10

---

6️⃣ Conditional (Ternary) Expression
Short-hand for if-else.

num = 10
result = "Even" if num % 2 == 0 else "Odd"

---

✅ Advantages of Using Expressions
✔ Makes code concise and readable
✔ Helps perform computations efficiently
✔ Improves logical decision-making
✔ Reduces unnecessary lines of code

---

🚀 Final Thought
Mastering expressions is the foundation of writing efficient Python programs. The better you understand expressions, the stronger your coding logic becomes!

#Python #Programming #Coding #LearnPython #Tech #Developers #LinkedInLearning
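A runnable snippet that exercises several of these expression types at once, including the statement-vs-expression distinction around assignment (the variable names are illustrative):

```python
# Expressions evaluate to values, so they can nest anywhere a value fits:
result = "Even" if 10 % 2 == 0 else "Odd"   # conditional (ternary) expression
flags = (5 & 3, 5 | 3, 5 ^ 3)               # bitwise expressions
check = (10 < 20) and not (3 > 7)           # relational + logical expressions

# Plain `x = 5` is a STATEMENT and produces no value, but the walrus
# operator := (Python 3.8+) is a true assignment EXPRESSION:
if (n := len("hello")) > 3:
    print(result, flags, check, n)   # Even (1, 7, 6) True 5
```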
For anyone who wants the technical breakdown + repo, here’s what this tiny regex engine does.

Core features
- matches(pattern, text): Boolean matcher with literals, ., and * using recursive backtracking.
- find_all_matches(pattern, text): Returns all substrings that match the pattern.
- Handles edge cases (empty strings, non-ASCII) with a 13-test unittest suite.

How it works
- Recursive matcher that can advance or branch when * appears.
- * is greedy: it consumes as much as possible, then backtracks to satisfy the rest of the pattern.
- Implemented in pure Python, no re or external libs.

Why I built it
- To understand regex engines instead of treating them as black boxes.
- To practice reasoning about backtracking and failure paths.
- To tie it into a “Pattern Keys” concept: forbidden codes that unlock hidden structure in text.

GitHub repo: https://github.com/KyPython/pattern-keys-regex
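For readers who want the shape of the algorithm without opening the repo, here is a minimal sketch under my own assumptions. It is NOT the repo's actual code: it anchors to the full string for simplicity, whereas the repo's matcher also searches substrings, but the greedy-star-then-backtrack recursion is the same idea.

```python
def matches(pattern, text):
    # An empty pattern matches only empty text (anchored full match).
    if not pattern:
        return not text
    # Does the first pattern char match the first text char? ('.' matches any)
    first_ok = bool(text) and pattern[0] in (text[0], ".")
    if len(pattern) >= 2 and pattern[1] == "*":
        # Greedy: try consuming another char first, then fall back to
        # skipping the 'x*' entirely -- this fallback IS the backtracking.
        return (first_ok and matches(pattern, text[1:])) or matches(pattern[2:], text)
    return first_ok and matches(pattern[1:], text[1:])

print(matches("pa.*n keys", "pattern keys"))  # True
print(matches("a*b", "aaab"))                 # True
print(matches("a*b", "aaac"))                 # False
```

The `or` on the starred branch is where the engine "remembers" an alternative: if greedily consuming characters leads to a dead end, the recursion unwinds and the shorter consumption is tried, exactly the behavior the ReDoS posts above exploit when a pattern is ambiguous.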