Name: Optimize Data with Entropy Engine: Detect Duplicates & Waste | Noah Kerr posted on the topic | LinkedIn
Uploaded: 2026-03-23T21:51:37.878Z
Duration: 23 s
Channel: Noah Kerr

Noah Kerr

1mo

Most data is inefficient; you just don’t see it. I built Entropy Engine to make it visible.

Drop in any JSON or CSV file and it:
- detects what’s wasting space (duplicates, repeated strings, whitespace)
- explains why it matters (storage, compression, performance)
- lets you simulate fixes before touching your data

The rotating ASCII cube in the center represents your dataset: watch it shrink in real time as each optimization runs.

Built with Python, FastAPI, React, and real-time SSE streaming.

GitHub: https://lnkd.in/gmsiXj9u

#python #react #fastapi #opensource #softwaredeveloper
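To make the three kinds of waste concrete, here is a minimal sketch of how duplicates, repeated strings, and whitespace could be counted in a CSV file. It is not Entropy Engine's code: the `waste_report` function, its heuristics, and the sample data are all assumptions made for illustration, using only the standard library.

```python
import csv
import io
from collections import Counter


def waste_report(csv_text: str) -> dict:
    """Estimate three kinds of waste in CSV text (hypothetical heuristic)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:]  # skip the header row

    # Duplicate rows: every occurrence after the first is redundant.
    row_counts = Counter(tuple(r) for r in body)
    duplicate_rows = sum(n - 1 for n in row_counts.values())

    # Repeated strings: non-empty cell values that appear more than once.
    cell_counts = Counter(cell.strip() for r in body for cell in r)
    repeated_strings = {v: n for v, n in cell_counts.items() if n > 1 and v}

    # Whitespace: characters spent on leading/trailing padding inside cells.
    padding_chars = sum(len(cell) - len(cell.strip()) for r in body for cell in r)

    return {
        "rows": len(body),
        "duplicate_rows": duplicate_rows,
        "repeated_strings": repeated_strings,
        "padding_chars": padding_chars,
    }


# Made-up sample: one padded cell and one exact duplicate row.
sample = "id,city\n1, Berlin \n2,Berlin\n2,Berlin\n"
report = waste_report(sample)
print(report)
```

A real tool would also need to weigh these counts against compression, since repeated strings often compress well and may cost less on disk than the raw count suggests.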


More Relevant Posts

Timothy Zatloff
4w
called the same API endpoint 5 times in a row.

without cache: 2.51s
with lru_cache: 0.50s

5x faster. two lines of code.

@functools.lru_cache(maxsize=128)
def fetch_user(user_id): ...

the cache info tells the real story: hits=4, misses=1

first call hits the actual API. next 4? served instantly from memory.

this is how production systems handle repeated expensive calls: user profiles, config lookups, ML model loads, anything that doesn’t change every second.

lru_cache ships with Python. no libraries. just import functools.

two lines between slow and fast.

#Python #Backend #DataEngineering #Performance
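The experiment in the post can be reproduced with a short script. This is a sketch under assumptions: the body of `fetch_user` is a stand-in that sleeps instead of calling a real API, and the returned dict is made up.

```python
import functools
import time


@functools.lru_cache(maxsize=128)
def fetch_user(user_id: int) -> dict:
    """Stand-in for a slow API call; the sleep simulates network latency."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}


# Call the same "endpoint" five times with the same argument.
for _ in range(5):
    fetch_user(42)

# Only the first call runs the function body; the other four are cache hits.
info = fetch_user.cache_info()
print(info)
```

Note that `lru_cache` never expires entries on its own, so for data that does change, calling `fetch_user.cache_clear()` or using a time-based cache is worth considering.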
4 Comments
James Brosnahan
3w
.0158 vs .0005 for the cached version. So, searching Bing for "does python lru cache return previous objects":

"Yes, Python’s built‑in functools.lru_cache returns the exact same object instance that was previously computed and cached, not a copy"

The overhead is in the object being recreated on each call, and Python objects are known to be slow to create. There are better options for performance, like writing the API in C++ with Pistache or Crow. Timing 4 million unique users each requesting their user info 3 times would be more informative.

Reading that the returned data is a user data object with a changing score and a constant username, the code needs refactoring because it mixes two use cases together. The username only needs to be sent the first time, and after that only if it has been updated. The score is better sent via a socket or WebSocket if it changes in real time and requires input from the server to be calculated, or not sent at all if it can be calculated client side. If it needs to be broadcast to other client network peers, with their responses sent back to other peers, a message queue is needed; if the peers' responses do not matter, the main server can handle the broadcasting.

Database queries that cannot just be returned by directly querying the database are not conducive to caching, or not useful to cache if they change infrequently or are only needed once or a few times at most. With fewer than 4 million users, giving each user their own database on a single server can be easier than writing APIs if the data is just database table views (and the service is paid, which reduces the risk of hacking by users; database caching can also be used across multiple client applications).
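The quoted claim about object identity is easy to check. In this small sketch, `load_profile` and its return value are hypothetical; the point is only that a cache hit hands back the stored instance, so mutating it changes what later callers see.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_profile(user_id: int) -> dict:
    """Hypothetical loader that builds a fresh dict on a cache miss."""
    return {"id": user_id, "score": 0}


first = load_profile(1)
second = load_profile(1)

# A cache hit returns the stored instance, not a copy.
same_object = first is second

# Mutating one reference is therefore visible through the cache.
first["score"] = 99
leaked_score = load_profile(1)["score"]

print(same_object, leaked_score)
```

Returning immutable values (tuples, frozen dataclasses) or copying on the way out avoids this kind of accidental sharing.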
Anchal Singh
1mo
Practiced Python functions today — *args, **kwargs, return types, and function types. The one thing that clicked — *args is for unknown number of values (stores as tuple), **kwargs is when you don't know the keys either (stores as dict). Simple but I was mixing them up before. Also learned there are actual types of functions — action, transformation, validation. https://lnkd.in/ducSzXzK #Python #LearningInPublic #DataAnalysis

DATA-ANALYSIS-WITH-PYTHON/Arguments_Return_Print_Function.ipynb at main · anchalsingh1708/DATA-ANALYSIS-WITH-PYTHON github.com
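The tuple/dict distinction described in the post can be seen directly. The function name `describe` and its arguments below are made up for illustration; they are not taken from the linked notebook.

```python
def describe(*args, **kwargs):
    """Return the container types Python uses for extra arguments."""
    # Extra positional values arrive packed in a tuple,
    # extra keyword values arrive packed in a dict.
    return type(args).__name__, type(kwargs).__name__, args, kwargs


result = describe(1, 2, 3, unit="cm", label="width")
print(result)
```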
ArjanCodes

5,648 followers
3w
Most implementations of the State pattern in Python look very “clean”. Lots of small classes. A base interface. One class per state. But if you’ve ever worked with one in a real project, you know the downside: transitions are scattered, behaviour is hard to see in one place, and adding new states often means touching multiple files. In today’s video, I rebuild the State pattern in a very different way. Instead of relying on inheritance, I make the state machine explicit as data and use decorators to define transitions. The result is a small, reusable engine where the entire flow becomes visible at a glance. If you’re interested in writing Python that’s easier to reason about and extend, this is a pattern worth understanding. 👉 Watch here: https://lnkd.in/e9Y3xGNF. #python #softwaredesign #designpatterns #statemachine #cleancode
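One way to read "state machine as data, transitions defined by decorators" is sketched below. This is not the code from the video: the `StateMachine` class, the `transition` decorator, and the example states are all assumptions chosen to illustrate the general idea.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    DONE = auto()


class StateMachine:
    """Tiny engine: the whole flow lives in one transition table."""

    def __init__(self, initial: State) -> None:
        self.state = initial
        # Maps (current state, event name) -> (next state, action).
        self.transitions: dict = {}

    def transition(self, source: State, event: str, target: State):
        """Decorator that registers a function as the action for one transition."""
        def register(action: Callable[[], None]) -> Callable[[], None]:
            self.transitions[(source, event)] = (target, action)
            return action
        return register

    def fire(self, event: str) -> None:
        """Run the action for this event and move to the target state."""
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} from {self.state.name}")
        target, action = self.transitions[key]
        action()
        self.state = target


machine = StateMachine(State.IDLE)
log = []


@machine.transition(State.IDLE, "start", State.RUNNING)
def on_start() -> None:
    log.append("started")


@machine.transition(State.RUNNING, "finish", State.DONE)
def on_finish() -> None:
    log.append("finished")


machine.fire("start")
machine.fire("finish")
print(machine.state, log)
```

Because every transition is an entry in `machine.transitions`, the full flow can be printed or validated in one place, which is the visibility benefit the post describes.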
Ajaj Mahmud Aquil
1mo Edited
🚀 Built a PDF Text Extractor using Python & Streamlit! I often needed a quick way to extract text from PDFs without heavy software. So, I built one myself. 📄 Upload any PDF, and it instantly extracts all the text from every page — clean and simple. ⚙️ The main challenge was handling multi-page PDFs accurately across different formats using PyPDF2. 🛠️ Tech Stack: •Python 3.11.9 • Streamlit • PyPDF2 🔗 GitHub: https://lnkd.in/gvFFf2yA Would love your feedback and suggestions! 🙌 #Python #Streamlit #OpenSource #PythonDeveloper

1 Comment
Ayush Sahay
1mo Edited
Managing Python environments shouldn’t be a nightmare. But for many, it still is. Here’s how to keep things clean, fast, and production-ready in 2026:

🛠️ Use uv or poetry to manage environments & dependencies. Faster, safer, and simpler than old-school pip+venv.

⚡ Speed matters: use polars, numpy, and numba to vectorize heavy loops. Even small tweaks can give 10× performance wins.

🧼 Lint + Format + Type-check = non-negotiable
• Ruff for linting
• Black for formatting
• Pyright or Pyrefly for fast type-checks

💡 Bonus tip: Use Typer to build CLIs in minutes. So clean, it feels like magic.

💬 What’s one Python setup rule you wish you knew earlier?

#Python #CodeQuality #Productivity #DevTools #DataEngineering
Arjan Egges
3w
Most implementations of the State pattern in Python look very “clean”. Lots of small classes. A base interface. One class per state. But if you’ve ever worked with one in a real project, you know the downside: transitions are scattered, behaviour is hard to see in one place, and adding new states often means touching multiple files. In today’s video, I rebuild the State pattern in a very different way. Instead of relying on inheritance, I make the state machine explicit as data and use decorators to define transitions. The result is a small, reusable engine where the entire flow becomes visible at a glance. If you’re interested in writing Python that’s easier to reason about and extend, this is a pattern worth understanding. 👉 Watch here: https://lnkd.in/eg22yEHR. #python #softwaredesign #designpatterns #statemachine #cleancode
1 Comment
Reuven Lerner
1mo
You can't use a list as a #Python dict key:

mylist = ['a', 'b']
d = {mylist: 10}  # TypeError!

Why not? Dicts run "hash" on a key to choose a pair's storage location. To avoid pairs getting lost, mutable builtins (list, set, dict) cannot be used as dict keys.
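A short runnable version of the example, with one common workaround added as an assumption of what a reader might want next: converting the list to a tuple, which is immutable and therefore hashable.

```python
mylist = ["a", "b"]

# Using the list itself as a key fails because lists are unhashable.
try:
    d = {mylist: 10}
except TypeError as exc:
    error_message = str(exc)

# A tuple with the same contents is immutable, so it can be hashed.
d = {tuple(mylist): 10}
value = d[("a", "b")]

print(error_message)
print(value)
```

The same idea applies to sets, where `frozenset` is the hashable counterpart.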
Suyog Yadav
2w
Day 3/365: Comparing Two Strings Character by Character 🧵🧠 Today I worked on a simple but fundamental logic problem: checking if two strings are the same, without directly using a built-in equality check. First, I compare the lengths of both strings. If lengths differ, they can’t be the same. If lengths match, I loop through each index and compare characters one by one. If any character is different, I break and print that the strings are not the same. If the loop finishes without finding a mismatch, the else block of the for loop runs and prints that the strings are the same. The interesting part is the for-else in Python. The else only runs when the loop completes normally (no break). This makes it a clean way to express: “if I didn’t find any mismatch in the entire loop, then the strings are equal.” Day 3 done ✅ 362 more to go. #100DaysOfCode #365DaysOfCode #Python #LogicBuilding #StringComparison #ForElse #CodingJourney #LearnInPublic #AspiringDeveloper
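The steps described above can be sketched as follows. The post does not show its code, so the function name `same_string` and the choice to return a boolean instead of printing are assumptions.

```python
def same_string(a: str, b: str) -> bool:
    """Compare two strings index by index, without using == on the strings."""
    # Strings of different lengths can never be equal.
    if len(a) != len(b):
        return False

    for i in range(len(a)):
        if a[i] != b[i]:
            result = False
            break  # a mismatch skips the else block below
    else:
        # Runs only when the loop finished without hitting break.
        result = True
    return result


print(same_string("python", "python"))
print(same_string("python", "pythom"))
```

Two empty strings also count as equal here: the loop body never runs, no `break` happens, and the `else` block executes.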
2 Comments
Khushraj Rai
1mo
🚀 Built a Movie Recommendation System using Collaborative Filtering! I recently developed a movie recommendation system that suggests movies based on user preferences and similarity patterns. The model analyzes user–movie interactions to uncover hidden patterns and deliver personalized recommendations. 👉 Simply select a movie, and the system instantly suggests similar movies you’re likely to enjoy! ✨ Key Features: • Collaborative filtering–based recommendations • User–movie similarity analysis • Instant top movie suggestions • Clean and interactive web interface 🎥 Check out the demo to see it in action! 💻 Tech Stack: Python | Scikit-learn | Pandas | NumPy | Streamlit 🔗 GitHub Repository: https://lnkd.in/g-d3DwSa 🔗 Live Demo: https://lnkd.in/dnigN_7e #MachineLearning #RecommenderSystem #Python #AIProject #DataScience #Streamlit #CollaborativeFiltering #AIML
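For readers new to the technique, here is a minimal sketch of item-based collaborative filtering using cosine similarity. It is not the project's code: the post lists Scikit-learn, Pandas, and NumPy, while this version uses only the standard library, and the ratings, user names, and function names are invented for illustration.

```python
import math

# Hypothetical ratings: movie -> {user: rating}.
ratings = {
    "Alien": {"ann": 5, "bob": 4, "cy": 1},
    "Aliens": {"ann": 5, "bob": 5, "cy": 1},
    "Notebook": {"ann": 1, "bob": 1, "cy": 5},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_movies(title: str, top_n: int = 2) -> list:
    """Rank the other movies by how similarly users rated them."""
    scores = {
        other: cosine(ratings[title], ratings[other])
        for other in ratings
        if other != title
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


recommendations = similar_movies("Alien")
print(recommendations)
```

Movies rated alike by the same users score close to 1.0, so selecting one title surfaces the others that the same audience liked.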

48 followers

14 Posts
