Sorting a "Messy Photo Folder" with Applied ML 📸

We’ve all been there: a massive, disorganized folder of photos from a wedding, a trip, or a backup that's impossible to sort manually. I wanted a simple, local way to group these by person without over-engineering it.

So I built and open-sourced image-clustering-ai. It’s a straightforward Python pipeline that:

✅ Filters out blurry or tiny photos.
✅ Uses InsightFace to "see" the people.
✅ Automatically clusters them into folders by identity.

I optimized for the "boring" stuff that matters: making it easy to run, readable, and actually useful for a real-world photo dump. It’s not production-ready, but it’s a solid, working baseline for anyone interested in practical machine learning.

Check out the repo, try it on your own "messy folder," and let me know what you think!

👉 Repo: https://lnkd.in/dKHeRjKv

#OpenSource #Python #MachineLearning #SoftwareEngineering #ComputerVision
Ever felt your AI/ML scripts dragging when performing the same computation multiple times? 🤔

Whether it's complex feature engineering, lookup tables, or even some model predictions, repeated calculations can seriously slow you down.

Good news! Python has a neat trick up its sleeve: `@functools.lru_cache`. This isn't just a fancy decorator; it's like giving your functions a super-smart memory. It stores the results of expensive function calls and, if you call that function again with the *same inputs*, it instantly returns the cached result instead of re-running the whole thing. 🧠💨

Think about it for feature generation: if you compute `sentiment_score('positive review')` dozens or hundreds of times, `lru_cache` ensures that complex calculation only happens ONCE. The rest are instant lookups!

This little gem can dramatically speed up your data preprocessing and model experimentation. Ready to give your ML workflows a serious boost?

What's your go-to Python trick for optimizing ML pipelines? Share below! 👇

#Python #MachineLearning #AICoding #PythonTips #DataScience
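A minimal sketch of the idea (the `sentiment_score` function here is a toy stand-in for a real, expensive feature computation):

```python
import functools
import time

call_count = 0

@functools.lru_cache(maxsize=None)
def sentiment_score(text: str) -> float:
    """Stand-in for an expensive feature computation (hypothetical)."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulate heavy work
    return (sum(ord(c) for c in text) % 100) / 100  # toy "score"

# 100 calls with the same input: the body runs once, the rest are cache hits.
scores = [sentiment_score("positive review") for _ in range(100)]
print(call_count)                          # 1
print(sentiment_score.cache_info().hits)   # 99
```

Note the catch: arguments must be hashable, so this works for strings and tuples but not for lists or DataFrames.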
✨ Project No. 1 🚢

Just shipped my end-to-end ML project — Titanic Survival Prediction! Built a complete pipeline from raw data to a live interactive web app using Python, scikit-learn, and Streamlit.

🔧 What's under the hood:
• Auto-fetches the Titanic dataset from OpenML
• Engineered features: passenger title, family size, fare per person
• Trained & compared 4 models — RandomForest, GradientBoosting, Logistic Regression & XGBoost
• Best model selected via 5-fold cross-validation
• Deployed as a Streamlit app with real-time survival probability prediction

📊 Metrics tracked: Accuracy, Precision, Recall, F1-Score, ROC-AUC

This project helped me get hands-on with the full ML workflow — from EDA and feature engineering to model evaluation and deployment. Always more to learn, but proud of this one! 🙌

#MachineLearning #AI #Python #DataScience #Streamlit #MLProject #Sklearn #OutriX
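For anyone reproducing the metrics step, here is a dependency-free sketch of how the tracked metrics fall out of the confusion-matrix counts (the labels below are made up for illustration, not output from the actual Titanic run):

```python
# Toy ground-truth and predicted labels (1 = survived); illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                      # of predicted survivors, how many really survived
recall    = tp / (tp + fn)                      # of real survivors, how many we caught
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

In the real project `sklearn.metrics` computes these for you, but knowing the arithmetic makes the tradeoff between precision and recall concrete.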
Just shipped a Retrieval-Augmented Generation (RAG) project and honestly? It's been one of the most interesting things I've worked on.

The concept is straightforward:
→ Feed it documents
→ Use embeddings for semantic search
→ Retrieve relevant context from a vector database
→ Generate answers grounded in your actual data

But pulling it all together taught me something important: most "smart" AI systems aren't magic. They're smart pipelines.

What I used:
• Python
• Embeddings for semantic search
• ChromaDB (vector database)
• A full RAG pipeline

This feels like a genuine step toward building real AI applications instead of just experimenting with models.

If you've built something similar, I'd love to hear how you approached the trickier parts—especially around retrieval and context quality.

GitHub: https://lnkd.in/gwmqvvuT

#MachineLearning #AI #LLMs #RAG #Python
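The retrieval step can be sketched without any vector database at all: embed, compare by cosine similarity, take the top match. The tiny hand-made "embeddings" below are placeholders for real model output, and the document names are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy document "embeddings" (a real system gets these from an embedding model).
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "account setup":  [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

context = retrieve([0.85, 0.15, 0.05])  # query vector close to "refund policy"
print(context)
```

A vector database like ChromaDB does the same comparison, just with approximate-nearest-neighbor indexes so it stays fast at millions of documents.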
When using LLMs for structured data extraction, a common mistake is not allowing the model to return a null response: if every field must be filled, the model is forced to hallucinate a value. If you use k-shot examples, it will often just interpolate from the prior k-shot examples instead of admitting the field is missing.

Structured data extraction from messy PDFs is the clearest business use case I am seeing across many domains where gen AI is an easy win.

Pick up a copy of the book, either paperback or epub, from https://lnkd.in/eivj6_PG if you want an easy step-by-step walkthrough of how to write Python code to process PDFs.
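One way to build that escape hatch in is to make every extracted field explicitly nullable in the schema you hand the model. A sketch with a JSON-Schema-style definition (the field names are invented for illustration, not from the book):

```python
import json

# Hypothetical extraction schema: every field may be null, so the model
# has a legal way to say "not found" instead of inventing a value.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": ["string", "null"]},
        "total_amount":   {"type": ["number", "null"]},
        "due_date":       {"type": ["string", "null"]},
    },
    "required": ["invoice_number", "total_amount", "due_date"],
}

# A well-behaved model response for a PDF that simply lacks a due date:
response = json.loads(
    '{"invoice_number": "INV-17", "total_amount": 120.5, "due_date": null}'
)
print(response["due_date"])  # None, not a hallucinated date
```

Keeping the fields `required` but nullable is deliberate: the model must address every field, but "I couldn't find it" is always a valid answer.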
Most people use keyboards every day — but very few think about how typing efficiency can actually be measured.

I built a machine learning model to predict typing time between key pairs (bigrams) based on keyboard layout and key positions. By applying feature engineering and model optimization, I reduced prediction error (MAE) from 174.8 ms to 94.4 ms — a 45.9% improvement.

This project helped me understand how data-driven approaches can be used to evaluate and improve real-world user interactions. Next, I’m working on turning this into a simple usable application.

GitHub: https://lnkd.in/gbHiKdGu

#MachineLearning #Python #AI
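A plausible feature for this kind of model is the physical distance between the two keys of a bigram. A sketch with made-up QWERTY grid coordinates (the repo's actual feature set may differ):

```python
import math

# Approximate (row, column) positions of a few QWERTY keys; illustrative only.
KEY_POS = {
    "q": (0, 0), "w": (0, 1), "e": (0, 2),
    "a": (1, 0), "s": (1, 1), "d": (1, 2),
}

def bigram_distance(a: str, b: str) -> float:
    """Euclidean distance between two keys, a candidate input feature."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r2 - r1, c2 - c1)

print(bigram_distance("q", "w"))  # adjacent keys: 1.0
print(bigram_distance("q", "d"))  # a longer diagonal reach
```

Intuitively, longer reaches and same-finger bigrams take more time, which is exactly the kind of signal a regression model can pick up from features like this.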
Why does a 30-second prediction take milliseconds in production? It’s all in the data structures (DSA).

I just finished building a kNN inference engine from scratch to explore why DSA is the backbone of scalable AI.

What I built:
• A pure Python kNN implementation using KD-Trees and Max-Heaps for optimized neighbor searching.
• PCA to overcome the Curse of Dimensionality, turning a 30D "information mist" into a dense 3D cluster.
• Benchmarks of Brute Force vs. Ball Trees vs. KD-Trees on 200,000 rows to show the shift from O(n) to O(log n) complexity.

kNN is a "lazy learner" that postpones processing until the prediction step. If your data structures aren't optimized, your model won't survive at scale.

Full code and performance graphs on GitHub: https://lnkd.in/gdsfV5xy

#AI #MachineLearning #Python #Programming #Algorithms #TechPortfolio #DSA #DataStructuresAndAlgorithm #ScalableAI #AINews
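The max-heap part of that search can be sketched with the standard library: keep a bounded heap of the k best candidates, keyed on negated distance so the worst kept point is always on top. This uses a brute-force scan standing in for the KD-tree traversal:

```python
import heapq
import math

def knn(points, query, k):
    """Return the k points nearest to `query`, via a bounded max-heap."""
    heap = []  # entries: (-distance, point), so heap[0] is the *worst* kept point
    for p in points:
        d = math.dist(p, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif -heap[0][0] > d:             # current worst is farther away than p
            heapq.heapreplace(heap, (-d, p))
    return sorted(p for _, p in heap)

pts = [(0, 0), (1, 1), (5, 5), (0.5, 0.5), (10, 10)]
print(knn(pts, (0, 0), 3))  # the three points nearest the origin
```

The payoff of the bounded heap is that each candidate costs O(log k) instead of sorting all n distances; a KD-tree then cuts down how many candidates you visit in the first place.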
Machine Learning from Scratch: Lesson 9

Stop treating Pandas like a black box! 🕵️♂️📊

When you write df.groupby() or df.iloc[], do you know what’s actually happening in your computer's memory? In this Machine Learning series, we go beyond the syntax. We look at Pandas as a "data detective" and a conveyor belt that prepares your raw data for the AI engine.

In this deep dive, you’ll discover:
🔹 Boolean Masking: how data is actually "filtered" (it’s not magic, it’s a True/False mask).
🔹 Split-Apply-Combine: the 3-step internal strategy of GroupBy.
🔹 The Memory Secret: why DataFrames are actually collections of vectors (Series).
🔹 loc vs iloc: the definitive logic to never confuse them again.

If you want to move from "copy-pasting code" to "understanding the system," this article is for you.

🔗 Read the full lesson 👇

#MachineLearning #DataScience #Pandas #Python #AI #DataCleaning #Analytics #LearningJourney
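The split-apply-combine idea can be mimicked in plain Python, which makes the three steps explicit (pandas does the equivalent over columnar arrays, far more efficiently):

```python
from collections import defaultdict

rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Rome", "temp": 18},
    {"city": "Oslo", "temp": 5},
    {"city": "Rome", "temp": 20},
]

# 1. Split: bucket the rows by key, like df.groupby("city")
groups = defaultdict(list)
for row in rows:
    groups[row["city"]].append(row["temp"])

# 2. Apply: reduce each bucket, like .mean()
# 3. Combine: gather the per-group results into one structure
means = {city: sum(temps) / len(temps) for city, temps in groups.items()}
print(means)  # {'Oslo': 4.0, 'Rome': 19.0}
```

Once you see GroupBy as these three steps, it stops being magic: pandas just swaps the Python dict and loop for hash-based grouping over NumPy-backed columns.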
What's the best way to understand machine learning? Build it from scratch.

I created a Python engine that runs the 3 fundamental ML paradigms on real-world-style datasets:

1. Supervised: drop in a CSV with a target column and it automatically detects whether it's a classification or regression problem, trains a model, and reports accuracy, confusion matrix, and feature importance.
2. Unsupervised: same datasets, no labels. KMeans finds the optimal number of clusters, DBSCAN detects noise, and PCA reveals which features matter most.
3. Reinforcement: a Q-Learning agent starts knowing nothing about a grid world and learns the optimal path through trial and error. 99% win rate after training.

https://lnkd.in/dkx4WXkm

#python #machinelearning #AI #software #ml
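The reinforcement piece is the easiest to miniaturize. Here is a hedged sketch of tabular Q-learning on a 1-D corridor rather than the repo's grid world (states, rewards, and hyperparameters are all invented for illustration):

```python
import random

# Minimal Q-learning on a 1-D corridor: states 0..4, goal at state 4.
random.seed(0)
N, GOAL = 5, 4
ACTIONS = [-1, +1]                        # step left / step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2         # learning rate, discount, exploration

for _ in range(500):                      # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N - 1)    # clamp to the corridor
        r = 1.0 if s2 == GOAL else -0.01  # small step cost shapes shorter paths
        # Bellman update toward reward plus discounted best future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy policy for each non-goal state: +1 means "step right toward the goal"
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)
```

The agent starts knowing nothing, wanders under epsilon-greedy exploration, and the Bellman updates propagate the goal reward backward until "always step right" dominates the table.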
🚀 The real power of Python in AI isn’t just models… it’s speed.

Most people write loops. Smart people use vectorization.

While working on data tasks, I realized:
❌ Traditional loops slow everything down
❌ Manual processing wastes hours

But with tools like NumPy, Pandas & AI frameworks:
✅ Boolean indexing replaces loops
✅ Broadcasting handles large data instantly
✅ Vectorized logic runs across entire datasets

And the result? 📊 2 hours of work → less than 20 seconds

This is where Python + AI truly shines — not just building models, but accelerating everything around them. Still learning, but exploring this ecosystem has completely changed how I approach data. If you're working with data, start thinking beyond loops.

💬 Comment “Python” if you want practical examples of these tricks.

#Python #AI #DataScience #NumPy #Pandas #MachineLearning #Automation #LearningJourney
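A small sketch of the two tricks named above, assuming NumPy is installed (the price data is made up):

```python
import numpy as np

prices = np.array([12.0, 55.0, 8.0, 99.0, 40.0])

# Boolean indexing: one expression instead of an if-inside-a-loop filter
expensive = prices[prices > 30]

# Broadcasting: the scalar 0.9 is applied to every element at once,
# with the loop running in compiled C rather than the Python interpreter
discounted = prices * 0.9

print(expensive.tolist())
print(discounted.tolist())
```

On five elements the difference is invisible; on millions of rows, moving the loop out of the interpreter is exactly where the "2 hours → 20 seconds" kind of speedup comes from.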
Really like the “boring but important” mindset here; that’s what makes tools actually usable. Local + simple + effective is underrated. Definitely going to try this on my own messy folder 😄