Sorting a "Messy Photo Folder" with Applied ML 📸

We’ve all been there: a massive, disorganized folder of photos from a wedding, a trip, or a backup that's impossible to sort manually. I wanted a simple, local way to group these by person without over-engineering it.

So I built and open-sourced image-clustering-ai. It’s a straightforward Python pipeline that:

✅ Filters out blurry or tiny photos.
✅ Uses InsightFace to "see" the people.
✅ Automatically clusters them into folders by identity.

I optimized for the "boring" stuff that matters: making it easy to run, readable, and actually useful for a real-world photo dump. It’s not production-ready, but it’s a solid, working baseline for anyone interested in practical machine learning.

Check out the repo, try it on your own "messy folder," and let me know what you think!

👉 Repo: https://lnkd.in/dKHeRjKv

#OpenSource #Python #MachineLearning #SoftwareEngineering #ComputerVision
Ever felt your AI/ML scripts dragging when performing the same computation multiple times? 🤔

Whether it's complex feature engineering, lookup tables, or even some model predictions, repeated calculations can seriously slow you down.

Good news! Python has a neat trick up its sleeve: `@functools.lru_cache`. This isn't just a fancy decorator; it's like giving your functions a super-smart memory. It stores the results of expensive function calls and, if you call that function again with the *same inputs*, it instantly returns the cached result instead of re-running the whole thing. 🧠💨

Think about it for feature generation: if you compute `sentiment_score('positive review')` dozens or hundreds of times, `lru_cache` ensures that complex calculation only happens ONCE. The rest are instant lookups!

This little gem can dramatically speed up your data preprocessing and model experimentation. Ready to give your ML workflows a serious boost?

What's your go-to Python trick for optimizing ML pipelines? Share below! 👇

#Python #MachineLearning #AICoding #PythonTips #DataScience
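A minimal sketch of the idea (the `sentiment_score` function here is a toy stand-in for a real, expensive feature computation):

```python
import functools
import time

call_count = 0

@functools.lru_cache(maxsize=None)
def sentiment_score(text: str) -> float:
    """Stand-in for an expensive feature computation (hypothetical)."""
    global call_count
    call_count += 1
    time.sleep(0.01)  # simulate heavy work
    return (sum(ord(c) for c in text) % 100) / 100  # toy "score"

# 100 calls with the same input: the body runs once, the rest are cache hits.
scores = [sentiment_score("positive review") for _ in range(100)]
print(call_count)                          # 1
print(sentiment_score.cache_info().hits)   # 99
```

Note the catch: arguments must be hashable, so this works for strings and tuples but not for lists or DataFrames.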
✨ Project No. 1 🚢

Just shipped my end-to-end ML project — Titanic Survival Prediction! Built a complete pipeline from raw data to a live interactive web app using Python, scikit-learn, and Streamlit.

🔧 What's under the hood:
• Auto-fetches the Titanic dataset from OpenML
• Engineered features: passenger title, family size, fare per person
• Trained & compared 4 models — RandomForest, GradientBoosting, Logistic Regression & XGBoost
• Best model selected via 5-fold cross-validation
• Deployed as a Streamlit app with real-time survival probability prediction

📊 Metrics tracked: Accuracy, Precision, Recall, F1-Score, ROC-AUC

This project helped me get hands-on with the full ML workflow — from EDA and feature engineering to model evaluation and deployment. Always more to learn, but proud of this one! 🙌

#MachineLearning #AI #Python #DataScience #Streamlit #MLProject #Sklearn #OutriX
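For anyone reproducing the metrics step, here is a dependency-free sketch of how the tracked metrics fall out of the confusion-matrix counts (the labels below are made up for illustration, not output from the actual Titanic run):

```python
# Toy ground-truth and predicted labels (1 = survived); illustrative only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                      # of predicted survivors, how many really survived
recall    = tp / (tp + fn)                      # of real survivors, how many we caught
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

In the real project `sklearn.metrics` computes these for you, but knowing the arithmetic makes the tradeoff between precision and recall concrete.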
Just shipped a Retrieval-Augmented Generation (RAG) project and honestly? It's been one of the most interesting things I've worked on.

The concept is straightforward:
→ Feed it documents
→ Use embeddings for semantic search
→ Retrieve relevant context from a vector database
→ Generate answers grounded in your actual data

But pulling it all together taught me something important: most "smart" AI systems aren't magic. They're smart pipelines.

What I used:
• Python
• Embeddings for semantic search
• ChromaDB (vector database)
• A full RAG pipeline

This feels like a genuine step toward building real AI applications instead of just experimenting with models.

If you've built something similar, I'd love to hear how you approached the trickier parts—especially around retrieval and context quality.

GitHub: https://lnkd.in/gwmqvvuT

#MachineLearning #AI #LLMs #RAG #Python
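The retrieval step can be sketched without any vector database at all: embed, compare by cosine similarity, take the top match. The tiny hand-made "embeddings" below are placeholders for real model output, and the document names are invented:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy document "embeddings" (a real system gets these from an embedding model).
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "account setup":  [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document names most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

context = retrieve([0.85, 0.15, 0.05])  # query vector close to "refund policy"
print(context)
```

A vector database like ChromaDB does the same comparison, just with approximate-nearest-neighbor indexes so it stays fast at millions of documents.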
When using LLMs for structured data extraction, a common mistake is not allowing the model to return a null response: if every field must be filled, the model is forced to hallucinate a value. If you use k-shot examples, it will often just interpolate from the prior k-shot examples instead of admitting the field is missing.

Structured data extraction from messy PDFs is the clearest business use case I am seeing across many domains where gen AI is an easy win.

Pick up a copy of the book, either paperback or epub, from https://lnkd.in/eivj6_PG if you want an easy step-by-step walkthrough of how to write Python code to process PDFs.
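One way to build that escape hatch in is to make every extracted field explicitly nullable in the schema you hand the model. A sketch with a JSON-Schema-style definition (the field names are invented for illustration, not from the book):

```python
import json

# Hypothetical extraction schema: every field may be null, so the model
# has a legal way to say "not found" instead of inventing a value.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": ["string", "null"]},
        "total_amount":   {"type": ["number", "null"]},
        "due_date":       {"type": ["string", "null"]},
    },
    "required": ["invoice_number", "total_amount", "due_date"],
}

# A well-behaved model response for a PDF that simply lacks a due date:
response = json.loads(
    '{"invoice_number": "INV-17", "total_amount": 120.5, "due_date": null}'
)
print(response["due_date"])  # None, not a hallucinated date
```

Keeping the fields `required` but nullable is deliberate: the model must address every field, but "I couldn't find it" is always a valid answer.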
Most people use keyboards every day — but very few think about how typing efficiency can actually be measured.

I built a machine learning model to predict typing time between key pairs (bigrams) based on keyboard layout and key positions. By applying feature engineering and model optimization, I reduced prediction error (MAE) from 174.8 ms to 94.4 ms — a 45.9% improvement.

This project helped me understand how data-driven approaches can be used to evaluate and improve real-world user interactions. Next, I’m working on turning this into a simple usable application.

GitHub: https://lnkd.in/gbHiKdGu

#MachineLearning #Python #AI
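A plausible feature for this kind of model is the physical distance between the two keys of a bigram. A sketch with made-up QWERTY grid coordinates (the repo's actual feature set may differ):

```python
import math

# Approximate (row, column) positions of a few QWERTY keys; illustrative only.
KEY_POS = {
    "q": (0, 0), "w": (0, 1), "e": (0, 2),
    "a": (1, 0), "s": (1, 1), "d": (1, 2),
}

def bigram_distance(a: str, b: str) -> float:
    """Euclidean distance between two keys, a candidate input feature."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r2 - r1, c2 - c1)

print(bigram_distance("q", "w"))  # adjacent keys: 1.0
print(bigram_distance("q", "d"))  # a longer diagonal reach
```

Intuitively, longer reaches and same-finger bigrams take more time, which is exactly the kind of signal a regression model can pick up from features like this.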
Why does a 30-second prediction take milliseconds in production? It’s all in the data structures (DSA).

I just finished building a kNN inference engine from scratch to explore why DSA is the backbone of scalable AI.

What I built:
• A pure Python kNN implementation using KD-Trees and Max-Heaps for optimized neighbor searching.
• PCA to overcome the Curse of Dimensionality, turning a 30D "information mist" into a dense 3D cluster.
• Benchmarks of Brute Force vs. Ball Trees vs. KD-Trees on 200,000 rows to show the shift from O(n) to O(log n) complexity.

kNN is a "lazy learner" that postpones processing until the prediction step. If your data structures aren't optimized, your model won't survive at scale.

Full code and performance graphs on GitHub: https://lnkd.in/gdsfV5xy

#AI #MachineLearning #Python #Programming #Algorithms #TechPortfolio #DSA #DataStructuresAndAlgorithm #ScalableAI #AINews
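The max-heap part of that search can be sketched with the standard library: keep a bounded heap of the k best candidates, keyed on negated distance so the worst kept point is always on top. This uses a brute-force scan standing in for the KD-tree traversal:

```python
import heapq
import math

def knn(points, query, k):
    """Return the k points nearest to `query`, via a bounded max-heap."""
    heap = []  # entries: (-distance, point), so heap[0] is the *worst* kept point
    for p in points:
        d = math.dist(p, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif -heap[0][0] > d:             # current worst is farther away than p
            heapq.heapreplace(heap, (-d, p))
    return sorted(p for _, p in heap)

pts = [(0, 0), (1, 1), (5, 5), (0.5, 0.5), (10, 10)]
print(knn(pts, (0, 0), 3))  # the three points nearest the origin
```

The payoff of the bounded heap is that each candidate costs O(log k) instead of sorting all n distances; a KD-tree then cuts down how many candidates you visit in the first place.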
Machine Learning from Scratch: Lesson 9

Stop treating Pandas like a black box! 🕵️♂️📊

When you write df.groupby() or df.iloc[], do you know what’s actually happening in your computer's memory? In this Machine Learning series, we go beyond the syntax. We look at Pandas as a "data detective" and a conveyor belt that prepares your raw data for the AI engine.

In this deep dive, you’ll discover:
🔹 Boolean Masking: how data is actually "filtered" (it’s not magic, it’s a True/False mask).
🔹 Split-Apply-Combine: the 3-step internal strategy of GroupBy.
🔹 The Memory Secret: why DataFrames are actually collections of vectors (Series).
🔹 loc vs iloc: the definitive logic to never confuse them again.

If you want to move from "copy-pasting code" to "understanding the system," this article is for you.

🔗 Read the full lesson 👇

#MachineLearning #DataScience #Pandas #Python #AI #DataCleaning #Analytics #LearningJourney
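The split-apply-combine idea can be mimicked in plain Python, which makes the three steps explicit (pandas does the equivalent over columnar arrays, far more efficiently):

```python
from collections import defaultdict

rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Rome", "temp": 18},
    {"city": "Oslo", "temp": 5},
    {"city": "Rome", "temp": 20},
]

# 1. Split: bucket the rows by key, like df.groupby("city")
groups = defaultdict(list)
for row in rows:
    groups[row["city"]].append(row["temp"])

# 2. Apply: reduce each bucket, like .mean()
# 3. Combine: gather the per-group results into one structure
means = {city: sum(temps) / len(temps) for city, temps in groups.items()}
print(means)  # {'Oslo': 4.0, 'Rome': 19.0}
```

Once you see GroupBy as these three steps, it stops being magic: pandas just swaps the Python dict and loop for hash-based grouping over NumPy-backed columns.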
What's the best way to understand machine learning? Build it from scratch.

I created a Python engine that runs the 3 fundamental ML paradigms on real-world-style datasets:

1. Supervised: drop in a CSV with a target column and it automatically detects whether it's a classification or regression problem, trains a model, and reports accuracy, confusion matrix, and feature importance.
2. Unsupervised: same datasets, no labels. KMeans finds the optimal number of clusters, DBSCAN detects noise, and PCA reveals which features matter most.
3. Reinforcement: a Q-Learning agent starts knowing nothing about a grid world and learns the optimal path through trial and error. 99% win rate after training.

https://lnkd.in/dkx4WXkm

#python #machinelearning #AI #software #ml
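The reinforcement piece is the easiest to miniaturize. Here is a hedged sketch of tabular Q-learning on a 1-D corridor rather than the repo's grid world (states, rewards, and hyperparameters are all invented for illustration):

```python
import random

# Minimal Q-learning on a 1-D corridor: states 0..4, goal at state 4.
random.seed(0)
N, GOAL = 5, 4
ACTIONS = [-1, +1]                        # step left / step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2         # learning rate, discount, exploration

for _ in range(500):                      # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), N - 1)    # clamp to the corridor
        r = 1.0 if s2 == GOAL else -0.01  # small step cost shapes shorter paths
        # Bellman update toward reward plus discounted best future value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# Greedy policy for each non-goal state: +1 means "step right toward the goal"
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)
```

The agent starts knowing nothing, wanders under epsilon-greedy exploration, and the Bellman updates propagate the goal reward backward until "always step right" dominates the table.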
🚀 The real power of Python in AI isn’t just models… it’s speed.

Most people write loops. Smart people use vectorization.

While working on data tasks, I realized:
❌ Traditional loops slow everything down
❌ Manual processing wastes hours

But with tools like NumPy, Pandas & AI frameworks:
✅ Boolean indexing replaces loops
✅ Broadcasting handles large data instantly
✅ Vectorized logic runs across entire datasets

And the result? 📊 2 hours of work → less than 20 seconds

This is where Python + AI truly shines — not just building models, but accelerating everything around them. Still learning, but exploring this ecosystem has completely changed how I approach data. If you're working with data, start thinking beyond loops.

💬 Comment “Python” if you want practical examples of these tricks.

#Python #AI #DataScience #NumPy #Pandas #MachineLearning #Automation #LearningJourney
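A small sketch of the two tricks named above, assuming NumPy is installed (the price data is made up):

```python
import numpy as np

prices = np.array([12.0, 55.0, 8.0, 99.0, 40.0])

# Boolean indexing: one expression instead of an if-inside-a-loop filter
expensive = prices[prices > 30]

# Broadcasting: the scalar 0.9 is applied to every element at once,
# with the loop running in compiled C rather than the Python interpreter
discounted = prices * 0.9

print(expensive.tolist())
print(discounted.tolist())
```

On five elements the difference is invisible; on millions of rows, moving the loop out of the interpreter is exactly where the "2 hours → 20 seconds" kind of speedup comes from.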
Really like the “boring but important” mindset here; that’s what makes tools actually usable. Local + simple + effective is underrated. Definitely going to try this on my own messy folder 😄