Evaluating AI Coding Agents Through a Side Project: Building a Media Organizer
Over the past week, I spent about eight focused hours experimenting with AI coding agents.
This was not a product build. It was an evaluation experiment.
As someone with a B.Tech in Computer Science and a postgraduate degree in management, I’m increasingly interested in understanding how AI is evolving as an engineering tool.
The question I wanted to test was simple:
Can AI coding agents handle messy real-world engineering logic when the constraints are described clearly in natural language?
To explore that, I built a small CLI utility called Sherlock Media Organizer using:
The objective was to push the AI into solving practical data problems, not toy coding exercises.
The Problem
Most people today have thousands or tens of thousands of photos and videos spread across:
Traditional organizers group files rigidly by date or location. But real media libraries contain a long list of messy edge cases:
The goal was to build a system that reconstructs a coherent timeline from this chaos.
Architecture Overview
Sherlock is structured as a modular Python CLI application with the following layers:
Key libraries used include:
Example 1: Handling Large 4K Video Files Efficiently
Problem
Media libraries contain files ranging from small images to multi-gigabyte 4K videos. Processing them naively causes massive performance bottlenecks.
Architecture Decision
The pipeline treats media differently depending on file size and metadata availability.
Decision Tree
Instead of reading the entire file, the hash is generated using three sampling points:
Each chunk reads roughly 1 MB of data, meaning a 3 GB file only requires reading about 3 MB total.
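This produces a fingerprint that is reliable enough in practice to tell media files apart while avoiding expensive full-file reads. The trade-off is that two files differing only outside the sampled regions would receive the same fingerprint.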
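A minimal sketch of sampled hashing, not taken from the Sherlock repository. It assumes the three sampling points are the start, middle, and end of the file, and that SHA-256 is the hash; the function name `sampled_fingerprint` is hypothetical.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # roughly 1 MB per sample, as described above


def sampled_fingerprint(path, chunk_size=CHUNK_SIZE):
    """Hash three chunks of a file instead of the whole file.

    Assumption: the samples are taken at the start, middle, and end.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    # Mix in the file size so files of different lengths get different hashes.
    digest.update(str(size).encode("ascii"))

    with open(path, "rb") as handle:
        if size <= 3 * chunk_size:
            # Small file: the samples would overlap, so hash everything.
            digest.update(handle.read())
        else:
            offsets = (0, size // 2 - chunk_size // 2, size - chunk_size)
            for offset in offsets:
                handle.seek(offset)
                digest.update(handle.read(chunk_size))
    return digest.hexdigest()
```

Including the file size in the digest is a cheap extra guard: a truncated copy of a video shares its opening bytes with the original but should not share its fingerprint.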
Example 2: Trip Intelligence Architecture
Problem
Photo organizers usually group files by date folders. But human memory works in events and trips, not timestamps.
Example:
IMG_001
IMG_002
IMG_003
Those could represent a Paris vacation, not random days.
Architecture Approach
Sherlock implements a state machine model.
The system constantly tracks whether the user is Home or Away.
Decision Tree
Additional Triggers
Trips are also split when:
• altitude jumps exceed ~300 meters (possible flight)
• time gap exceeds 72 hours while traveling
• location changes drastically
This creates semantic travel segments rather than arbitrary folders.
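The Home/Away tracking and the split triggers can be sketched as follows. This is an illustration under stated assumptions rather than Sherlock's actual code: the 300 meter and 72 hour values come from the text, while `HOME_RADIUS_KM`, `MAX_JUMP_KM`, the `Photo` class, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

HOME_RADIUS_KM = 50.0          # hypothetical: closer than this counts as Home
MAX_JUMP_KM = 500.0            # hypothetical stand-in for a "drastic" location change
MAX_GAP = timedelta(hours=72)  # from the text: time gap while traveling
MAX_ALTITUDE_JUMP_M = 300.0    # from the text: possible flight


@dataclass
class Photo:
    name: str
    taken_at: datetime
    lat: float
    lon: float
    altitude_m: float = 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def should_split(prev, cur):
    """Apply the three split triggers to two consecutive Away photos."""
    return (
        cur.taken_at - prev.taken_at > MAX_GAP
        or abs(cur.altitude_m - prev.altitude_m) > MAX_ALTITUDE_JUMP_M
        or haversine_km(prev.lat, prev.lon, cur.lat, cur.lon) > MAX_JUMP_KM
    )


def segment_trips(photos, home):
    """Return a list of trips; each trip is a list of photos taken while Away."""
    trips, current, previous = [], [], None
    for photo in sorted(photos, key=lambda p: p.taken_at):
        away = haversine_km(photo.lat, photo.lon, *home) > HOME_RADIUS_KM
        if not away:
            # Transition back to Home closes any open trip.
            if current:
                trips.append(current)
            current, previous = [], None
            continue
        if current and should_split(previous, photo):
            trips.append(current)
            current = []
        current.append(photo)
        previous = photo
    if current:
        trips.append(current)
    return trips
```

Here `home` is a `(lat, lon)` tuple. Photos taken at home are not assigned to any trip in this sketch.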
Example 3: Cross-Device Trip Merging Logic
Problem
Families often capture trips on multiple devices.
Example timeline:
Phone A
Monday: Goa
Phone B
Tuesday: Goa
Traditional organizers produce two folders.
Architecture Solution
Sherlock merges trip segments using a graph traversal approach.
Decision Tree
The merging step uses BFS traversal over trip segments, allowing clusters to grow dynamically.
The result becomes a single family trip event.
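A minimal sketch of BFS-based merging, assuming two segments are linked when they share a place and their time ranges overlap or nearly touch. The link rule, the 48 hour `MERGE_WINDOW`, and the `Segment` class are hypothetical; the source only states that BFS traversal over trip segments is used.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

MERGE_WINDOW = timedelta(hours=48)  # hypothetical tolerance between segments


@dataclass(frozen=True)
class Segment:
    device: str
    place: str
    start: datetime
    end: datetime


def related(a, b):
    """Hypothetical link rule: same place, and time ranges overlap or nearly touch."""
    if a.place != b.place:
        return False
    return a.start <= b.end + MERGE_WINDOW and b.start <= a.end + MERGE_WINDOW


def merge_segments(segments):
    """Group segments into clusters using breadth-first search."""
    unvisited = set(range(len(segments)))
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        queue = deque([seed])
        cluster = []
        while queue:
            i = queue.popleft()
            cluster.append(segments[i])
            # Any unvisited segment linked to this one joins the cluster,
            # and is later expanded itself, so clusters grow transitively.
            linked = [j for j in sorted(unvisited) if related(segments[i], segments[j])]
            for j in linked:
                unvisited.remove(j)
                queue.append(j)
        clusters.append(cluster)
    return clusters
```

Because each newly added segment is expanded in turn, a Monday segment and a Friday segment can end up in the same cluster through a Wednesday segment that links to both.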
Example 4: Duplicate Image Detection Architecture
Problem
The same photo often exists in multiple forms:
• original camera photo
• WhatsApp compressed version
• resized upload
• edited variant
Byte-level hashing cannot detect these, because any re-encoding, resize, or edit changes the file's bytes and therefore its hash.
Decision Tree
Result
The system keeps the highest resolution version and discards lower quality copies.
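One common way to catch such variants is a perceptual hash. The sketch below uses a simple average hash on grayscale pixel grids and keeps the highest-resolution member of each near-duplicate group. Treat the technique itself as an assumption, since the source does not name the algorithm Sherlock uses; the `Image` class, `MAX_DISTANCE`, and the function names are hypothetical, and images are plain lists of pixel rows to keep the example dependency-free.

```python
from dataclasses import dataclass

HASH_SIZE = 8      # 8x8 grid gives a 64-bit hash
MAX_DISTANCE = 5   # hypothetical threshold: differing bits allowed for a match


@dataclass
class Image:
    name: str
    pixels: list  # rows of grayscale values in the range 0-255

    @property
    def resolution(self):
        return len(self.pixels) * len(self.pixels[0])


def average_hash(image, size=HASH_SIZE):
    """Sample a size x size grid, then set one bit per cell brighter than the mean."""
    rows, cols = len(image.pixels), len(image.pixels[0])
    cells = [
        image.pixels[r * rows // size][c * cols // size]
        for r in range(size)
        for c in range(size)
    ]
    mean = sum(cells) / len(cells)
    return sum(1 << i for i, value in enumerate(cells) if value > mean)


def hamming(a, b):
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(images):
    """Keep the highest-resolution image from each group of near-duplicates."""
    kept = []  # (hash, image) pairs, visited from largest to smallest
    for image in sorted(images, key=lambda im: im.resolution, reverse=True):
        current = average_hash(image)
        if all(hamming(current, other) > MAX_DISTANCE for other, _ in kept):
            kept.append((current, image))
    return [image for _, image in kept]
```

Visiting images from largest to smallest means a smaller copy is only ever compared against versions that are at least as large, so the survivor of each group is the highest-resolution one.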
Example 5: Inferring Missing GPS Data
Messaging apps frequently strip GPS metadata.
Example:
During a trip you might download:
These files lack location data.
Sherlock solves this using context inheritance.
If a GPS-less image appears between two images taken in the same city, the system assumes it belongs to the same trip unless contradictory evidence appears.
This preserves timeline continuity.
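Context inheritance can be sketched as a pass over the time-ordered items that fills a gap only when the nearest located neighbors on both sides agree. The `MediaItem` class and its fields are hypothetical, and the "contradictory evidence" check mentioned above is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MediaItem:
    name: str
    taken_at: datetime
    city: Optional[str] = None  # None means the GPS metadata was stripped
    inferred: bool = False      # True when the city was inherited, not read


def inherit_locations(items):
    """Fill missing cities for items sandwiched between two same-city items."""
    ordered = sorted(items, key=lambda item: item.taken_at)
    known = [i for i, item in enumerate(ordered) if item.city is not None]
    for left, right in zip(known, known[1:]):
        if ordered[left].city != ordered[right].city:
            continue  # neighbors disagree, so leave the gap unresolved
        for item in ordered[left + 1:right]:
            item.city = ordered[left].city
            item.inferred = True
    return ordered
```

Marking inherited values with `inferred` keeps guessed locations distinguishable from real GPS readings, which makes it possible to revisit them later.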
Example 6: Preventing Storage Overflow
Problem
Large reorganizations can fail when disk space runs out.
Decision Flow
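One plausible shape for such a check, offered as a sketch rather than Sherlock's actual flow: total the size of the files to be copied, compare that against the free space on the destination, and refuse to start if there is not enough room plus a safety margin. The 10 percent margin and the function names are hypothetical; `shutil.disk_usage` is the standard-library call for free space.

```python
import os
import shutil

MARGIN_PERCENT = 10  # hypothetical safety margin on top of the required bytes


def required_bytes(paths):
    """Total size of the files that would be copied."""
    return sum(os.path.getsize(path) for path in paths)


def fits(needed, free, margin_percent=MARGIN_PERCENT):
    """True if `needed` bytes plus the margin fit into `free` bytes."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return needed * (100 + margin_percent) <= free * 100


def can_reorganize(paths, destination):
    """Pre-flight check to run before any file is copied."""
    free = shutil.disk_usage(destination).free
    return fits(required_bytes(paths), free)
```

Running this before the first copy means a reorganization either starts with enough room or does not start at all, rather than failing halfway through.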
An Interesting Observation
What surprised me was how much of this logic the AI coding agent was able to implement once the constraints were clearly described.
Instead of writing hundreds of lines manually, the workflow looked like this:
Describe the problem -> Define the edge cases -> Specify performance constraints -> Iterate on the architecture
The AI then generated the implementation across multiple modules.
The engineer’s role increasingly becomes problem architect rather than code typist.
The Larger Shift
Back in engineering school we were trained to focus on:
Those skills remain essential.
But the interface between humans and computers is changing.
Increasingly, English is becoming a way to express system logic.
The quality of the outcome depends less on typing syntax and more on:
Final Thoughts
Sherlock Media Organizer is just a small side project.
But it served its purpose.
It has taught me that AI coding agents are now capable of handling:
As long as the problem is framed clearly.
That might be the most important engineering skill in the AI era.
Clear thinking.
GitHub Repository: https://github.com/RICEforever/Sherlock-Media-Organizer.git