Evaluating AI Coding Agents Through a Side Project: Building A Media Organizer

Over the past week I spent about 8 focused hours experimenting with AI coding agents.

This was not a product build. It was an evaluation experiment.

As someone with a B.Tech in Computer Science and a postgraduate degree in management, I’m increasingly interested in understanding how AI is evolving as an engineering tool.

The question I wanted to test was simple:

Can AI coding agents handle messy real-world engineering logic when the constraints are described clearly in natural language?

To explore that, I built a small CLI utility called Sherlock Media Organizer using:

  • Gemini CLI
  • Antigravity coding agent
  • VS Code

The objective was to push the AI into solving practical data problems, not toy coding exercises.


The Problem

Most people today have thousands or tens of thousands of photos and videos spread across:

  • multiple phones
  • external drives
  • downloaded images

Traditional organizers group files rigidly by date or location. But real media libraries contain a long list of messy edge cases:

  • duplicate photos created by WhatsApp compression
  • missing GPS metadata
  • multiple devices capturing the same trip
  • 4K videos that are too large to process efficiently
  • sudden storage overflow during file transfers
  • scattered photos that belong to the same event

The goal was to build a system that reconstructs a coherent timeline from this chaos.


Architecture Overview

Sherlock is structured as a modular Python CLI application with the following layers:

  • scanner module for metadata extraction
  • intelligence module for trip segmentation
  • organizer module for deduplication and file movement
  • SQLite database layer for metadata persistence
  • dashboard generator for visualization

Key libraries used include:

  • Pillow and pillow-heif for image metadata extraction
  • imagehash for perceptual duplicate detection
  • geopy for geodesic distance calculations
  • reverse_geocoder for offline location resolution
  • pandas for reporting
  • folium for map-based dashboards
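To make the persistence layer concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The article doesn't describe Sherlock's actual schema, so the `media` table, its columns, and the `open_catalog`/`upsert_media` helpers are hypothetical.

```python
import sqlite3

# Hypothetical schema: the article does not publish Sherlock's actual tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    path        TEXT PRIMARY KEY,
    fingerprint TEXT NOT NULL,   -- content hash produced by the scanner
    taken_at    TEXT,            -- ISO-8601 capture time, if known
    latitude    REAL,            -- NULL when GPS metadata is missing
    longitude   REAL,
    width       INTEGER,
    height      INTEGER,
    trip_id     INTEGER          -- filled in later by the intelligence module
);
CREATE INDEX IF NOT EXISTS idx_media_taken_at ON media(taken_at);
"""


def open_catalog(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata catalog and make sure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def upsert_media(conn, path, fingerprint, taken_at=None, lat=None, lon=None,
                 width=None, height=None):
    # Re-scanning a file replaces its previous row instead of duplicating it
    # (INSERT OR REPLACE also clears any trip_id assigned earlier).
    conn.execute(
        "INSERT OR REPLACE INTO media "
        "(path, fingerprint, taken_at, latitude, longitude, width, height) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (path, fingerprint, taken_at, lat, lon, width, height),
    )
    conn.commit()
```

Keeping the catalog in SQLite means later stages (trip segmentation, deduplication, reporting) can query metadata without re-reading the media files.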


Example 1: Handling Large 4K Video Files Efficiently

Problem

Media libraries contain files ranging from small images to multi-gigabyte 4K videos. Processing every file the same way, for example by reading the whole of a multi-gigabyte video just to hash it, creates severe performance bottlenecks.

Architecture Decision

The pipeline treats media differently depending on file size and metadata availability.

[Figure: decision tree]

For large files, instead of reading the entire file, the hash is generated from three sampling points:

  • near the start, about 10 percent into the file
  • the midpoint, around 50 percent
  • near the end, around 90 percent

Each chunk reads roughly 1 MB of data, meaning a 3 GB file only requires reading about 3 MB total.

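This produces a fingerprint that is reliable in practice while avoiding expensive full-file reads. The trade-off is that two files differing only outside the sampled windows could receive the same fingerprint.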
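The sampling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not code from the Sherlock repository: the BLAKE2b hash, the 3 MB cutoff below which files are hashed in full, and mixing the file size into the digest are all assumptions of this sketch, and `sampled_fingerprint` is a hypothetical name.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024             # ~1 MB read at each sampling point
SAMPLE_POINTS = (0.10, 0.50, 0.90)   # relative offsets into the file
SMALL_FILE_LIMIT = 3 * CHUNK_SIZE    # hypothetical cutoff: hash small files in full


def sampled_fingerprint(path: str) -> str:
    """Return a hex fingerprint built from three ~1 MB samples of the file.

    Files no larger than SMALL_FILE_LIMIT are hashed in full, since sampling
    would save little. The file size is mixed into the digest so files with
    identical samples but different lengths still get different keys.
    """
    size = os.path.getsize(path)
    digest = hashlib.blake2b(digest_size=16)
    digest.update(size.to_bytes(8, "little"))
    with open(path, "rb") as f:
        if size <= SMALL_FILE_LIMIT:
            digest.update(f.read())
        else:
            for fraction in SAMPLE_POINTS:
                # Clamp so the last sample never runs past end-of-file.
                offset = min(int(size * fraction), size - CHUNK_SIZE)
                f.seek(offset)
                digest.update(f.read(CHUNK_SIZE))
    return digest.hexdigest()
```

If the collision risk matters, a match on the sampled fingerprint can be confirmed with a full-file hash before anything is deleted.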


Example 2: Trip Intelligence Architecture

Problem

Photo organizers usually group files by date folders. But human memory works in events and trips, not timestamps.

Example:

IMG_001
IMG_002
IMG_003

A date-based organizer would split these into separate day folders, even though together they could represent one Paris vacation.

Architecture Approach

Sherlock implements a state machine model.

The system constantly tracks whether the user is Home or Away.
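A minimal sketch of the Home/Away idea as a two-state machine. Assumptions not in the article: a fixed home coordinate, a hypothetical 50 km home radius, and a stdlib haversine distance standing in for geopy (with geopy one would use `geopy.distance.geodesic((lat1, lon1), (lat2, lon2)).km`). `Photo` and `segment_trips` are illustrative names.

```python
import math
from dataclasses import dataclass
from datetime import datetime

HOME_RADIUS_KM = 50.0  # hypothetical: farther than this from home counts as Away


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


@dataclass
class Photo:
    name: str
    taken_at: datetime
    lat: float
    lon: float


def segment_trips(photos, home):
    """Walk photos in time order through a two-state (HOME/AWAY) machine.

    HOME -> AWAY opens a trip and AWAY -> HOME closes it. Photos taken at
    home are not assigned to any trip.
    """
    trips, current, state = [], [], "HOME"
    for p in sorted(photos, key=lambda ph: ph.taken_at):
        away = haversine_km(home[0], home[1], p.lat, p.lon) > HOME_RADIUS_KM
        if state == "HOME" and away:
            state, current = "AWAY", [p]   # transition: a trip starts
        elif state == "AWAY" and away:
            current.append(p)              # still travelling
        elif state == "AWAY":
            trips.append(current)          # transition: back home, trip ends
            state, current = "HOME", []
    if current:
        trips.append(current)              # trip still open at the end of the library
    return trips
```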

[Figure: decision tree]

Additional Triggers

Trips are also split when:

  • altitude jumps exceed ~300 meters (possible flight)
  • time gap exceeds 72 hours while traveling
  • location changes drastically

This creates semantic travel segments rather than arbitrary folders.
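The extra triggers can be expressed as a small check between consecutive photos. The 300 meter and 72 hour thresholds come from the list above; the 500 km hop standing in for "location changes drastically" is a hypothetical value, and `split_reasons` is an illustrative name. The caller is assumed to compute the distance between the two photos.

```python
from datetime import timedelta

ALTITUDE_JUMP_M = 300                      # from the article: ~300 m suggests a flight
MAX_GAP_WHILE_AWAY = timedelta(hours=72)   # from the article
MAX_HOP_KM = 500.0                         # hypothetical value for "drastic" location change


def split_reasons(prev, curr, hop_km, away):
    """Return the triggers that fire between two consecutive photos.

    prev and curr are dicts with 'taken_at' (datetime) and 'altitude'
    (metres, or None when the photo has no altitude tag); hop_km is the
    distance between them; away says whether the user is currently travelling.
    """
    reasons = []
    if prev["altitude"] is not None and curr["altitude"] is not None:
        if abs(curr["altitude"] - prev["altitude"]) > ALTITUDE_JUMP_M:
            reasons.append("altitude_jump")
    # The long-gap rule only applies while the user is Away.
    if away and curr["taken_at"] - prev["taken_at"] > MAX_GAP_WHILE_AWAY:
        reasons.append("time_gap")
    if hop_km > MAX_HOP_KM:
        reasons.append("location_jump")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, makes it easy to log why a trip was split when the result looks wrong.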



Example 3: Cross-Device Trip Merging Logic

Problem

Families often capture trips on multiple devices.

Example timeline:

Phone A
Monday: Goa

Phone B
Tuesday: Goa

A traditional organizer would produce two separate folders for what was really one trip.

Architecture Solution

Sherlock merges trip segments using a graph traversal approach.

[Figure: decision tree]

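The merging step runs a breadth-first search (BFS) over trip segments, so clusters grow transitively: if segment A links to B and B links to C, all three join one cluster even when A and C are not directly linked.

The result is a single family trip event.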
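Here is a minimal sketch of BFS-based merging. The article doesn't say when two segments count as linked, so this assumes a hypothetical rule: they are at most 24 hours apart and within 100 km of each other. `Segment`, `linked`, and `merge_segments` are illustrative names, and distances use a stdlib haversine rather than geopy.

```python
import math
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical linking rule: the article does not say how segments are compared.
MAX_GAP = timedelta(hours=24)   # segments at most this far apart in time...
MAX_DISTANCE_KM = 100.0         # ...and at most this far apart in space are linked


@dataclass
class Segment:
    device: str
    start: datetime
    end: datetime
    lat: float  # representative location of the segment (e.g. its centroid)
    lon: float


def distance_km(a, b):
    """Haversine distance between two segments' representative points."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dl = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def linked(a, b):
    # Time between the two intervals; negative when they overlap.
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap <= MAX_GAP and distance_km(a, b) <= MAX_DISTANCE_KM


def merge_segments(segments):
    """Return clusters of segments (connected components), found with BFS."""
    seen, clusters = set(), []
    for root in range(len(segments)):
        if root in seen:
            continue
        seen.add(root)
        queue, cluster = deque([root]), []
        while queue:
            i = queue.popleft()
            cluster.append(segments[i])
            # Any unvisited segment linked to this one joins the same cluster.
            for j in range(len(segments)):
                if j not in seen and linked(segments[i], segments[j]):
                    seen.add(j)
                    queue.append(j)
        clusters.append(cluster)
    return clusters
```

Because every dequeued segment is compared with every other one, this is quadratic in the number of segments; that is acceptable for hundreds of segments, and sorting by start time would be the natural optimization for larger libraries.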

Example 4: Duplicate Image Detection Architecture

Problem

The same photo often exists in multiple forms:

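  • original camera photo
  • WhatsApp compressed version
  • resized upload
  • edited variant

Byte-level (cryptographic) hashing cannot detect these, because re-encoding, resizing, or editing changes the file's bytes even when the picture looks the same.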

[Figure: decision tree]

Result

The system keeps the highest-resolution version and discards the lower-quality copies.
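A minimal, dependency-free sketch of the perceptual-hash step. The article uses `imagehash`, where `imagehash.phash(Image.open(path))` returns a hash and subtracting two hashes gives their Hamming distance; to keep this runnable with only the standard library, it swaps in a hand-written average hash (aHash) over an 8×8 grayscale grid. The 5-bit distance threshold and the helper names are hypothetical.

```python
def average_hash(gray_8x8):
    """aHash: one bit per pixel, set when the pixel is brighter than the mean.

    gray_8x8 is an 8x8 grid of 0-255 grayscale values; a real pipeline would
    get it by shrinking and greyscaling the image first.
    """
    pixels = [v for row in gray_8x8 for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


MAX_DISTANCE = 5  # hypothetical: at most 5 of 64 bits may differ for a "duplicate"


def keep_best(images):
    """images: list of (name, width, height, hash). Returns the names to keep.

    Images are visited from highest to lowest resolution; one that is within
    MAX_DISTANCE of an already-kept image is treated as a lower-quality copy.
    """
    kept = []
    for name, w, h, hsh in sorted(images, key=lambda im: im[1] * im[2], reverse=True):
        if all(hamming(hsh, k_hash) > MAX_DISTANCE for _, k_hash in kept):
            kept.append((name, hsh))
    return [name for name, _ in kept]
```

Visiting images from largest to smallest means that, for a group of near-identical copies, the one kept is the highest-resolution copy.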


Example 5: Inferring Missing GPS Data

Messaging apps frequently strip GPS metadata.

Example:

During a trip you might download:

  • flight tickets
  • restaurant screenshots
  • shared images

These files lack location data.

Sherlock solves this using context inheritance.

If a GPS-less image falls chronologically between two images taken in the same city, the system assumes it belongs to the same trip unless contradictory evidence appears.

This preserves timeline continuity.
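Context inheritance can be sketched as a pass over the timeline that looks at each GPS-less item's nearest located neighbours. This assumes cities have already been resolved from coordinates (for example with reverse_geocoder), and it models "contradictory evidence" with a hypothetical 12 hour limit on how far the neighbours may be in time; `Item` and `inherit_locations` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_INHERIT_GAP = timedelta(hours=12)  # hypothetical "contradictory evidence" rule


@dataclass
class Item:
    name: str
    taken_at: datetime
    city: Optional[str]  # None when GPS metadata is missing


def inherit_locations(items):
    """Return {name: city}, filling GPS-less items from agreeing neighbours."""
    items = sorted(items, key=lambda it: it.taken_at)
    n = len(items)
    prev_known, next_known = [None] * n, [None] * n
    last = None
    for i in range(n):                   # nearest located item before i
        prev_known[i] = last
        if items[i].city is not None:
            last = i
    last = None
    for i in range(n - 1, -1, -1):       # nearest located item after i
        next_known[i] = last
        if items[i].city is not None:
            last = i
    result = {}
    for i, it in enumerate(items):
        if it.city is not None:
            result[it.name] = it.city
            continue
        p, q = prev_known[i], next_known[i]
        if (p is not None and q is not None
                and items[p].city == items[q].city
                and it.taken_at - items[p].taken_at <= MAX_INHERIT_GAP
                and items[q].taken_at - it.taken_at <= MAX_INHERIT_GAP):
            result[it.name] = items[p].city  # sandwiched between the same city
        else:
            result[it.name] = None           # not enough agreeing context
    return result
```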



Example 6: Preventing Storage Overflow

Problem

Large reorganizations can fail when disk space runs out.
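The article doesn't spell out the decision flow in text, so the following is a hedged sketch of one common approach rather than Sherlock's actual logic: add up the bytes that will really be copied and compare them with `shutil.disk_usage(dest).free` before moving anything. A move within one filesystem is just a rename and needs no extra space, so only cross-device files are counted. The 10 percent margin and all helper names are hypothetical.

```python
import os
import shutil

SAFETY_MARGIN = 0.10  # hypothetical: keep 10% headroom beyond the bytes to copy


def same_device(src, dest_dir):
    # A move within one filesystem is a rename and needs no extra space.
    return os.stat(src).st_dev == os.stat(dest_dir).st_dev


def bytes_needed(paths, dest_dir):
    """Total size of the files that would actually be copied to dest_dir."""
    return sum(os.path.getsize(p) for p in paths if not same_device(p, dest_dir))


def fits(required_bytes, dest_dir, margin=SAFETY_MARGIN):
    """True if dest_dir's filesystem has room for required_bytes plus headroom."""
    free = shutil.disk_usage(dest_dir).free
    return required_bytes * (1 + margin) <= free


def preflight(paths, dest_dir):
    """Abort before any file is moved if the destination cannot hold the copies."""
    need = bytes_needed(paths, dest_dir)
    if not fits(need, dest_dir):
        raise OSError(f"need about {need} bytes on {dest_dir}; aborting before moving files")
    return need
```

Checking up front means a reorganization either starts with enough room or does not start at all, instead of failing halfway with files split across two locations.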

[Figure: decision flow]


An Interesting Observation

What surprised me was how much of this logic the AI coding agent was able to implement once the constraints were clearly described.

Instead of writing hundreds of lines manually, the workflow looked like this:

Describe the problem -> Define the edge cases -> Specify performance constraints -> Iterate on the architecture

The AI then generated the implementation across multiple modules.

The engineer’s role increasingly becomes problem architect rather than code typist.


The Larger Shift

Back in engineering school we were trained to focus on:

  • algorithms
  • edge cases
  • computational thinking

Those skills remain essential.

But the interface between humans and computers is changing.

Increasingly, English is becoming a way to express system logic.

The quality of the outcome depends less on typing syntax and more on:

  • clarity of constraints
  • system thinking
  • ability to anticipate edge cases


Final Thoughts

Sherlock Media Organizer is just a small side project.

But it served its purpose.

It has taught me that AI coding agents are now capable of handling:

  • multi-module architectures
  • real-world data problems
  • algorithmic reasoning
  • performance optimization constraints

As long as the problem is framed clearly.

That might be the most important engineering skill in the AI era.

Clear thinking.


GitHub Repository: https://github.com/RICEforever/Sherlock-Media-Organizer.git
