Coding fun - photo sorting using Python


I started taking digital photos in 2000. After 21 years there is a huge number of photos and videos, plus plenty of backups on different media from different periods - CD-RW, DVD+RW, hard disks, RAID, NAS and cloud storage. Earlier this month I started using my Google Pixel 3 to back up those beautiful memories of my family and myself to Google Photos. It is really amazing that I managed to get all those photos uploaded and sorted by date and time - with some defects, which I eventually overcame. It took me almost 2 weeks to finish the work - copying from the different media to the phone and then backing up.

Cloud backup is good, but from experience I still need another local backup, which I also archive in a cheaper cloud storage service such as AWS S3 Glacier. Google Photos provides a way to download all pictures. I could just use that, but I still want to do it myself since I really cannot afford to lose any pictures.

Before this recent Google Photos backup, I had tried many times to sort those pictures. As a result there are many different backup locations and many duplicates, such as the same file under different file names. I even wrote a small C program to parse the photo EXIF data. It was robust and stable, but the effort was still in vain because I never found a way to extract video information.

While doing the Google Photos backup, I got the idea of writing a Python program to do this. And I managed to finish it today! It took me 2 days over this Taiwan National Day holiday.

The way it works is simple: extract the information of every photo and video, create a hash from that information, sort by hash, and remove duplicates to build a database in which every file is unique. Then copy the unique files, organized by creation date.
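The flow above can be sketched in a few lines. This is not the actual program, only a minimal illustration that uses nothing but the standard library. The function names (`extract_info`, `info_hash`, `build_unique_db`, `copy_by_date`) and the `YYYY/MM` target layout are assumptions made for the example. Since the standard library cannot parse EXIF or video metadata, `extract_info` is a stand-in that uses the file size, a content digest and the file modification time instead of real media metadata.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def extract_info(path):
    """Hypothetical stand-in for real EXIF/video metadata parsing."""
    stat = path.stat()
    return {
        "size": stat.st_size,
        # Assumption: modification time stands in for the creation date.
        "created": datetime.fromtimestamp(stat.st_mtime),
        # Reads the whole file into memory; fine for a sketch only.
        "content": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def info_hash(info):
    """Hash the identifying fields; the file name is deliberately ignored."""
    key = f"{info['size']}|{info['content']}"
    return hashlib.sha256(key.encode()).hexdigest()


def build_unique_db(source_dir):
    """Map each hash to the first file seen with it, dropping duplicates."""
    db = {}
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file():
            info = extract_info(path)
            db.setdefault(info_hash(info), (path, info))
    return db


def copy_by_date(db, target_dir):
    """Copy unique files, in hash order, into target_dir/YYYY/MM folders."""
    copied = []
    for digest, (path, info) in sorted(db.items()):
        created = info["created"]
        folder = Path(target_dir) / f"{created:%Y}" / f"{created:%m}"
        folder.mkdir(parents=True, exist_ok=True)
        # Prefix with part of the hash so equal file names cannot collide.
        dest = folder / f"{digest[:8]}_{path.name}"
        shutil.copy2(path, dest)
        copied.append(dest)
    return copied
```

Calling `copy_by_date(build_unique_db("unsorted"), "sorted")` would then copy one file per unique hash. Because the file name is not part of the hash, the same picture stored under two names ends up as a single entry in the database.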

The original plan was to finish the implementation yesterday (10/10/2021) and spend today (10/11/2021) cleaning up the code to create a GitHub project. But while implementing it, I made some stupid mistakes and ran into problems that delayed the work:

  • This is my first time with a modern IDE. Around 30 years ago I played with Borland C++ and Visual C++ for a short period of time. The tools I used most were Makefile and VIM, or sometimes Notepad++. My son was using Anaconda/Spyder, so I thought perhaps I could work with another tool such as Visual Studio Code. It then took me some time to figure out that I could not go directly to the latest Python version, 3.10, because many module dependencies were not ready for it yet. NumPy did not even have a pre-compiled package, so I had to download Visual Studio 2019 to compile it, and that still failed. The solution was to downgrade to Python 3.9, after which all the module installations worked like a charm.
  • The quality of the media parsing libraries is not as good as I expected. Parsing capability is one issue: no single library meets all my requirements for handling both images and videos, so multiple libraries are needed to extract the media metadata. Code quality is another: parsing errors and even crashes happened. Exception handling is required, and it took me some more time to learn and use it.
  • The variety of the files is also a challenge. Over these 20 years I have had many phones and digital cameras of different brands - Nikon, Canon, HTC, Dopod, Sony Ericsson, Sony, Samsung and some more, more than 40 device types in all. Some devices create buggy metadata, such as creation dates in 1946 or 1904. In this case I cannot generalize the code and need some special handling that replaces the date with the file system's modification date, which might be the most accurate information I have.
  • Learning and googling the syntax of a new language happens all the time. Getting old means I cannot remember what I studied from the book, and my muscle memory for coding is in C. What I googled most was something like "Python switch case equivalent". Not being familiar with Python casting also cost some time. For example, it took me a while to understand what str(this_can_a_function) means, which is a very convenient feature of the Python language.
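The exception handling and the date fix from the list above could look roughly like this. It is only a sketch under stated assumptions: the names `parse_exif_date` and `best_creation_date` are made up for the example, the metadata date is assumed to arrive as an EXIF-style "YYYY:MM:DD HH:MM:SS" string, and the cut-off year 2000 is an assumption taken from when the collection started.

```python
import os
from datetime import datetime

# Assumption: nothing in the collection is older than the year 2000.
EARLIEST_PLAUSIBLE = datetime(2000, 1, 1)


def parse_exif_date(text):
    """Parse an EXIF-style 'YYYY:MM:DD HH:MM:SS' string."""
    return datetime.strptime(text, "%Y:%m:%d %H:%M:%S")


def best_creation_date(path, metadata_date, now=None):
    """Return the metadata date if plausible, else the file modification date."""
    now = now or datetime.now()
    try:
        created = parse_exif_date(metadata_date)
    except (TypeError, ValueError):
        # Missing (None) or malformed metadata must not crash the whole run.
        created = None
    if created is None or not (EARLIEST_PLAUSIBLE <= created <= now):
        # Buggy dates such as 1904 or 1946 fall back to the modification time.
        created = datetime.fromtimestamp(os.path.getmtime(path))
    return created
```

With this, a file whose metadata says "1904:01:01 00:00:00" gets its modification time instead, while a file with a plausible metadata date keeps it.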

But other than these, it was a really good and enjoyable experience doing this Python coding practice. It would have taken me much more time to program it in C. Many features that I can finish in a few lines of Python require a lot of implementation in C. Perhaps C is faster and more robust, but coding in Python is still much faster, especially for this kind of project.

So now the work is done. I feel confident in my coding skills again, and I'm happy to see my photos and videos safe and secure. Still, after all this effort, I am keeping the unsorted files somewhere. Perhaps I will create another tool to identify whether there are still missing files that this tool did not copy...
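Such a checking tool does not exist yet, but one possible shape for it is sketched below. The names `content_hashes` and `find_missing` are hypothetical, and comparing by SHA-256 of the file content is an assumption of this sketch, not something the sorting tool is known to do.

```python
import hashlib
from pathlib import Path


def content_hashes(root):
    """Map a SHA-256 content digest to one path for every file under root."""
    hashes = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large videos do not fill the memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            hashes.setdefault(digest.hexdigest(), path)
    return hashes


def find_missing(unsorted_dir, sorted_dir):
    """Return unsorted files whose content appears nowhere in sorted_dir."""
    kept = content_hashes(sorted_dir)
    return sorted(
        path
        for digest, path in content_hashes(unsorted_dir).items()
        if digest not in kept
    )
```

An empty result from `find_missing("unsorted", "sorted")` would mean every unsorted file has a byte-identical copy in the sorted tree, so the unsorted files could be removed with more confidence.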



