Creating photo mosaics from Instagram using Python, Linear Optimisation and Neural Networks

People seem to absolutely love Instagram. From a data science perspective, at least, it’s a fantastic data source for Natural Language Processing (NLP), network analysis, audience segmentation and plenty more. With perhaps a bit too much free time over Christmas, I wondered: could something cool be done with the photos themselves?

And who doesn’t love a personalised gift? Let’s see if we can make someone a mosaic from their own photos.


There are plenty of companies who make photo mosaics, but they seem to have two main limitations for this purpose:

  • No access to Instagram photos: someone would have to spend hours saving and uploading their photos.
  • Low quality mosaics: most of them ‘cheat’ by heavily overlaying the target image, or duplicating photos.


Overall objective: Programmatically create a high-quality photo mosaic from a given Instagram account and target image.


Overview of approach


1) Get photos from Instagram

After getting a few unsuccessful attempts quickly out of the way, the oracle finally led me to a few websites (igdownloader.com, igram.io) that seem to work just fine. The user simply inputs the URL of the post and, hey presto, the photos can be downloaded.


I eventually figured out roughly their approach, using some articles and videos as great starting points:

  • Open an automated browser (Selenium) and navigate to a post
  • Read the html of the page (Beautiful Soup)
  • Extract and tidy up image URLs (regex)
  • Read and save images (requests)
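The extraction step can be sketched with the standard library alone (the browser automation is omitted here, and the `"display_url"` JSON key is an assumption — Instagram’s markup changes often, so inspect the actual page source before relying on it):

```python
import re

def extract_image_urls(html: str) -> list[str]:
    """Pull image URLs out of a post's page source.
    The "display_url" key is a guess at the embedded JSON field.
    """
    urls = re.findall(r'"display_url"\s*:\s*"([^"]+)"', html)
    # Undo the JSON escaping applied to query strings
    return [u.replace("\\u0026", "&") for u in urls]

# The html would come from Selenium (driver.get(post_url);
# html = driver.page_source), and each URL would then be saved
# with requests: open(name, "wb").write(requests.get(url).content)
sample = '{"display_url":"https://example.com/p1.jpg?w=1080\\u0026h=1080"}'
print(extract_image_urls(sample))
```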


2) Crop and resize photos

The final mosaic is essentially our best representation of the target image, using hundreds of tiny square tiles. We therefore need to 1) crop and resize our target image to the size of our desired output, and 2) crop and resize our input photos to the same size as the tiny square tiles.

The first step here is deciding on output dimensions, which were kindly borrowed from thebirthposter.com. Using this we can crop and resize the target image.
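For example, the centre-crop matching the poster’s aspect ratio comes down to a few lines (a sketch; with Pillow, the resulting box would be passed to `Image.crop` and then `Image.resize`):

```python
def centre_crop_box(width, height, target_w, target_h):
    """Largest centred box in a (width x height) image with the
    target aspect ratio, as (left, top, right, bottom) -- the box
    format Pillow's Image.crop expects."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:    # image too wide: trim the sides
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    else:                                # image too tall: trim top/bottom
        new_h = round(width / target_ratio)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)

# e.g. a 4000 x 3000 photo cropped for a 2:3 portrait poster
print(centre_crop_box(4000, 3000, 2, 3))
```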


Since we know the number of available input photos, we can use a bit of maths to work out the number of rows and columns of the tile grid, and therefore the necessary tile dimensions.
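That bit of maths can be sketched as follows (assuming we want roughly square tiles and as many of the available photos used as possible — the exact trade-off in the original may differ):

```python
import math

def grid_dimensions(n_photos, output_w, output_h):
    """Rows and columns of the tile grid, plus tile size in pixels.
    Picks cols/rows close to the output's aspect ratio so tiles come
    out roughly square, without exceeding the photo count."""
    aspect = output_w / output_h
    rows = int(math.sqrt(n_photos / aspect))
    cols = int(rows * aspect)
    tile_w = output_w // cols
    tile_h = output_h // rows
    return rows, cols, tile_w, tile_h

# e.g. 1,000 photos on a 600 x 900 canvas
print(grid_dimensions(1000, 600, 900))
```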



3) Score possible photo placements

To create the best representation of the target image, we need to place each input photo in the tile it looks most similar to. We can use the mean squared error (MSE) method from scikit-image, which compares two images and returns 0 for identical images, rising to 1 (for pixel values scaled to [0, 1]) when they are complete opposites. This is vastly superior to comparing just the average colours of two images, which loses valuable information about the shapes and regions of colour inside them.


For our case, we build a matrix of results by scoring each input photo against each tile.
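In code, with pixel values scaled to [0, 1], the scoring step looks roughly like this (scikit-image’s `mean_squared_error` does the same for arrays; plain lists are used here to keep the sketch dependency-free):

```python
def mse(a, b):
    """Mean squared error between two equal-length flat images with
    pixel values in [0, 1]: 0 = identical, 1 = complete opposites."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def score_matrix(photos, tiles):
    """scores[i][j] = MSE of input photo i against tile j."""
    return [[mse(photo, tile) for tile in tiles] for photo in photos]

photos = [[0.0, 0.0], [1.0, 1.0]]   # two tiny 2-pixel "photos"
tiles = [[1.0, 1.0], [0.0, 0.5]]    # two tiny 2-pixel "tiles"
print(score_matrix(photos, tiles))
```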


4) Arrange input photos optimally

‘Where should I place each input photo so that I get the best-looking mosaic?’ is quite a tricky question to answer. My approach can be broken down into three main iterations:

  • A) Best match
  • B) Linear optimisation
  • C) Tile weighting

4A) Best match

‘Place the best-matching photo first’ is the basic idea: select the photo-tile combination with the lowest mean squared error. Since that tile is now filled, we ignore it going forward. And since we want a ‘true’ mosaic that avoids duplicating any photos, we ignore the input photo going forward too.
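A sketch of this greedy pass (a hypothetical helper, with `scores` given as a dict keyed by (photo, tile) pairs):

```python
def greedy_assign(scores):
    """Fill tiles in ascending-MSE order; once a photo or tile has
    been used, it is skipped for the rest of the pass."""
    assignment = {}
    used_photos, filled_tiles = set(), set()
    for (photo, tile), _ in sorted(scores.items(), key=lambda kv: kv[1]):
        if photo in used_photos or tile in filled_tiles:
            continue
        assignment[tile] = photo
        used_photos.add(photo)
        filled_tiles.add(tile)
    return assignment

scores = {("a", 0): 0.1, ("a", 1): 0.2, ("b", 0): 0.3, ("b", 1): 0.9}
print(greedy_assign(scores))  # "a" grabs tile 0, leaving "b" its poor match
```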



Although some tiles are filled brilliantly at the start (low MSE), we can see that by the end, large regions have to be filled with whatever happens to be left over (high MSE).



4B) Linear optimisation

We can rephrase our question slightly more mathematically as ‘what set of acceptable photo-tile combinations has the minimum total mean squared error?’

Although the ‘best match’ approach above makes logical sense, it does not necessarily produce optimal results. Enter linear optimisation.

Linear optimisation seeks to minimise or maximise an objective function, given a set of constraints. In a manufacturing setting, the objective might be ‘maximise revenue from selling tables and chairs’, and the constraints may be the total units of wood, metal and labour available. In other words: how many tables and/or chairs should I make in order to maximise revenue?

For this application, the objective function is to ‘minimise total MSE’ and the constraints are ‘use exactly one photo per tile’, and ‘use each input picture no more than once’.

We can apply these constraints by using a binary decision variable for each photo-tile combination, i.e. its value is 1 (we use that photo-tile combination) or 0 (we don’t).


To set up and solve this problem I used a great package called PuLP. The specific algorithms that do the solving are a bit too involved to explain in detail here, but if you are interested in a bit more background, check out this talk at PyData.
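For intuition, here is the same objective on a toy instance, solved exhaustively with the standard library (PuLP’s binary variables express identical constraints, but scale far beyond brute force). Note how minimising the total beats the greedy pass:

```python
from itertools import permutations

def optimal_assign(scores):
    """Minimise total MSE over all ways of giving each tile a
    distinct photo. scores[i][j] = MSE of photo i in tile j.
    Exhaustive search, so only usable on tiny instances."""
    n_tiles = len(scores[0])
    best_total, best_perm = None, None
    for perm in permutations(range(len(scores)), n_tiles):
        total = sum(scores[photo][tile] for tile, photo in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Greedy would grab scores[0][0] = 1 first, forcing the terrible
# scores[1][1] = 100; the optimum swaps both placements instead.
scores = [[1, 2], [3, 100]]
print(optimal_assign(scores))  # total 5: photo 1 -> tile 0, photo 0 -> tile 1
```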


When comparing the two approaches, linear optimisation reduces the average error by 26% (0.093 to 0.069). We can also see it has a narrower distribution, with fewer excellently placed tiles (very low MSE) but far fewer poorly placed tiles (high MSE).


4C) Tile weighting

Unfortunately, not all tiles were created equal. Having a good match for a face tile is much more important than for a background tile. The more important the tile, the more important it is to match it with a photo that gets a low MSE.

Considering our linear optimisation problem (minimise total MSE), if we scale up the MSEs corresponding to an important tile by a factor of 1,000, the algorithm will make absolutely sure the best possible photo is chosen for that tile.

It follows then: the more important the tile, the higher the scaling factor.

First, we have to identify the important objects in the target image. Identifying people and faces in images and video is a relatively common data science problem, with many pretrained neural network models readily available. For objects such as people, PointRend (Facebook AI Research, 2020) is able to predict object boundaries with great detail. For faces, Mediapipe (Google, 2019) is able to predict an incredible 468 facial landmarks, again capturing the boundaries in great detail.


Second, we identify the tiles corresponding to the labelled face and person regions. The final step is then to determine the weighting of each tile, which I chose to be a function of distance from the labelled regions. This prevents any harsh borders around the person/face and also seems to give a nice gradient border around the person.
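One possible weighting function along these lines (the exponential falloff and its parameters are illustrative, not the article’s exact choice):

```python
import math

def tile_weight(distance, peak=1000.0, falloff=3.0):
    """Weight for a tile `distance` grid steps from the nearest
    face/person tile: `peak` on the region itself, decaying
    smoothly towards 1.0 for far-away background tiles."""
    return 1.0 + (peak - 1.0) * math.exp(-distance / falloff)

# The weighted objective then scales each tile's column of MSEs:
#   weighted[i][j] = tile_weight(dist_to_region[j]) * scores[i][j]
print([round(tile_weight(d), 1) for d in (0, 3, 10)])
```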


As expected, the linear optimisation algorithm has found great matches for the important face regions, and less so for the background.


Comparing all methods, it is clear that adding in the tile weighting improves the output dramatically.


Of the five companies I was able to test, only one did it without the heavy image masking.


Next steps

I started this little project just out of curiosity, and it’s been great fun developing it so far. It’s unlikely to turn into a huge money spinner, but it would be exciting to solve the problems surrounding scalability. I am currently limited by:

  • Collecting the photos: Instagram quite rightly assumes it’s a bot that is tirelessly looking at photos for 5+ hours. One solution could be to create a task queue, and share the load across multiple machines on AWS.
  • Processing time: the linear optimisation does not scale linearly with the number of photos, so at around 2,500 it either takes upwards of an hour or I run out of memory on my laptop. I have got around this so far by solving sections of the target image sequentially, but it’s not a fantastic solution. Again, cloud computing could be the answer.

For now at least, I’ve created an Instagram page to show some of the end results — feel free to check them out on @gramdesigns_ or connect with me on LinkedIn.

:)


