A note on generalizability in machine learning
Does your feature space or latent space describe what you think it does? Is the discrimination between classes based on properties you would assume as a human observer, or did the model catch on to an irrelevant difference? Are you ready to move from the comfy surroundings of well-defined datasets to assess the generalisation capability of your model in the murky waters of new, previously unseen data?
These are questions I've returned to many times working with machine learning and image processing.
I had an unsettling feeling when evaluating the discrimination power of a number of texture descriptors: what if they picked up substantial differences between classes in aspects other than what we perceived as texture differences? To reduce the risk of some obvious sources of bias, I normalised the intensity across the datasets to have the same mean and standard deviation.
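The intensity normalisation described above can be sketched as a simple shift-and-scale per patch. This is a minimal illustration, not the exact code used in the thesis, and the function name is mine:

```python
import numpy as np

def normalize_intensity(patch, target_mean=0.0, target_std=1.0):
    """Shift and scale a patch so its intensities match a target mean and std."""
    mean, std = patch.mean(), patch.std()
    if std == 0:
        # A flat patch carries no contrast to rescale; return the target mean.
        return np.full_like(patch, target_mean, dtype=float)
    return (patch - mean) / std * target_std + target_mean

# Example: a bright and a dark patch end up on the same intensity scale.
rng = np.random.default_rng(0)
bright = rng.normal(200.0, 30.0, size=(64, 64))
dark = rng.normal(50.0, 5.0, size=(64, 64))
for p in (bright, dark):
    q = normalize_intensity(p)
    print(round(float(q.mean()), 4), round(float(q.std()), 4))
```

Note that this removes first- and second-order intensity statistics only; as the experiment below shows, descriptors can still separate classes on other grounds.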
Along the same line of thought, I decided to create a small dataset back in 2013 while writing my PhD thesis (link). My hypothesis was that the texture descriptors I investigated could pick up differences not only in texture but also in imaging conditions, including noise levels, despite the standard intensity normalisation.
I used my DSLR to take photos of two samples of canvas, representing two similar but still different textures. I took images while varying the ISO setting of the camera. Changing the ISO setting on a digital camera changes the analog amplification of the signal from the sensor elements, which is why higher ISO levels allow for shorter exposure times but result in higher levels of noise in the images. In short, I used higher ISO levels to acquire images with gradually increasing levels of noise. In the figure below you see an example of a texture patch acquired with increasing ISO settings.
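The effect of raising ISO can be illustrated with a toy sensor model: the scene signal stays the same, but the gain amplifies the sensor noise along with it. This is an assumed additive-Gaussian model for illustration only, not a model of the actual camera's noise characteristics:

```python
import numpy as np

rng = np.random.default_rng(42)
clean = rng.uniform(0.3, 0.7, size=(64, 64))  # stand-in for a texture patch

# Toy model: each doubling of gain (a stand-in for doubling ISO)
# amplifies the per-pixel sensor noise before readout.
residual_stds = []
for gain in (1, 2, 4, 8):
    noise = rng.normal(0.0, 0.01, size=clean.shape)
    noisy = clean + gain * noise
    residual_stds.append(float(np.std(noisy - clean)))
    print(gain, round(residual_stds[-1], 4))
```

The residual noise level grows roughly in proportion to the gain, which is the gradually increasing noise the dataset was designed to capture.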
I applied the texture descriptors I was investigating, followed by Linear Discriminant Analysis (LDA) (link to Wiki) on the respective feature space, considering each texture-ISO combination as a separate class. LDA finds a linear subspace that minimises intra-class variance while maximising inter-class variance. This gives us a feeling for how well the classes can be discriminated in a given feature space.
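As a minimal sketch of this step, here is an LDA projection of a synthetic stand-in feature space with scikit-learn. The class structure (texture-ISO combinations) and all dimensions are made up for illustration; the thesis used real descriptor outputs:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for a texture feature space: 4 classes
# (e.g. 2 textures x 2 ISO levels), 20 samples each, 10 features.
rng = np.random.default_rng(0)
n_per_class, n_features = 20, 10
centers = rng.normal(0.0, 3.0, size=(4, n_features))  # class means
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_class, n_features))
               for c in centers])
y = np.repeat(np.arange(4), n_per_class)

# Project onto the first two discriminant axes, as in the figure below.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)
print(Z.shape)
```

Plotting `Z` coloured by `y` is exactly the kind of visualisation shown in the figure: if classes separate along these axes, the feature space can tell them apart, whether or not the separation is based on what you consider texture.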
The figure below is Fig. 4.9 from my thesis and shows the first two dimensions of the LDA space of each respective feature space. Here I also subsampled the texture patches, from 192x192-pixel patches down to 12x12 pixels, until there was very little difference between the images.
Without spending time on the individual texture descriptors, it's worth mentioning that they represent a few different types, including filter banks, co-occurrence matrix based methods, and local binary pattern based methods.
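To make the last family concrete, here is a from-scratch sketch of the simplest local binary pattern variant (eight neighbours, radius one), reduced to a histogram feature vector. It is a basic illustration of the idea, not the specific LBP implementation evaluated in the thesis:

```python
import numpy as np

def lbp_3x3(img):
    """8-neighbour local binary pattern codes for the interior pixels."""
    c = img[1:-1, 1:-1]
    # Neighbours clockwise from the top-left; each contributes one bit
    # depending on whether it is at least as bright as the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (n >= c).astype(np.uint8) << bit
    return codes

def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: the texture feature vector."""
    codes = lbp_3x3(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
patch = rng.integers(0, 256, size=(32, 32)).astype(np.int16)
feat = lbp_histogram(patch)
print(feat.shape)
```

Because the codes depend on local brightness orderings, such a histogram is sensitive to fine-grained pixel variation, which is one plausible route by which noise level leaks into the feature space.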
It becomes evident that more or less all of these descriptors can pick up differences not only between the two texture classes but also between each one of the eight ISO levels. Training a machine learning model to do texture classification on this type of data without taking this into account would result in poor generalizability in terms of coping with different noise levels. And varying imaging conditions, with varying noise levels as a result, is a very common situation in real-world applications.
Learnings
Some of my learnings from this experiment are:
- Try to understand your latent space or feature space in your machine learning application.
- Plot and visualize your data often. Much can be learnt before trying to validate complex models.
- Consider model generalizability for the heterogeneity of new, real-world data. For example, what imaging conditions and resulting effects is the model likely to encounter?
- Challenge your model, see if it picks up meaningful differences and ignores irrelevant aspects.
Conclusions
I would be happy if revisiting these ideas and writing these notes helped you in any way, especially if it means that you start plotting and visualizing your data even more from now on!
Oh, one more thing: I made the texture dataset I used here available for everyone via my website (link). The rest of the texture datasets from my thesis are also there (link).
Best regards,
Gustaf Kylberg, PhD
"Plot and visualize your data often" sounds like sweet music in my ears :) Thanks for posting your findings, interesting indeed!
Very interesting, Gustaf. Thanks for sharing this!