Color and Classification
Image classification is one of the basic tasks in Machine Learning (ML), and specifically in Computer Vision (CV). Many deep learning and shallow learning architectures exist today that solve this problem with high accuracy. Yet many questions about this seemingly simple task still do not have clear answers.
Having clear answers to such fundamental questions will allow us to take the field to the next level.
Today AI is more artificial and less intelligent. The goal is to make it less artificial and more intelligent.
The most common representation of an image is a matrix in which each of the R, G, and B channels can take values from 0 to 255 (256 values). For an image of size 300x300x3, each channel of each pixel can take a value from 0 to 255. The number of permutations and combinations a neural network architecture needs to learn for such an image is of the order of 256**(3*300*300), an astronomically large value. Does a human eye need to understand all these permutations and combinations to classify an image?
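To get a feel for that number (a back-of-the-envelope calculation, not part of the original experiments), the size of the raw input space can be computed directly:

```python
# Size of the raw input space for a 300x300 RGB image with 256 levels
# per channel: 256**(300*300*3) distinct images.
import math

channels = 300 * 300 * 3
# Computing the exact integer is possible but pointless; the number of
# decimal digits already tells the story.
digits = channels * math.log10(256)
print(f"256**{channels} has about {digits:.0f} decimal digits")
```

The count has over 650,000 decimal digits, which makes it clear that no network (or eye) can be enumerating this space explicitly.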
What are the ways we can reduce these permutations and combinations? One simple scheme is to bucket pixel values into a coarse set of levels as below. All other values are mapped to the nearest value in the bucket list. You lose a lot of information in such a scheme, but do you really lose much accuracy? Does bucketizing help us generalize better and protect us from adversarial attacks, at the cost of some accuracy?
Today we equate accuracy with generalization, which is questionable. Generalization is the ability to abstract information in such a way that future outcomes are rarely impacted even if the underlying data representation changes.
b_255 - Original Image - [0, 1, 2, 3, 4, .............. 251, 252, 253, 254, 255]
b_16 - Bucket range of 16 - [0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255]
b_5 - Bucket range of 5 - [0, 50, 100, 150, 200, 250]
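The mapping described above can be sketched in a few lines (a minimal illustration; the function name `bucketize` and the exact nearest-value tie-breaking are assumptions, not the code used in the experiments):

```python
# Map every pixel to the nearest value in a fixed bucket list.
import numpy as np

b_16 = np.array([0, 16, 32, 48, 64, 80, 96, 112, 128, 144,
                 160, 176, 192, 208, 224, 240, 255], dtype=np.uint8)
b_5 = np.array([0, 50, 100, 150, 200, 250], dtype=np.uint8)

def bucketize(pixels, buckets):
    """Replace each pixel with the nearest value in `buckets`."""
    # Broadcast to shape (..., len(buckets)), then pick the closest bucket.
    dist = np.abs(pixels[..., None].astype(int) - buckets.astype(int))
    return buckets[dist.argmin(axis=-1)]

print(bucketize(np.array([7, 130, 254], dtype=np.uint8), b_16))  # → [0 128 255]
print(bucketize(np.array([7, 130, 254], dtype=np.uint8), b_5))   # → [0 150 250]
```

The same function works unchanged on a full HxWx3 image array, since the broadcasting is over the trailing axis only.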
Image formats matter when saving bucketized images. The difference between lossy and lossless compression needs to be understood: a lossy format like jpg may not store and restore the exact pixel values, while a lossless format like png will, which makes png better suited for studying the impact of bucketizing exactly. If we train on the same million bucketized images saved in two different formats (jpg and png), keeping all other parameters constant, do we get the same accuracy? The answer is no: the permutations and combinations of pixel values in the jpg version can be higher than in the png version, even though it is the same million images trained on the same neural network topology for the same number of epochs. The outcomes differ across image formats even though the information is exactly the same from the perspective of a human eye.
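A quick way to see the lossy/lossless difference (a sketch assuming Pillow and NumPy are installed; the image size, random content, and bucket values here are arbitrary illustrations):

```python
# Bucketize a small random image, round-trip it through PNG (lossless)
# and JPEG (lossy), and compare what comes back.
import io
import numpy as np
from PIL import Image

b_5 = np.array([0, 50, 100, 150, 200, 250], dtype=np.uint8)

def bucketize(pixels, buckets):
    """Replace each pixel with the nearest value in `buckets`."""
    dist = np.abs(pixels[..., None].astype(int) - buckets.astype(int))
    return buckets[dist.argmin(axis=-1)]

rng = np.random.default_rng(0)
img = bucketize(rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8), b_5)

def roundtrip(arr, fmt):
    """Encode the array in the given format and decode it back."""
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format=fmt)
    buf.seek(0)
    return np.asarray(Image.open(buf))

png_back = roundtrip(img, "PNG")
jpg_back = roundtrip(img, "JPEG")

print("PNG exact:", np.array_equal(png_back, img))        # lossless: True
print("JPEG distinct values:", len(np.unique(jpg_back)))  # typically far more than 6
```

The PNG round trip preserves the 6 bucket values exactly; the JPEG round trip smears them across many more pixel levels, which is exactly the extra "permutation and combination" the text describes.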
What is the impact of bucketizing image pixels on image classification? The statistics of the data used for training and testing are given below. The images were randomly collected from the internet and are not part of any standard dataset. The b_16 and b_5 images are in jpg format, so their pixel values will not be exact bucket ranges of 16 and 5 respectively. The results obtained here should be replicable on different images and on exact image formats like png.
Number of classes (Training): 981, Total images (Training): 424594
Number of classes (Testing): 981, Total images (Testing): 107867
Neural Network Topology: resnet18
A series of experiments was conducted to answer some fundamental questions.
The above results show that bucketizing does have an impact on accuracy (not generalization), but the impact is not in proportion to the amount of information lost through bucketization. For example, restricting each pixel value from [0 ... 255] to just 6 values [0, 50, 100, 150, 200, 250] is a reduction of approximately 42.67:1 in the number of levels, whereas the accuracy reduction is at most 2 to 3%. Losing large amounts of information through bucketing does not lead to an equal reduction in accuracy.
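One added observation (not from the original experiments) that may partly explain the mild accuracy drop: the 42.67:1 figure counts pixel levels, but in bits per channel the reduction is much smaller, from 8 bits down to log2(6) ≈ 2.58 bits:

```python
# Compare the level-count ratio with the bits-per-channel reduction.
import math

levels_ratio = 256 / 6          # ≈ 42.67:1 reduction in pixel levels
bits_before = math.log2(256)    # 8 bits per channel originally
bits_after = math.log2(6)       # ≈ 2.58 bits per channel after bucketing

print(f"levels ratio: {levels_ratio:.2f}:1")
print(f"bits per channel: {bits_before:.0f} -> {bits_after:.2f}")
```

Seen this way, b_5 still retains roughly a third of the per-channel information, which is more consistent with a 2–3% accuracy drop than the raw 42.67:1 ratio suggests.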
Training a model on original images (b_255) and inferring on bucketed images (b_16 or b_5) has a major impact on accuracy. The reverse, training on bucketed images (b_16 or b_5) and inferring on original images (b_255), loses less accuracy. The best accuracy is always obtained when training and inferring on the same bucket size. Surprisingly, training on b_5 and inferring on b_16 reduces accuracy by a large margin. This raises an interesting question: how sensitive are these neural networks to exact pixel values? Did the model trained on b_5 perform worst on b_16 images simply because the intersection of the pixel sets [0, 50, 100, 150, 200, 250] and [0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255] is very limited?
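The overlap in question can be checked directly (plain Python, with the values copied from the bucket lists above):

```python
# How much do the b_5 and b_16 pixel-value sets actually overlap?
b_16 = {0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255}
b_5 = {0, 50, 100, 150, 200, 250}

print(sorted(b_5 & b_16))  # → [0]
```

The two sets share only the value 0, so a model trained on b_5 sees pixel values at inference time (on b_16 images) that it has essentially never encountered in training.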
The Top 1 accuracy loss from b_255 to b_5 is slightly higher than the Top 5 accuracy loss. For most practical purposes, the accuracy loss is minimal compared to the information lost from the image.
Robustness against adversarial attacks is important when such technologies are deployed at large scale. Bucketing reduces the permutations and combinations of pixel values that a neural network has to understand. No experiment was conducted here to test whether b_5 is more robust than b_255 or b_16 against adversarial attacks.
The concept of bucketizing image pixels is, in some sense, similar to post-training quantization of weights. No experiment was conducted to compare the two approaches in terms of speed and accuracy.
We are in the early stages of understanding AI and its limitations. Like any other technology, it has the potential to influence future systems and the way humans interact and function. Today we tend to see two sets of people when talking about AI: one that is highly optimistic without knowing or acknowledging the limitations of current AI technology, and another that thinks the hype around AI is high and an AI winter is coming. I would like to take an intermediate view of the technology as opposed to either extreme. Fundamental understanding of any technology allows us to improve it over time. This article was one such attempt to understand what is happening under the hood from an image classification perspective. The hope is that in the future we will have models with fewer parameters that are more robust in production.
Great article, Rajeev M A. Good to see a quantifiable review of this topic. The phrase "AI is more artificial and less intelligent" catches the eye. In today's world, almost every other product claims to be driven by AI, without much substantiation of accuracy.
Nice one. I have not come across the term "bucketize" before, especially in the image pixel context. It looks similar to quantization of image pixel values. Any specific reason to prefer "bucketize" over "quantize"?