Color and Classification

Image classification is one of the basic tasks in Machine Learning (ML), specifically in Computer Vision (CV). Many deep learning and shallow learning architectures exist today that solve this problem with high accuracy. Yet many questions still do not have clear answers, even for this simple task. Some examples include:

  1. What do these architectures learn from the image? Why are we not able to explain it?
  2. Why are they susceptible to adversarial attacks?
  3. How invariant are they to translation, rotation, scaling, color, & illumination?
  4. In what order do they learn features such as shape, relationships, aggregation, color, & illumination, and does that order matter while learning?
  5. What happens if the same set of images is trained in buckets with two different formats, like jpg and png? Why are the accuracies different?
  6. Do current neural network architectures give too much importance to exact pixel values?
  7. Does the training speed improve if the pixel values are bucketed?
  8. Do the memory requirements reduce if the pixel values are bucketed?
  9. In what situations is color really important? For example, to answer questions like: is this apple red or green?
  10. Are black-and-white images more than sufficient for many of the image classification tasks that we perform today?

Having clear answers to such fundamental questions will allow us to take the field to the next level.

Today AI is more artificial and less intelligent. The goal is to make it less artificial and more intelligent.

The most common representation of an image is a matrix where each of the R, G, and B channels can take values from 0 to 255 (256 values). For example, in an image of size 300x300x3, each pixel channel takes a value from 0 to 255. The space of pixel permutations and combinations a neural network architecture has to cope with for such an image is enormous: on the order of 256**(3*300*300) possible images, an insane value. Does the human eye reckon with all these permutations and combinations to classify an image?
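To get a feel for how large that number is, here is a quick back-of-the-envelope calculation (a minimal Python sketch; the dimensions match the example above):

```python
import math

height, width, channels = 300, 300, 3
values_per_channel = 256

# The count of distinct images, 256 ** 270000, is far too large to print,
# so compute the number of decimal digits it has instead.
exponent = channels * height * width                        # 270,000
digits = math.floor(exponent * math.log10(values_per_channel)) + 1
print(f"256**{exponent} has about {digits:,} decimal digits")
# -> roughly 650,000 digits; the atom count of the observable
#    universe, by comparison, has only ~80 digits.
```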

What are the ways we can reduce these permutations and combinations? One simple scheme is to bucket pixel values in a range of 16 or 5 as below, mapping every other value to the nearest value in the bucket list. You lose a lot of information in such a scheme, but do you really lose much accuracy? Does bucketizing help us generalize better and protect us from adversarial attacks, at the cost of some loss in accuracy?

Today we equate accuracy to generalization, which is questionable. Generalization is the ability to abstract information in such a way that future outcomes are rarely impacted even if the underlying data representation changes.

b_255 - Original Image - [0, 1, 2, 3, 4, ..., 251, 252, 253, 254, 255]

b_16 - Bucket range of 16 - [0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255]

b_5  - Bucket range of 5 - [0, 50, 100, 150, 200, 250]
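A minimal NumPy sketch of this mapping (the helper name `bucketize` is my own; the bucket lists are the ones above):

```python
import numpy as np

B_16 = np.array([0, 16, 32, 48, 64, 80, 96, 112, 128, 144,
                 160, 176, 192, 208, 224, 240, 255])
B_5 = np.array([0, 50, 100, 150, 200, 250])

def bucketize(image: np.ndarray, buckets: np.ndarray) -> np.ndarray:
    """Map every pixel value to the nearest value in `buckets`."""
    # Distance from each pixel to each bucket value; keep the closest bucket.
    idx = np.abs(image[..., np.newaxis].astype(int) - buckets).argmin(axis=-1)
    return buckets[idx].astype(np.uint8)

# Example: a random 300x300x3 image reduced to 6 values per channel.
img = np.random.randint(0, 256, size=(300, 300, 3), dtype=np.uint8)
img_b5 = bucketize(img, B_5)
print(np.unique(img_b5))  # -> [  0  50 100 150 200 250]
```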


Image formats do matter when saving bucketized images, and the difference between lossy and lossless compression needs to be understood. A lossy format like jpg might not be able to store/restore the exact pixel values, while a lossless format like png is better if we want to study the impact of bucketizing exactly. If we train on the same million bucketized images in two different formats (jpg and png), do we get the same accuracy, keeping all other parameters constant? The answer is no: the permutations and combinations of pixel values the network has to deal with in the jpg format can be higher than in the png format, even though it is the same million images trained with the same neural network topology for the same number of epochs. The outcomes differ across image formats even though the information is exactly the same from the perspective of a human eye.
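A quick way to see this difference (a sketch assuming Pillow and NumPy; the file names are placeholders):

```python
import numpy as np
from PIL import Image

# Build a bucketized test image with exactly 6 values per channel (the b_5 list).
b5 = np.array([0, 50, 100, 150, 200, 250], dtype=np.uint8)
img_b5 = b5[np.random.randint(0, 6, size=(300, 300, 3))]

Image.fromarray(img_b5).save("sample.jpg", quality=90)  # lossy
Image.fromarray(img_b5).save("sample.png")              # lossless

jpg_back = np.asarray(Image.open("sample.jpg"))
png_back = np.asarray(Image.open("sample.png"))

print(len(np.unique(png_back)))  # 6 -> png restores the bucket values exactly
print(len(np.unique(jpg_back)))  # far more than 6 -> jpg smears the buckets
```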

What is the impact of bucketizing image pixels on image classification? The stats of the data used for training and testing are given below. The images were randomly collected from the internet and are not part of any standard dataset. The b_16 and b_5 images are in jpg format, so their pixel values will not exactly match bucket ranges of 16 and 5 respectively. The results obtained here should be replicable on different images and on exact formats like png.

Number of classes (Training): 981, Total images (Training): 424594

Number of classes (Testing): 981, Total images (Testing): 107867

Neural Network Topology: resnet18
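For reference, a minimal PyTorch sketch of such a setup (not the exact training code used here; the folder name and hyperparameters are illustrative assumptions):

```python
import torch
import torchvision

# 981 classes, matching the dataset stats above.
model = torchvision.models.resnet18(num_classes=981)

# Hypothetical layout: train_b5/<class_name>/*.jpg holds the bucketized images.
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
])
train_set = torchvision.datasets.ImageFolder("train_b5", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:       # one epoch of training
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```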

A series of experiments was conducted to answer some fundamental questions:

  1. What is the impact of bucketizing? Does the information loss impact classification accuracy significantly, or in proportion to the loss?
  2. What is the outcome of training on one bucket range and inferring on a different bucket range?
  3. How do Top 1 accuracy and Top 5 accuracy compare across bucket ranges?
  4. Is the Top 5 accuracy gap between the original image and a bucket of 5 narrower than the Top 1 gap?
  5. Does bucketizing image pixels allow a robust mechanism against adversarial attacks, since the neural network deals with fewer permutations and combinations of pixel values?
  6. Does bucketizing images bring benefits similar to post-training quantization of weights, in terms of reducing the variation of weights in trained neural networks?

[Results image: Top 1 and Top 5 accuracies for models trained and tested across b_255, b_16, and b_5]

The above results show that bucketizing does have an impact on accuracy (not generalization), but the impact is not proportional to the amount of information lost. For example, restricting each pixel value from [0 ... 255] to just 6 values [0, 50, 100, 150, 200, 250] is roughly a 42.67:1 reduction in representable values per channel, whereas the accuracy reduction is at most 2 to 3%. Losing a large amount of information through bucketing does not lead to an equal reduction in accuracy.
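The 42.67:1 figure is simple arithmetic; in bits, each channel value drops from 8 bits to about 2.58 bits:

```python
import math

print(256 / 6)         # 42.666... -> the ~42.67:1 reduction in values
print(math.log2(256))  # 8.0 bits per channel value originally
print(math.log2(6))    # ~2.585 bits per channel value after b_5
```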

Training a model on original images (b_255) and inferring on bucketed images (b_16 or b_5) has a major impact on accuracy. The reverse, training on bucketed images (b_16 or b_5) and inferring on original images (b_255), loses less accuracy. The best accuracy is always obtained when training and inferring on the same bucket size. Training on b_5 and inferring on b_16 surprisingly reduces accuracy by a large margin. Is this because the intersection of the pixel values [0, 50, 100, 150, 200, 250] and [0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 255] is very limited? This raises an interesting question: how sensitive are these neural networks to exact pixel values? Why did the model trained on b_5 perform worst on b_16 images? Is it really because the intersection of pixel values is so limited?
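The intersection is indeed tiny, as a self-contained check confirms:

```python
b5 = {0, 50, 100, 150, 200, 250}
b16 = {0, 16, 32, 48, 64, 80, 96, 112, 128, 144,
       160, 176, 192, 208, 224, 240, 255}
print(b5 & b16)  # -> {0}: the only pixel value the two bucket lists share
```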

The Top 1 accuracy loss from b_255 to b_5 is slightly higher than the corresponding Top 5 accuracy loss. This means that for most practical purposes, the accuracy loss is minimal compared to the information loss from the image.

Robustness against adversarial attacks is important when such technologies are deployed at large scale. Bucketing reduces the permutations and combinations of pixel values that a neural network has to understand. No experiment was conducted here to check whether b_5 is more robust than b_255 or b_16 against adversarial attacks.
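For what it is worth, such an experiment could be sketched with the standard FGSM attack (a hypothetical setup, not something tested in this article; `model` is a trained classifier and inputs are scaled to [0, 1]):

```python
import torch

def fgsm_attack(model, images, labels, epsilon=0.03):
    """One-step FGSM: perturb inputs along the sign of the loss gradient."""
    images = images.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    return (images + epsilon * images.grad.sign()).clamp(0, 1).detach()

# Hypothesis to test: bucketizing the adversarial input before inference
# snaps small perturbations back onto bucket values and may blunt the attack.
```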

The concept of bucketizing image pixels is similar in some sense to post-training quantization of weights. No experiment was conducted to compare the two concepts in terms of speed and accuracy.
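For context, a minimal sketch of post-training (dynamic) quantization in PyTorch, which stores weights in a reduced set of values much as bucketing does for pixels:

```python
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=981)  # trained weights assumed
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# The final Linear layer's weights are now stored as 8-bit integers
# instead of 32-bit floats.
```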

We are in the early stages of understanding AI and its limitations. Like any other technology, it has the potential to influence future systems and the way humans interact and function. Today we tend to see two sets of people when talking about AI: one that is highly optimistic without knowing or acknowledging the limitations of current AI technology, and a second that thinks the hype around AI is high and an AI winter is coming. I would like to take an intermediate view of the technology as opposed to either extreme. A fundamental understanding of any technology will allow us to improve it over a period of time. This article was one such attempt to understand what is happening under the hood from an image classification perspective. The hope is that we will have smaller-parameter models in the future that are more robust in production.

