Keras Image Preprocessing: scaling image pixels for training
Looking at image preprocessing examples in Keras, you often see the image scaled down by a factor of 255 before being fed to the model.
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1./255)
Referencing the Keras ImageDataGenerator source code, the rescale parameter multiplies every pixel value of the preprocessed image:
rescale: rescaling factor. If None or 0, no rescaling is applied, otherwise we multiply the data by the value provided (before applying any other transformation).
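To see the effect directly, here is a minimal sketch (assuming the standalone keras package and a fake 2x2 image built with numpy) that runs one batch through the generator:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# A fake 2x2 RGB "image" with 8-bit pixel values, batched to rank 4.
x = np.array([[[[0, 128, 255]] * 2] * 2], dtype="float32")  # shape (1, 2, 2, 3)

datagen = ImageDataGenerator(rescale=1./255)
batch = next(datagen.flow(x, batch_size=1, shuffle=False))

print(batch.min(), batch.max())  # 0.0 1.0 -- every pixel was multiplied by 1/255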
The question raised is: why is rescale 1./255, and why do we need it before training a neural network?
In an 8-bit grayscale image, every pixel has a value in the range 0~255: 0 is black and 255 is white. A color image contains three channels: Red, Green and Blue, and all the pixels are still in the range 0~255. (Note: the pixel range depends on the storage size; with b bits per pixel, the range is 0~2^b − 1.)
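You can check these ranges yourself. A minimal sketch, assuming Pillow and numpy are installed and that a file named cat.jpg (a hypothetical example path) exists:

import numpy as np
from PIL import Image

img = np.asarray(Image.open("cat.jpg"))  # "cat.jpg" is a hypothetical example file
print(img.dtype)              # uint8 -> 8 bits per channel
print(img.shape)              # (height, width, 3) -> Red, Green and Blue channels
print(img.min(), img.max())   # both values fall within [0, 255]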
Since 255 is the maximum pixel value, rescaling by 1./255 transforms every pixel value from the range [0, 255] to [0, 1]. The benefits are:
- Treat all images in the same manner: some images have a high pixel range, some a low one, yet they all share the same model, weights and learning rate. A high-range image tends to create a strong loss while a low-range image creates a weak loss, and the sum of both drives the back-propagation update. But for visual understanding, you care about the contour more than about how strong the contrast is, as long as the contour is preserved. Scaling every image to the same range [0, 1] makes all images contribute more evenly to the total loss. In other words, a high pixel range cat image gets one vote, a low pixel range cat image gets one vote, a high pixel range dog image gets one vote, a low pixel range dog image gets one vote... which is what we expect when training a dog/cat image classifier. Without scaling, the high pixel range images get a large share of the votes on how to update the weights. For example, a black/white cat image may have a higher pixel range than a pure black cat image, but that does not mean the black/white cat image is more important for training.
- Using a typical learning rate: when we take a learning rate from someone else's work, we can use it directly if both works apply the same scaling preprocessing to their image data sets. Otherwise, higher pixel range images produce a higher loss and need a smaller learning rate, while lower pixel range images need a larger learning rate (see the sketch after this list).
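To make the learning rate point concrete, here is a minimal numpy sketch (my own illustration, not from the Keras code): for a single linear neuron with squared loss, the gradient (w·x − y)·x grows roughly with the square of the input scale, so raw 0~255 inputs would force a far smaller learning rate than [0, 1] inputs:

import numpy as np

# Gradient of a squared loss for one linear neuron: dL/dw = (w*x - y) * x.
# Scaling the input x up also scales the gradient, roughly quadratically.
rng = np.random.default_rng(0)
w, y = 0.5, 1.0
for scale in (1.0, 255.0):           # [0, 1] inputs vs raw [0, 255] inputs
    x = scale * rng.random(1000)
    grad = np.mean((w * x - y) * x)
    print(scale, grad)               # the unscaled inputs give a vastly larger gradient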
Note 1: Removing the per-example mean value might also help feature learning. Note 2: Batch normalization might help as well.
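Both notes map onto standard Keras tools. A minimal sketch (the 150x150 input shape is just an assumed example):

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

# Note 1: samplewise_center subtracts each image's own mean (on top of the rescale).
datagen = ImageDataGenerator(rescale=1./255, samplewise_center=True)

# Note 2: a BatchNormalization layer normalizes activations inside the network.
model = Sequential([
    Conv2D(32, (3, 3), input_shape=(150, 150, 3)),
    BatchNormalization(),
    Activation("relu"),
])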