From the course: Advanced Quantization Techniques for Large Language Models
Quantization error analysis
Let's try to understand what quantization error is, how it affects information density, and how that error propagates through a neural network. First, let's focus on the core idea. When we quantize, we take real-valued weights and activations and map them onto a limited set of discrete values. Every time we do that, we introduce an error. You can think of this as adding a tiny bit of noise to every weight and every activation of the model. Individually, each error might be small, but across millions or billions of parameters, this noise can accumulate. We can measure this error with the mean squared error, the maximum absolute error, or the signal-to-noise ratio (SNR). The goal isn't to eliminate the error completely, but to keep it manageable as we move to lower precision. As you can see here, information density changes as we quantize our model differently. For instance, when we move from an FP32 to an 8-bit or a 4-bit representation, we dramatically reduce the number of distinct…
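To make these metrics concrete, here is a small sketch that quantizes a tensor of synthetic weights with symmetric uniform quantization and reports the three error measures mentioned above. The function name `quantize_dequantize` and the Gaussian weight distribution are illustrative assumptions, not part of the course material:

```python
import numpy as np

def quantize_dequantize(x, num_bits):
    """Symmetric uniform quantization: map x onto a discrete grid, then back to floats."""
    levels = 2 ** (num_bits - 1) - 1           # e.g. 127 representable magnitudes for 8-bit
    scale = np.max(np.abs(x)) / levels          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                            # dequantized values; x - result is the error

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)  # toy weight tensor

for bits in (8, 4):
    deq = quantize_dequantize(weights, bits)
    err = weights - deq
    mse = np.mean(err ** 2)                     # mean squared error
    max_abs = np.max(np.abs(err))               # maximum absolute error
    snr_db = 10 * np.log10(np.mean(weights ** 2) / mse)  # signal-to-noise ratio in dB
    print(f"{bits}-bit: MSE={mse:.2e}  max|err|={max_abs:.2e}  SNR={snr_db:.1f} dB")
```

Running this shows the pattern the lesson describes: dropping from 8-bit to 4-bit shrinks the set of representable values, so the MSE and maximum error grow and the SNR falls by roughly 6 dB per removed bit.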