Mitigating branch mispredictions in callback processing
Most modern processors are equipped with sophisticated branch predictors, which facilitate efficient instruction-level pipelining and speculative execution. Occasionally, however, the processor mispredicts a branch, and this leads to a "pipeline flush": the CPU discards the instructions it executed speculatively along the wrong path, then fetches and executes the instructions on the correct path. This is an expensive operation, because the work done along the wrong path is wasted and the pipeline has to be refilled.
As C++ developers, we often come across scenarios in which we have two separate modules, one generating callbacks and the other processing them. If one is developing an ultra-low-latency system and the callback processing is on the hot path, one needs to ensure that frequent branch mispredictions do not happen.
Commonly, a system has several different types of callback messages that need to be processed. To handle them, we often write conditional code, i.e., an if-else chain. In unpredictable systems, where any of the message types can arrive at any time, the processor cannot learn a stable pattern for these branches. This can lead to a high rate of branch misprediction, which adversely affects overall performance.
To mitigate this problem, we can write code that does not have any branches in the dispatch path, so that the processor simply executes the desired handler.
Below are two simple code examples. The first uses an if-else branch, and the second is the branchless version.
Example 1: With if-else branch
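A minimal sketch of what the branching version might look like. The `MessageType` enum, the `Message` struct, and the handler names are hypothetical placeholders chosen for illustration. The handlers return a label instead of printing so that the behaviour is easy to check.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical message kinds; the names are placeholders for illustration.
enum class MessageType : std::uint8_t { Message1, Message2 };

// A single message struct carrying a runtime tag that identifies its kind.
struct Message {
    MessageType type;
    int payload;
};

// Hypothetical handlers, one per message kind.
constexpr std::string_view ProcessMessage1(const Message&) {
    return "Processed message 1";
}

constexpr std::string_view ProcessMessage2(const Message&) {
    return "Processed message 2";
}

// Callback entry point: the handler is chosen at run time by inspecting the
// tag, so every call goes through a conditional branch.
constexpr std::string_view OnCallback(const Message& msg) {
    if (msg.type == MessageType::Message1) {
        return ProcessMessage1(msg);
    } else {
        return ProcessMessage2(msg);
    }
}
```

If the two message kinds arrive in no discernible order, the branch on `msg.type` is the one the processor is likely to mispredict.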
Example 2: Without branches
Output:
Good work Vivek! This makes the code more extensible as well: adding a 3rd message is as simple as adding a new ProcessMessage structure. Happy to see the 'requires' expression in action. Would it be possible to put this kind of framework (code) into a separate DLL, so that clients of that DLL can extend it, e.g. by adding a 3rd message?