Speed up C++ code with ref qualifiers
What are ref qualifiers, and why should I use them?
Ref qualifiers for member functions were added in C++11 as a natural complement to move semantics and rvalue references. In a nutshell, they let you overload a member function on the value category of the object it is called on: the compiler picks one overload when the object is an rvalue and another when it is an lvalue. This way, when the object is an rvalue, you can move its internal state instead of copying it. Ok, enough talking. Let's see the code.
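A minimal sketch of how overload selection works, using a hypothetical Widget class that only reports which overload was picked. Note that once one overload of a name (with a given parameter list) has a ref qualifier, all of them must have one; you cannot mix a plain which() with which() &&.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class: each overload just names the value category it handles.
struct Widget
{
    // Chosen when the object expression is a non-const lvalue, e.g. w.which().
    std::string which() & { return "lvalue"; }
    // Chosen for const lvalues (it would also accept rvalues if no && overload existed).
    std::string which() const& { return "const lvalue"; }
    // Chosen when the object expression is an rvalue: a temporary or std::move(w).
    std::string which() && { return "rvalue"; }
};
```

For example, Widget{}.which() and std::move(w).which() select the && overload, while w.which() selects the & overload.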
Example
Imagine a wrapper class for a blob array representing a neural network, like the one below. Its only method, getBlob, returns a const reference to the internal blob.
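#include <vector>

class NeuralNetwork
{
public:
    // Parentheses, not braces: blob{1024*1024*8, 255} would build a 2-element vector.
    NeuralNetwork() : blob(1024 * 1024 * 8, 255) {}
    const std::vector<int>& getBlob() const { return blob; }
private:
    std::vector<int> blob;
};

auto localBlob = NeuralNetwork{}.getBlob();   // getBlob() is called on a temporary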
Here, the copy constructor of std::vector is called to initialize localBlob. Since the NeuralNetwork object is a temporary (an rvalue) that is destroyed at the end of the statement, copying its blob makes little sense. Instead, we could move the blob into localBlob by modifying the getBlob method accordingly:
std::vector<int>&& getBlob() { return std::move(blob); }
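To see why this unqualified version is risky, here is a minimal sketch showing that it also moves the blob out of a named object. NaiveNeuralNetwork, its blobSize helper, and the small blob size are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, deliberately naive class: getBlob() always returns an rvalue
// reference, whether the object is a temporary or a named lvalue.
class NaiveNeuralNetwork
{
public:
    NaiveNeuralNetwork() : blob(1024, 255) {}   // small blob for illustration
    std::vector<int>&& getBlob() { return std::move(blob); }
    std::size_t blobSize() const { return blob.size(); }
private:
    std::vector<int> blob;
};

// Calls getBlob() on a named (lvalue) object, as ordinary code would.
std::size_t sizeLeftAfterLvalueCall()
{
    NaiveNeuralNetwork net;
    auto local = net.getBlob();   // move-constructs local even though net is an lvalue
    (void)local;
    return net.blobSize();        // a moved-from std::vector is empty, so this is 0
}
```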
The problem is that this version moves the blob out of every object, including lvalues that may still be in use. We want the moving implementation to be called only on rvalues, and this is where ref qualifiers come into play:
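#include <utility>
#include <vector>

class OptimizedNeuralNetwork
{
public:
    OptimizedNeuralNetwork() : blob(1024 * 1024 * 8, 255) {}
    std::vector<int>& getBlob() & { return blob; }                   // non-const lvalues
    const std::vector<int>& getBlob() const& { return blob; }        // const lvalues
    std::vector<int>&& getBlob() && { return std::move(blob); }      // rvalues: move out
private:
    std::vector<int> blob;
};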
As you can see, OptimizedNeuralNetwork has a dedicated overload of getBlob for rvalues, the last method in the snippet above. It is called only on rvalue instances of OptimizedNeuralNetwork, such as the temporary in OptimizedNeuralNetwork{}.getBlob(), while lvalues still get a reference. With this implementation, localBlob is initialized by the move constructor, which avoids the unnecessary copy of the blob member.
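A minimal sketch that checks this behavior, assuming the OptimizedNeuralNetwork class above. The two helper functions are hypothetical. They rely on std::vector's move constructor taking over the source buffer, so the data pointer is preserved and the source is left empty.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class OptimizedNeuralNetwork
{
public:
    OptimizedNeuralNetwork() : blob(1024 * 1024 * 8, 255) {}
    std::vector<int>& getBlob() & { return blob; }
    const std::vector<int>& getBlob() const& { return blob; }
    std::vector<int>&& getBlob() && { return std::move(blob); }
private:
    std::vector<int> blob;
};

// Hypothetical check: on an lvalue, the & overload is used and local is a copy.
bool lvalueCallCopies()
{
    OptimizedNeuralNetwork net;
    auto local = net.getBlob();                        // copy-constructs local
    return local.size() == net.getBlob().size()
        && local.data() != net.getBlob().data();       // separate buffers
}

// Hypothetical check: on an rvalue, the && overload is used and the buffer moves.
bool rvalueCallMoves()
{
    OptimizedNeuralNetwork net;
    const int* before = net.getBlob().data();
    auto local = std::move(net).getBlob();             // move-constructs local
    return local.data() == before && net.getBlob().empty();
}
```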
Benchmarks
But how much faster is the rvalue-specialized version of getBlob compared to the first implementation, which returns a constant reference?
To answer this question, let's imagine a ParseBlob function template that takes the blob data from a neural network and extracts some relevant metadata:
template<typename T>
void ParseBlob(T&& arg)
{
    // std::forward<T> preserves arg's value category, so rvalues select getBlob() &&
    auto local = std::forward<T>(arg).getBlob();
    // Parse blob and extract relevant data
}
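A minimal sketch of how the forwarding reference in ParseBlob drives overload selection. SmallNetwork is a hypothetical class with the same ref-qualified overloads as OptimizedNeuralNetwork but a tiny blob, and this ParseBlob returns local.size() only so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class: same overloads as OptimizedNeuralNetwork, tiny blob.
class SmallNetwork
{
public:
    SmallNetwork() : blob(16, 255) {}
    std::vector<int>& getBlob() & { return blob; }
    const std::vector<int>& getBlob() const& { return blob; }
    std::vector<int>&& getBlob() && { return std::move(blob); }
private:
    std::vector<int> blob;
};

// T is deduced as SmallNetwork& for lvalue arguments and as SmallNetwork for
// rvalues; std::forward<T> restores that category, so the matching getBlob runs.
template<typename T>
std::size_t ParseBlob(T&& arg)
{
    auto local = std::forward<T>(arg).getBlob();   // copy for lvalues, move for rvalues
    // ... parse local and extract the relevant metadata ...
    return local.size();                           // hypothetical, for checking only
}
```

Passing a named object copies the blob and leaves the original intact; passing a temporary or std::move(net) moves the blob out.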
We call the ParseBlob function with instances from both classes (NeuralNetwork and OptimizedNeuralNetwork) and measure the runtime for each one:
ParseBlob(NeuralNetwork{});
ParseBlob(OptimizedNeuralNetwork{});
Here is what the runtime performance looks like:
As you can see, the optimized version runs roughly three times faster than the original NeuralNetwork class (85 µs vs. 28 µs). All of this comes just from combining ref qualifiers with move semantics.
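The measurement code is not shown above; if you want to reproduce a comparison like this, here is a minimal sketch of one way to time a single call with std::chrono. The microsForOneCall helper and the volatile read in ParseBlob are assumptions, not the original benchmark. The network is built before the clock starts, so only the copy or move of the blob (and freeing the local copy) is timed.

```cpp
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

class NeuralNetwork
{
public:
    NeuralNetwork() : blob(1024 * 1024 * 8, 255) {}
    const std::vector<int>& getBlob() const { return blob; }
private:
    std::vector<int> blob;
};

class OptimizedNeuralNetwork
{
public:
    OptimizedNeuralNetwork() : blob(1024 * 1024 * 8, 255) {}
    std::vector<int>& getBlob() & { return blob; }
    const std::vector<int>& getBlob() const& { return blob; }
    std::vector<int>&& getBlob() && { return std::move(blob); }
private:
    std::vector<int> blob;
};

template<typename T>
void ParseBlob(T&& arg)
{
    auto local = std::forward<T>(arg).getBlob();
    // Stand-in for real parsing (assumption): a volatile read of one element
    // makes it less likely that the optimizer discards the work entirely.
    volatile int sink = local.empty() ? 0 : local.front();
    (void)sink;
}

// Hypothetical helper: microseconds spent in one ParseBlob call on an rvalue Net.
template<typename Net>
double microsForOneCall()
{
    Net net;                                           // construction is not timed
    const auto start = std::chrono::steady_clock::now();
    ParseBlob(std::move(net));                         // rvalue, like ParseBlob(Net{})
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

A single call is noisy, so in practice you would repeat each measurement several times with optimizations enabled and compare, for example, the minimum; absolute numbers will depend on your machine and compiler.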
Whenever a member function hands out an internal resource, try to design the API so that rvalues get a specialized implementation. It all comes down to moving resources out of rvalues instead of copying them, an optimization that has become such an important part of modern C++ that you should always take it into consideration.
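One design variant worth knowing (a common alternative, not the approach used above): the && overload can return the blob by value instead of by rvalue reference. With a reference return, a loop such as for (int v : OptimizedNeuralNetwork{}.getBlob()) iterates over a member of a temporary that, before C++23, is destroyed before the loop body runs. Returning by value costs one extra, cheap move but removes that dangling risk. ValueReturningNetwork and sumOfTemporaryBlob below are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical variant: the && overload returns the blob by value.
class ValueReturningNetwork
{
public:
    ValueReturningNetwork() : blob(1024, 1) {}
    std::vector<int>& getBlob() & { return blob; }
    const std::vector<int>& getBlob() const& { return blob; }
    // The result is move-constructed from the member, so it no longer
    // refers into the (possibly temporary) network object.
    std::vector<int> getBlob() && { return std::move(blob); }
private:
    std::vector<int> blob;
};

// Safe: the range-for binds the returned vector itself, so nothing dangles
// even though the temporary network is destroyed before the loop body runs.
long sumOfTemporaryBlob()
{
    long sum = 0;
    for (int value : ValueReturningNetwork{}.getBlob())
        sum += value;
    return sum;
}
```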
Thank you for your time! Any questions or suggestions are more than welcome.