Parallel Algorithms (C++17)
C++17 introduced a significant update to the Standard Library by adding parallel versions of many existing algorithms and introducing new ones designed for parallel execution. These algorithms leverage the power of multi-core processors to potentially achieve significant performance improvements compared to their sequential counterparts.
What Are C++17 Parallel Algorithms?
C++17 extended the existing STL algorithms (e.g., sort, transform, reduce) by adding support for parallel execution policies. These policies allow algorithms to automatically use:
Key Headers
#include <algorithm> // For algorithms like sort, transform
#include <execution> // For execution policies: par, seq, par_unseq
Execution Policies
The key to using these parallel algorithms is specifying an execution policy as the first argument. This policy guides the algorithm's execution and allows it to utilize parallelism. C++17 defines three standard execution policies:
Compilation and Execution
On Linux (GCC):
g++ -std=c++17 -O2 -ltbb My_code.cpp -o My_program
C++17 Execution Policy Defaults
In C++17, execution policies are optional. If you omit them:
#include <vector>
#include <algorithm>
int main() {
std::vector<int> data = {5, 3, 2, 6, 1, 4};
// Defaults to sequential execution (no parallelism)
std::sort(data.begin(), data.end()); // Same as std::sort(std::execution::seq, ...)
}
Types of Parallelism
The C++17 parallel algorithms can leverage different types of parallelism:
Here's an example demonstrating the use of std::execution::par for parallelizing a finding the max element in the big dataset:
Recommended by LinkedIn
EX1:
#include <iostream>
#include <vector>
#include <algorithm>
#include <execution>
int main() {
std::vector<int> data(1000000, 1);
// Using parallel execution policy to fill the vector
int cnt = 0;
std::for_each(std::execution::par, data.begin(), data.end(), [&](int& n) {
cnt++;
if(cnt == 9999) {
n = 100;
}
n *= 2;
});
// Using parallel execution policy to find the maximum element
auto max_elem = std::max_element(std::execution::par, data.begin(), data.end());
std::cout << "Max element: " << *max_elem << "\n";
return 0;
}
/*
Max element: 200
*/
Ex2 : Parallel Sort
#include <vector>
#include <algorithm>
#include <execution>
int main() {
std::vector<int> data = {5, 3, 2, 6, 1, 4};
// Parallel sort using multiple CPU threads
std::sort(std::execution::par, data.begin(), data.end());
// Result: {1, 2, 3, 4, 5, 6}
}
Full Example: Parallel Word Count
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <execution>
using namespace std;
int main() {
vector<string> words = {"apple", "banana", "apple", "cherry"};
vector<int> counts(words.size(), 0);
// Count occurrences of "apple" in parallel
transform(execution::par,
words.begin(), words.end(), counts.begin(),
[](const std::string& word) {
return (word == "apple") ? 1 : 0;
});
int total = reduce(execution::par,
counts.begin(), counts.end(), 0);
cout<<"Total count = "<<total;
// total = 2
}
// Total count = 2
SIMD Vectorization with par_unseq
The par_unseq policy allows the compiler to generate SIMD instructions (e.g., AVX on x86 CPUs) for data-level parallelism. This is automatic and depends on:
Here's an example demonstrating the use of std::execution::par_unseq for parallelizing a vector transformation:
par_unseq: The compiler/runtime may use SIMD (e.g., AVX instructions) to process multiple elements at once.
#include <iostream>
#include <vector>
#include <algorithm>
#include <execution>
int main() {
std::vector<int> data(1000000);
std::iota(data.begin(), data.end(), 1); // Fill with values 1 to 1000000
// Parallel transformation using par_unseq
std::transform(std::execution::par_unseq, data.begin(), data.end(), data.begin(),
[](int x) { return x * x; });
// Print the first few elements to verify
for (int i = 0; i < 5; ++i) {
std::cout << data[i] << " ";
}
std::cout << std::endl;
return 0;
}
/*
1 4 9 16 25
*/
Supported Algorithms
Not all STL algorithms support parallelism. Common parallelizable algorithms include:
Considerations
// UNSAFE (race condition):
int counter = 0;
std::for_each(std::execution::par, data.begin(), data.end(), [&](int) {
counter++; // Race condition!
});