Data-Parallel Types – A First Example
After providing a theoretical introduction to the new C++ 26 feature in my last article, “Data-Parallel Types (SIMD),” I would like to follow up today with a practical example.
The following introductory example is from the experimental implementation of the SIMD library. This functionality has been fully adopted in the C++ 26 draft under the name Data-parallel types (SIMD). To port the program to the C++ 26 standard, it should be sufficient to replace the header <experimental/simd> with <simd> and the namespace std::experimental with std::datapar.
#include <experimental/simd>
#include <iostream>
#include <string_view>

namespace stdx = std::experimental;

void println(std::string_view name, auto const& a)
{
    std::cout << name << ": ";
    for (std::size_t i{}; i != std::size(a); ++i)
        std::cout << a[i] << ' ';
    std::cout << '\n';
}

template<class A>
stdx::simd<int, A> my_abs(stdx::simd<int, A> x)
{
    where(x < 0, x) = -x;
    return x;
}

int main()
{
    const stdx::native_simd<int> a = 1;
    println("a", a);

    const stdx::native_simd<int> b([](int i) { return i - 2; });
    println("b", b);

    const auto c = a + b;
    println("c", c);

    const auto d = my_abs(c);
    println("d", d);

    const auto e = d * d;
    println("e", e);

    const auto inner_product = stdx::reduce(e);
    std::cout << "inner product: " << inner_product << '\n';

    const stdx::fixed_size_simd<long double, 16> x([](int i) { return i; });
    println("x", x);
    println("cos²(x) + sin²(x)", stdx::pow(stdx::cos(x), 2) + stdx::pow(stdx::sin(x), 2));
}
Before I proceed with the program, I would like to introduce the output.
First, I would like to focus on the println and my_abs functions. The println function outputs the name and content of a SIMD vector, iterating through its elements. my_abs calculates the absolute value of each element in a SIMD vector with integers, using where to conditionally negate negative values.
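To make the masked assignment behind my_abs more tangible, here is a minimal sketch of the same where pattern with a different operation. The helper clamp_negative_to_zero is a hypothetical name that is not part of the original example, and the sketch assumes a standard library that ships <experimental/simd>, such as libstdc++ from GCC 11 onwards.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper (not from the article): replace every negative
// lane with 0 and leave all other lanes untouched.
template <class A>
stdx::simd<int, A> clamp_negative_to_zero(stdx::simd<int, A> x)
{
    // x < 0 produces a mask with one boolean per lane.
    // The assignment only writes to lanes whose mask bit is true.
    stdx::where(x < 0, x) = stdx::simd<int, A>(0);
    return x;
}
```

Called with a vector holding -2, -1, 0, 1, the function should return 0, 0, 0, 1. No branch per element is needed; the mask selects the lanes to overwrite.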
The main function is much more interesting. In the SIMD vector a, each element is set to 1, whereas in the SIMD vector b, the lambda function initializes each element with its index minus 2. On x86-64 without additional compiler flags, stdx::native_simd uses SSE2 instructions. These SIMD vectors are 128 bits wide, so a native_simd<int> holds four int values. Now the arithmetic begins. Vector c is the element-wise sum of a and b, d is the element-wise absolute value of c, and vector e is the element-wise square of d. Finally, stdx::reduce(e) reduces vector e to the sum of its elements.
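The combination of generator constructor, element-wise arithmetic, and reduce can be condensed into a few lines. The following sketch uses fixed_size_simd instead of native_simd so that the number of elements does not depend on the target hardware. The function name sum_of_squares is an assumption made for illustration.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper: computes 0*0 + 1*1 + ... + (N-1)*(N-1).
template <int N>
int sum_of_squares()
{
    // Generator constructor: lane i is initialized with the value i.
    const stdx::fixed_size_simd<int, N> idx([](int i) { return i; });

    // idx * idx squares every lane; reduce adds all lanes together.
    return stdx::reduce(idx * idx);
}
```

For N = 4 the lanes are 0, 1, 2, 3, so the squares are 0, 1, 4, 9 and the reduction yields 14.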
The expression const stdx::fixed_size_simd<long double, 16> x([](int i) { return i; }) is particularly interesting. It initializes the SIMD vector x with 16 long double values from 0 to 15. This is possible if the architecture is sufficiently modern and supports AVX-512. This applies, for example, to Intel’s Xeon Phi or AMD’s Zen 4 architecture. Similarly interesting is the line println("cos²(x) + sin²(x)", stdx::pow(stdx::cos(x), 2) + stdx::pow(stdx::sin(x), 2)). This calculates cos²(x) + sin²(x) for each element, which is 1 for all elements due to the Pythagorean trigonometric identity. All functions in <cmath>, except for the special mathematical functions, are overloaded for simd. These include basic functions such as abs, min, and max. However, exponential, power, trigonometric, hyperbolic, and gamma functions can also be applied directly to SIMD vectors.
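As a small illustration of the overloaded math functions, the following sketch checks the identity numerically. It uses float and eight elements instead of long double and sixteen, purely to keep the example small. The function name max_identity_error and the step width 0.5 are assumptions made for this sketch.

```cpp
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical helper: largest deviation of cos²(x) + sin²(x) from 1
// over the sample points 0.0, 0.5, 1.0, ..., 3.5.
float max_identity_error()
{
    const stdx::fixed_size_simd<float, 8> x([](int i) { return 0.5f * i; });

    // cos and sin are applied to all eight lanes at once.
    const auto c = stdx::cos(x);
    const auto s = stdx::sin(x);

    // Element-wise distance from 1, then the maximum over all lanes.
    const auto err = stdx::abs(c * c + s * s - 1.0f);
    return stdx::hmax(err);
}
```

Because of floating-point rounding, the result is usually not exactly 0, but it should stay within a few multiples of the float machine epsilon.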
Now I would like to go into more detail about the width of the data type simd<T>.
Width of simd<T>
The width of the data type native_simd<T> is determined by the implementation at compile time. In contrast, the developer specifies the width of the data type fixed_size_simd<T>.
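Why the width matters becomes visible as soon as a loop processes a longer array: the step size of the loop is the number of elements in one SIMD vector. The following sketch of a dot product goes one step beyond the example above, because it loads data from memory through the simd constructor that takes a pointer and the stdx::element_aligned flag. The function name dot is hypothetical, and both inputs are assumed to have the same length.

```cpp
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical helper: dot product of two ranges of equal length.
float dot(const std::vector<float>& a, const std::vector<float>& b)
{
    using V = stdx::native_simd<float>;  // width chosen by the implementation
    const std::size_t n = a.size();

    V acc = 0.0f;                        // every lane starts at 0
    std::size_t i = 0;

    // Main loop: one full SIMD vector (V::size() elements) per iteration.
    for (; i + V::size() <= n; i += V::size()) {
        const V va(a.data() + i, stdx::element_aligned);
        const V vb(b.data() + i, stdx::element_aligned);
        acc += va * vb;
    }

    float result = stdx::reduce(acc);    // horizontal sum of all lanes

    // Scalar tail for the elements that do not fill a whole vector.
    for (; i < n; ++i)
        result += a[i] * b[i];
    return result;
}
```

The code never mentions a concrete width. With SSE2, V::size() is 4; with wider instruction sets enabled at compile time, the same source processes more elements per iteration.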
The class template simd has the following declaration:
template< class T, class Abi = simd_abi::compatible<T> >
class simd;
Here, T stands for the element type, which cannot be bool. The Abi tag determines the number of elements and how they are stored.
There are two aliases for this class template:
template< class T, int N >
using fixed_size_simd = std::experimental::simd<T, std::experimental::simd_abi::fixed_size<N>>;
template< class T >
using native_simd = std::experimental::simd<T, std::experimental::simd_abi::native<T>>;
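The relationship between the aliases and the class template can be checked at compile time. The following sketch assumes C++17 or later. The helper lane_count is a hypothetical name introduced only for this illustration.

```cpp
#include <cstddef>
#include <experimental/simd>
#include <type_traits>

namespace stdx = std::experimental;

// fixed_size_simd<T, N> is simd<T, simd_abi::fixed_size<N>>.
static_assert(std::is_same_v<stdx::fixed_size_simd<float, 4>,
                             stdx::simd<float, stdx::simd_abi::fixed_size<4>>>);

// native_simd<T> is simd<T, simd_abi::native<T>>.
static_assert(std::is_same_v<stdx::native_simd<float>,
                             stdx::simd<float, stdx::simd_abi::native<float>>>);

// Without an explicit Abi argument, simd<T> uses simd_abi::compatible<T>.
static_assert(std::is_same_v<stdx::simd<float>,
                             stdx::simd<float, stdx::simd_abi::compatible<float>>>);

// Hypothetical helper: number of elements of a simd type V.
template <class V>
constexpr std::size_t lane_count()
{
    return V::size();  // size() is a static constexpr member function
}

// For fixed_size the developer decides the number of elements.
static_assert(lane_count<stdx::fixed_size_simd<double, 3>>() == 3);
```

For native_simd, the value returned by lane_count depends on the target and the compiler flags, which is why the sketch asserts a concrete number only for the fixed-size case.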
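The following ABI tags are available in the namespace simd_abi: scalar (stores a single element), fixed_size<N> (stores exactly N elements), compatible<T> (ensures ABI compatibility), and native<T> (the most efficient choice for the target architecture).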
What’s next?
After this initial example of data-parallel types, I would like to take a closer look at their functionality in the next article.