Data-Parallel Types – A First Example

Rainer Grimm

Published Jun 30, 2025

After providing a theoretical introduction to the new C++ 26 feature in my last article, “Data-Parallel Types (SIMD),” I would like to follow up today with a practical example.

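The following introductory example uses the experimental implementation of the SIMD library (<experimental/simd> from the Parallelism TS v2). This functionality has been adopted into the C++ 26 draft under the name data-parallel types (SIMD). To port the program to the C++ 26 standard, the first step is to replace the header <experimental/simd> with <simd> and the namespace std::experimental with std::datapar. Some parts of the experimental interface, such as where, were changed or replaced during standardization, so a few further adjustments are likely necessary.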

#include <experimental/simd>
#include <iostream>
#include <string_view>
namespace stdx = std::experimental;
 
void println(std::string_view name, auto const& a)
{
    std::cout << name << ": ";
    for (std::size_t i{}; i != std::size(a); ++i)
        std::cout << a[i] << ' ';
    std::cout << '\n';
}
 
template<class A>
stdx::simd<int, A> my_abs(stdx::simd<int, A> x)
{
    where(x < 0, x) = -x;
    return x;
}
 
int main()
{
    const stdx::native_simd<int> a = 1;
    println("a", a);
 
    const stdx::native_simd<int> b([](int i) { return i - 2; });
    println("b", b);
 
    const auto c = a + b;
    println("c", c);
 
    const auto d = my_abs(c);
    println("d", d);
 
    const auto e = d * d;
    println("e", e);
 
    const auto inner_product = stdx::reduce(e);
    std::cout << "inner product: " << inner_product << '\n';
 
    const stdx::fixed_size_simd<long double, 16> x([](int i) { return i; });
    println("x", x);
    println("cos²(x) + sin²(x)", stdx::pow(stdx::cos(x), 2) + stdx::pow(stdx::sin(x), 2));
}

Before I proceed with the program, I would like to introduce the output.

First, I would like to focus on the functions println and my_abs. println outputs the name and the contents of a SIMD vector by iterating through its elements. my_abs calculates the absolute value of each element of a SIMD vector of integers: where(x < 0, x) selects the negative elements, and only those are replaced by their negation.

The main function is much more interesting. In the SIMD vector a, each element is set to 1, whereas the lambda initializes each element of the SIMD vector b with its index minus 2. The width of native_simd depends on the target: on x86-64 without additional compiler flags, the implementation typically uses SSE2 registers, which are 128 bits wide, so native_simd<int> holds four ints. Now the arithmetic begins. Vector c is the element-wise sum of a and b, d is the element-wise absolute value of c, and e is the element-wise square of d. Finally, stdx::reduce(e) adds up all elements of e. Because e contains the squares of the elements of c, this sum is the inner product of c with itself.

The expression const stdx::fixed_size_simd<long double, 16> x([](int i) { return i; }) is particularly interesting. It initializes the SIMD vector x with the 16 long double values 0 to 15. This works on every architecture, because the implementation maps a fixed_size_simd onto whatever the target provides: several native registers or, if necessary, scalar operations. A 512-bit AVX-512 register, available for example on Intel's Xeon Phi or AMD's Zen 4, could hold 16 floats at once; x86 has no vector instructions for long double, however, so these operations are likely executed element by element.

Similarly interesting is the line println("cos²(x) + sin²(x)", stdx::pow(stdx::cos(x), 2) + stdx::pow(stdx::sin(x), 2)). It calculates cos²(x) + sin²(x) for each element, which is 1 for all elements (up to rounding) due to the Pythagorean trigonometric identity. All functions of <cmath> except the special mathematical functions are overloaded for simd. These include basic functions such as abs, min, and max, as well as exponential, power, trigonometric, hyperbolic, and gamma functions, which can therefore be applied directly to SIMD vectors.

Now I would like to go into more detail about the width of the data type simd<T>.

Width of simd<T>

The width of the data type native_simd<T> is determined by the implementation at compile time, depending on the target and the compiler flags. In contrast, the developer specifies the width of the data type fixed_size_simd<T, N> explicitly through N.


The class template simd has the following declaration:

template< class T, class Abi = simd_abi::compatible<T> >
class simd;

Here, T stands for the element type, which cannot be bool. The Abi tag determines the number of elements and how they are stored.

There are two aliases for this class template:

template< class T, int N >
using fixed_size_simd = std::experimental::simd<T, std::experimental::simd_abi::fixed_size<N>>;
		
template< class T >
using native_simd = std::experimental::simd<T, std::experimental::simd_abi::native<T>>;

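The following ABI tags are available in the namespace simd_abi:

scalar: storing a single element
fixed_size<N>: storing the specified number N of elements
compatible<T>: the default tag; ensures ABI compatibility between translation units compiled for the same target
native<T>: the most efficient tag for T on the target
max_fixed_size<T>: strictly speaking a constant, not a tag; the maximum number of elements guaranteed to be supported by fixed_size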

What’s next?

After this initial example of data-parallel types, I would like to take a closer look at their functionality in the next article.
