Parallel and distributed computation in C++17


Parallel and distributed computation in C++17 can be achieved using a number of libraries and frameworks, such as OpenMP, MPI, and Boost.Compute.

Here's an example of how to perform a matrix multiplication using parallel computation with OpenMP:


```c++
#include <iostream>
#include <vector>
#include <omp.h>

void matrixMultiplication(const std::vector<float>& A, const std::vector<float>& B,
                          std::vector<float>& C, int width) {
  #pragma omp parallel for
  for (int row = 0; row < width; ++row) {
    for (int col = 0; col < width; ++col) {
      float sum = 0;
      for (int i = 0; i < width; ++i) {
        sum += A[row * width + i] * B[i * width + col];
      }
      C[row * width + col] = sum;
    }
  }
}

void printMatrix(const std::vector<float>& matrix, int width) {
  for (int i = 0; i < width; ++i) {
    for (int j = 0; j < width; ++j) {
      std::cout << matrix[i * width + j] << " ";
    }
    std::cout << '\n';
  }
}

int main() {
  const int width = 1024;
  const int size = width * width;

  // Allocate memory on the host
  std::vector<float> h_A(size);
  std::vector<float> h_B(size);
  std::vector<float> h_C(size);

  // Initialize matrices
  for (int i = 0; i < size; ++i) {
    h_A[i] = i % width;
    h_B[i] = i % width;
  }

  // Perform matrix multiplication in parallel
  matrixMultiplication(h_A, h_B, h_C, width);

  // Print the result
  printMatrix(h_C, width);

  return 0;
}
```


In this example, the `#pragma omp parallel for` directive parallelizes the outer loop of `matrixMultiplication`, so different rows of the result are computed concurrently on multiple threads. OpenMP support must be enabled at compile time (for example, with `-fopenmp` in GCC or Clang).


On the other hand, here's an example of how to perform the same matrix multiplication using distributed computation with MPI:


```c++
#include <iostream>
#include <vector>
#include <mpi.h>

// Each process computes rows [startRow, endRow) of the product and writes
// them into C, which holds only this process's slice of the result.
void matrixMultiplication(const std::vector<float>& A, const std::vector<float>& B,
                          std::vector<float>& C, int startRow, int endRow, int width) {
  for (int row = startRow; row < endRow; ++row) {
    for (int col = 0; col < width; ++col) {
      float sum = 0;
      for (int i = 0; i < width; ++i) {
        sum += A[row * width + i] * B[i * width + col];
      }
      C[(row - startRow) * width + col] = sum;  // index relative to the slice
    }
  }
}

void printMatrix(const std::vector<float>& matrix, int width) {
  for (int i = 0; i < width; ++i) {
    for (int j = 0; j < width; ++j) {
      std::cout << matrix[i * width + j] << " ";
    }
    std::cout << '\n';
  }
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int rank, numProcs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

  const int width = 1024;  // assumed divisible by the number of processes
  const int size = width * width;

  // Allocate memory on the host
  std::vector<float> h_A(size);
  std::vector<float> h_B(size);
  std::vector<float> h_C(size);

  // Initialize matrices on the root process
  if (rank == 0) {
    for (int i = 0; i < size; ++i) {
      h_A[i] = i % width;
      h_B[i] = i % width;
    }
  }

  // Broadcast input matrices to all processes
  MPI_Bcast(h_A.data(), size, MPI_FLOAT, 0, MPI_COMM_WORLD);
  MPI_Bcast(h_B.data(), size, MPI_FLOAT, 0, MPI_COMM_WORLD);

  // Each process computes an equal-sized block of rows
  const int rowsPerProcess = width / numProcs;
  const int startRow = rank * rowsPerProcess;
  const int endRow = startRow + rowsPerProcess;
  std::vector<float> h_partC(rowsPerProcess * width);
  matrixMultiplication(h_A, h_B, h_partC, startRow, endRow, width);

  // Gather the row blocks into the full output matrix on the root
  MPI_Gather(h_partC.data(), rowsPerProcess * width, MPI_FLOAT,
             h_C.data(), rowsPerProcess * width, MPI_FLOAT, 0, MPI_COMM_WORLD);

  // Print the result on the root process
  if (rank == 0) {
    printMatrix(h_C, width);
  }

  MPI_Finalize();
  return 0;
}
```


In this example, the `matrixMultiplication` function performs the multiplication for a contiguous block of rows. The input matrices are broadcast to all processes with `MPI_Bcast`, each process computes its assigned rows into a local buffer, and the partial results are gathered on the root process with `MPI_Gather`. Note that this scheme assumes `width` is evenly divisible by the number of processes.


Overall, parallel computation in C++17 means breaking a computation into smaller tasks that run concurrently on multiple threads or cores of the same machine, using libraries like OpenMP. Distributed computation means splitting the work across multiple processes, potentially running on different machines, using libraries like MPI. The choice between these approaches depends on the nature of the problem and the resources available. Additionally, C++17 introduced the `<execution>` header, which adds parallel execution policies (such as `std::execution::par`) to many standard algorithms; `std::jthread`, by contrast, arrived later in C++20. These features provide further options for implementing parallel and concurrent algorithms in standard C++.
