Parallel computing in a nutshell

Over the last 20 years, a set of programming tools has been developed and maintained by thousands of experts, and it can be used for free to build awesome applications. Learn about these technologies and unlock your potential.

Nowadays, every programmer should know how to assemble a parallel program in order to excel in a highly connected and competitive world, where efficiency is the first priority in most computational applications: games, physical simulations, industrial applications, mathematical computing, augmented reality, computer vision, robotics, cinematographic rendering, customized prostheses, intelligent houses, medical tools and many other scientific and industrial requirements that aim to enhance the human experience and extend human life.

The art of writing efficient programs is known as High Performance Computing (HPC). Its main objective is to reduce computing time using the least amount of resources, such as memory and instructions. This can be achieved through software techniques, hardware optimizations, or both. On one hand, hardware optimizations for HPC are mainly developed and promoted by manufacturers; on the other hand, software best practices are proposed by scientific communities.

Modern computers are built around a Random Access Memory (RAM) and a Central Processing Unit (CPU). A CPU may have more than one core, and each core is an independent processor in its own right. The RAM is the memory space where a program's instructions read and store data most of the time. CPUs also embed a very fast memory known as the cache; using it correctly speeds up the execution of programs. A computer program is composed of sequences of instructions known as threads: a serial program executes its threads one after another in a queue, while a parallel program can execute several threads at the same time. Each core can execute multiple threads, but it is advisable to assign only one thread to each core. Most modern computers are also equipped with a Graphics Processing Unit (GPU), and some GPUs can be programmed to execute thousands of threads at the same time; these are known as General Purpose GPUs (GPGPUs).

If you are thinking of developing a parallel program, I recommend using the C programming language, since with it you can exploit your hardware better than with any other language. If you are used to coding in an object-oriented paradigm, you could use C++ as well, but keep in mind that the tools I explain in the following paragraphs are designed to be used from C, so you will either need a binding that encapsulates their functionality in C++ classes or have to mix your object-oriented code with C-like routines.

There are five main ways to develop a parallel program:

  1. Using shared memory (one CPU with multiple cores using the same RAM).
  2. Using distributed memory (multiple CPUs, each with its own RAM).
  3. Using a GPGPU.
  4. Creating a hybrid with shared and distributed memory.
  5. Creating a hybrid using CPUs and GPGPUs.

The shared memory scheme allows multiple threads to access the same memory space, the RAM. This scheme is common in modern desktops, laptops and notebooks, which have a single processor with multiple cores and a single RAM. These programs can be written using POSIX threads or OpenMP. The first option is implemented in the pthreads library and gives you a high degree of control over every thread of your program; you should use pthreads only once you have mastered the C programming language and have a solid understanding of your hardware and operating system. For beginners, I recommend OpenMP, which is easier to use and, in most cases, produces programs as efficient as those written with pthreads.
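To give a taste of how simple the shared memory scheme can be, here is a minimal OpenMP sketch in C; the array size and values are chosen only for illustration. Compiled with an option such as gcc -fopenmp, the loop iterations are split automatically among the available cores.

    /* Minimal OpenMP sketch: parallel sum of an array.
       Compile, for example, with: gcc -fopenmp sum.c -o sum */
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        /* Fill the array serially. */
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Each thread adds its own chunk of the array; OpenMP combines
           the partial sums through the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
        return 0;
    }

The equivalent pthreads program would need explicit thread creation, work splitting and joining, which is why OpenMP is the friendlier starting point.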

The distributed memory scheme is used in clusters of computers, when you need to perform computations across several machines, each with its own RAM. This scheme is commonly used for huge problems, where the work must be divided into many parts to be completed in a reasonable amount of time. The best option for developing a program with this scheme is the Message Passing Interface (MPI), a standardized programming interface for sending and receiving data between computers.
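As a sketch of the distributed memory scheme, the following minimal MPI program in C (the file and program names are my own) starts several processes, possibly on different machines, and lets each one report its rank; a real program would go on to exchange data with calls such as MPI_Send and MPI_Recv.

    /* Minimal MPI sketch: each process reports its rank.
       Compile with: mpicc hello.c -o hello
       Run with:     mpirun -np 4 ./hello */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* Start the MPI environment.   */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* Which process am I?          */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* How many processes in total? */

        printf("Process %d of %d is working on its own part of the problem\n",
               rank, size);

        MPI_Finalize();                         /* Shut down MPI cleanly.       */
        return 0;
    }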

GPGPUs are mainly used to perform the same computation over different data, as in computer graphics, where the same calculations are executed for every pixel, of which there may be thousands of millions. GPGPUs have shown their efficiency in several scientific applications, such as the N-body problem, the Fourier transform and some medical applications. The most widely used programming interfaces for GPGPUs are CUDA and OpenCL: CUDA is maintained by Nvidia, while OpenCL is developed by a community of scientists and industry experts. GPGPUs cannot access the RAM directly; instead, they have to use their own embedded memory, so data must be copied between the two.
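As a rough illustration of how a GPGPU runs thousands of threads over different data, here is a minimal CUDA C sketch (the array size and scaling factor are chosen only for the example) that launches one thread per array element. Note the explicit copies between the RAM and the GPU memory, since the GPGPU cannot access the RAM directly.

    /* Minimal CUDA C sketch: one thread per array element.
       Compile with: nvcc scale.cu -o scale */
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void scale(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index     */
        if (i < n)
            x[i] *= 2.0f;                               /* each thread doubles one element */
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) h[i] = 1.0f;

        float *d;
        cudaMalloc((void **)&d, bytes);                     /* allocate GPU memory */
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);    /* copy RAM -> GPU     */

        scale<<<(n + 255) / 256, 256>>>(d, n);              /* launch n threads    */

        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);    /* copy GPU -> RAM     */
        printf("x[0] = %f\n", h[0]);

        cudaFree(d);
        free(h);
        return 0;
    }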

A programming expert could exploit these technologies to develop hybrid programs that solve specific problems. Nothing prevents us from creating a "farm" of computers with the best CPUs and a "farm" of computers with the best GPGPUs working together toward some "unattainable" objective, such as the complete simulation of an animal cell, a complete map of the connections between the neurons of a human brain, or finding a cure for some cancer; just imagine the possibilities.

The goal of this first article is to serve as a quick guide for getting started with parallel computing and to give an overview of the HPC world. Comments and observations are welcome.
