CPU Code modernisation – Stream HPC's hidden expertise

You’ve seen the speedups of hundreds to thousands of times. We all know that the lion’s share of the techniques would also work on modern multi-core CPUs, and that GPUs only deliver the last 2x to 8x. When it’s 8x, the GPU is the obvious choice. When it’s 2x, is the better choice a bigger CPU or a bigger GPU?

Now that AMD has launched its 32-core CPU, the answer to that question changes. Not only because of the 32 cores, but also because of the 256-bit vector computations via AVX2. This means that in each clock cycle 32 double4’s can be operated on. A 16-core AVX1 CPU could work on 16 double2’s, which is only a quarter of that throughput.
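The arithmetic behind that comparison can be sketched in a few lines of Python. This is a back-of-the-envelope count under the simplifying assumptions used above: one vector operation per core per clock cycle, 64-bit doubles, and 128-bit vectors for the older CPU. The function name `doubles_per_cycle` is made up for this illustration, and real throughput also depends on clock speed, the number of vector units per core and memory bandwidth.

```python
DOUBLE_BITS = 64  # width of one double-precision value


def doubles_per_cycle(cores, vector_bits):
    """Idealised count of doubles processed per clock cycle.

    Assumes every core issues exactly one full-width vector operation
    per cycle, which is a simplification.
    """
    lanes = vector_bits // DOUBLE_BITS  # doubles that fit in one vector
    return cores * lanes


new_cpu = doubles_per_cycle(cores=32, vector_bits=256)  # 32 x double4
old_cpu = doubles_per_cycle(cores=16, vector_bits=128)  # 16 x double2

print(new_cpu, old_cpu, old_cpu / new_cpu)  # prints: 128 32 0.25
```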

Intel reacted immediately by hinting that it will also launch a 32-core Xeon. Meanwhile IBM is working on launching its quad-threaded 24-core Power9 CPU. Cavium is providing 64-bit, 64-core ARM processors, which also need many threads to keep them busy. Core counts are not the only thing increasing: the interconnect standards are all pushing for upgrades, and HBM is finding its way beyond GPUs.

CPUs have been reborn.

We will discuss the advantages of these CPUs in upcoming blog posts.

Algorithm and CPU optimisations

Performance optimisation (aka “code modernisation”) is often described as “applying tricks”. As specialists in writing easily readable code that performs, we know better. With CPUs adopting everything that makes GPUs fast, we can now also apply GPU-specific optimisations in the CPU domain.

One project where we used the CPU was with Noospheer. The initial goal was to use the GPU, but we found that the CPU was faster for that particular algorithm. The 40,000x speedup was achieved on an 8-core CPU with AVX, and the code is expected to run much faster on the CPUs described above.

“Stream is an elite, dependable and unique development outfit. We utilized Stream to achieve a ~40,000x speedup for our quantum simulation software. If your project demands ultra fast design and robust implementation, work with them.” -- Jordan Ash, CEO Noospheer

The multi-core CPU is dead. Long live the multi-multi-core CPU!

In 2012 I wrote about CPUs with embedded GPUs. That was a big change and defined a completely new type of CPU. Going to 20+ cores is not as big a change, but it still defines a new type of processor. This means a split: 4 or 8 cores for desktops, laptops and mobiles, and 20+ cores for shared servers and HPC.

You might ask, why only now? This is due to better (and more widely accepted) virtualisation techniques and (thanks to GPUs) more code that’s optimised for handling data-parallel workloads. Another reason is Intel’s monopoly in the server market, which is now heavily under attack by ARM, IBM and AMD.

These modern, 20+ core CPUs are very welcome, as they narrow the CPU-GPU gap.

As you might have guessed, programming a 20+ core CPU is different from programming a 4-core CPU. Luckily we have long experience in building scalable software that runs optimally on anything from quad-core CPUs to high-end GPUs. When CPUs are the only target, the code can be kept in the same language. That is much requested, even if the code has to be mostly rewritten to meet the new performance and quality goals.
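One reason the jump from 4 to 20+ cores changes how code must be written is Amdahl's law: the part of a program that stays serial limits the speedup more and more as cores are added. The sketch below is an illustration only. The 95% parallel fraction is an assumed example value, not a measurement from any project mentioned here, and `amdahl_speedup` is a name chosen for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup according to Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


PARALLEL_FRACTION = 0.95  # assumed example value, not measured

for cores in (4, 8, 32, 64):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    efficiency = speedup / cores  # fraction of the ideal linear speedup
    print(f"{cores:2d} cores: {speedup:5.2f}x speedup, {efficiency:.0%} efficiency")
```

With these assumed numbers, four cores are used at roughly 87% efficiency, while 32 cores reach only about a 12.5x speedup, or roughly 39% efficiency. Code that is parallel enough for a quad-core can therefore leave most of a 32-core CPU idle, which is why the serial parts need attention first.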

--

Vincent Hindriksen is managing director at Stream HPC.
