CPUs vs. GPUs: Two Philosophies for the Same Computation Problem

The CPU (Central Processing Unit) and GPU (Graphics Processing Unit) are the main compute engines found in nearly every modern device. It’s often said that CPUs handle general computing tasks and excel at serial (step-by-step) processing, while GPUs are specialized for tasks that require large-scale parallelism. But why is there such a difference?

Today, GPUs are everywhere—driving breakthroughs in artificial intelligence, scientific computing, and graphics. Their success is behind the current surge in LLMs (Large Language Models) and the phenomenal growth of companies like NVIDIA. Interestingly, CPUs and GPUs share many common architectural components (cores, memory hierarchies, control units), yet their fundamental differences are rooted in how they tackle the challenge of keeping their compute resources busy.

Let's try to dig deeper into it.

What is the problem?

Every processing unit wants to stay busy, i.e., constantly performing work (calculations). But to do that, it needs data. Imagine a chef eager to cook, but unable to start until the ingredients arrive. For CPUs and GPUs, these "ingredients" are delivered from memory, and two crucial factors limit efficiency:

  1. Memory Bandwidth: How much data can flow to the processing unit per second.
  2. Memory Latency: How long it takes from requesting data until it arrives, governed by physics, distance, and circuitry.

Consider a simple example: say the compute unit can perform 2000 GFLOPS (giga floating-point operations per second), and the memory bandwidth is 200 GB/s, or 25 gigafloats/sec (using double precision, 1 float = 8 bytes). Also assume each memory access takes about 90 nanoseconds (ns) to arrive.

How do we keep our compute unit busy?

In computer science, compute intensity measures how many operations the compute unit must perform per number loaded from memory just to break even. In other words, it is the amount of work the device needs to do to make up for the fact that memory can't feed it as fast as it can compute.

For our example,

Compute intensity = 2000 GFLOPS / 25 gigafloats/sec = 80 operations per float loaded

Thus, for every load, the processing unit needs to do at least 80 operations just to break even. That is a lot to ask, and difficult for many algorithms to achieve.
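
Here is a quick sketch of that break-even arithmetic in Python (my own illustration, using only the numbers from the example above):

    # Break-even arithmetic for the example above.
    peak_flops = 2000e9             # 2000 GFLOPS of compute
    bandwidth_bytes_per_s = 200e9   # 200 GB/s of memory bandwidth
    bytes_per_float = 8             # double precision: 1 float = 8 bytes

    floats_per_s = bandwidth_bytes_per_s / bytes_per_float    # 25 gigafloats/sec
    compute_intensity = peak_flops / floats_per_s             # operations per float loaded
    print(f"Compute intensity needed to break even: {compute_intensity:.0f}")  # 80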

So the question arises: how do we increase the efficiency of processing units under the constraints of memory bandwidth and memory latency?

Solution

One obvious (but challenging) approach is to increase memory bandwidth or reduce memory latency. Adding more FLOPS (more cores) is comparatively easy, but boosting memory speed is limited by physics and cost.

From the previous example, bandwidth is 200 GB/s and latency is 90 ns; that means up to ~18,000 bytes can be in flight during each latency window. If an algorithm only loads 2 bytes per window, that's a massive waste. We need strategies that make full use of the available bandwidth during that latency so there is always data to work on.
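
As a quick sanity check (my own sketch, not from the article, reusing the example numbers and a hypothetical 2-byte-per-window algorithm), here is that waste worked out in Python:

    # Bytes deliverable per latency window, and how little of that a
    # 2-byte-per-window algorithm actually uses.
    bandwidth_bytes_per_s = 200e9   # 200 GB/s, from the example
    latency_s = 90e-9               # 90 ns, from the example

    bytes_per_window = bandwidth_bytes_per_s * latency_s    # ~18,000 bytes
    used_bytes = 2                                           # hypothetical algorithm
    print(f"Bytes deliverable per latency window: {bytes_per_window:.0f}")
    print(f"Fraction of bandwidth actually used: {used_bytes / bytes_per_window:.4%}")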

CPUs and GPUs take two different routes to solve this: the CPU tries to cut latency, and the GPU tries to hide latency. Cutting latency means making the processing unit spend less time waiting for data; hiding latency means finding ways to keep it busy even while waiting.

This is the fundamental design difference between CPUs and GPUs.

  • The CPU is latency-oriented: it has relatively few threads, on the order of tens to hundreds, designed to minimize the time from command to result for a single task or a handful of tasks (control-heavy: lots of decisions and branches with unpredictable execution paths).
  • The GPU, on the other hand, is throughput-oriented: it has a huge number of threads, on the order of thousands, designed so that work is oversubscribed (data-heavy: repetitive, predictable operations on lots of data). Some threads are therefore always ready to run, keeping the chip busy.

In our chef analogy, the CPU makes the wait shorter for each chef, while the GPU hires more chefs so that someone is always working.
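
To make the "hide latency" route concrete, here is a toy model in Python (my own sketch, based on Little's law and the example numbers above, with the simplifying assumption that each thread keeps exactly one 8-byte load in flight):

    # Toy model: memory bandwidth sustained when N threads each keep one
    # 8-byte load in flight against a 90 ns latency, 200 GB/s link.
    LATENCY_S = 90e-9            # memory latency from the example
    BANDWIDTH_BPS = 200e9        # memory bandwidth from the example
    REQUEST_BYTES = 8            # one double-precision float per load (assumption)

    # Little's law: loads that must be in flight to saturate the link.
    needed = BANDWIDTH_BPS * LATENCY_S / REQUEST_BYTES
    print(f"Loads needed in flight to saturate bandwidth: {needed:.0f}")   # 2250

    for threads in (4, 32, 1024, 16384):
        # One outstanding load per thread (a deliberate simplification).
        achieved = min(threads * REQUEST_BYTES / LATENCY_S, BANDWIDTH_BPS)
        print(f"{threads:6d} threads -> {achieved / 1e9:6.1f} GB/s sustained")

With only a handful of threads the link sits almost idle, which is why the CPU spends its resources on cutting the 90 ns instead; with thousands of threads the link saturates, which is the GPU's bet.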

Comparison Summary

  • CPU: latency-oriented; tens to hundreds of threads; cuts latency so each task waits less; suited to serial, control-heavy, unpredictable work.
  • GPU: throughput-oriented; thousands of threads; hides latency by oversubscribing work; suited to parallel, data-heavy, predictable work.

Conclusion

CPUs are optimized for minimizing task waiting time (latency), excelling in unpredictable and control-heavy work. GPUs are designed for keeping the whole chip busy (throughput), making them perfect for predictable, highly parallel workloads. Understanding this distinction reveals why both are critical in modern computing — and why we need both to power the AI and software of tomorrow.

To end, I leave you with two questions to ponder: 1/ Why not just add more cores to the CPU? 2/ Why aren't GPUs general-purpose CPUs?



