The law of parallel processing

Fawad A. Qureshi

Published Sep 11, 2023

Have you ever seen someone write bad code and then try to solve the resulting performance problem by throwing more hardware at it? Higher-level languages like Python and Java make it easier to write code, but they also make it easier to write bad code, because they do not force programmers to think about performance in the way that lower-level languages like Assembly and C do. As a result, programmers who use higher-level languages can sometimes get lazy and write code that is less efficient than it could be.

In a recent article, I discussed the following scalability graph:

Let's discuss how you gain parallel efficiency. Amdahl's Law, formulated by former IBM engineer Gene Amdahl, states that the theoretical speedup of a program is limited by the fraction of the work that cannot be parallelized: however many processors you add, the serial portion still takes the same amount of time.
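To put numbers on this, here is a minimal sketch of the usual form of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function names and the 90% figure are illustrative assumptions, not something taken from a specific system.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup when `parallel_fraction` of the work is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always runs at full length; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction):
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed example: 90% of the work can be parallelized.
    for workers in (1, 10, 100, 1000):
        print(workers, "workers ->", round(amdahl_speedup(0.9, workers), 2), "x")
    print("upper bound ->", round(max_speedup(0.9), 2), "x")
```

Under these assumptions, going from 100 to 1,000 workers barely moves the result, because the 10% serial portion dominates the run time.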

Why are some technologies more optimized than others? Let’s find out. Parallel processing is the single biggest driver of computing efficiency. Imagine you land at an airport and have to move your luggage to the taxi rank.

Imagine it takes 1 person 1 minute to move one trolley from the carousel to the taxi rank. If 10 people have to share 1 trolley, it will take 10 minutes to move everyone's luggage. This is serial processing.
If you have ten trolleys for ten people, it will only take 1 minute to complete the task. This achieves the maximum parallel speedup of 10X.
If five people insist on using the same red trolley, they must do their tasks sequentially. No matter how many free trolleys are available, the whole job will always take at least 5 minutes. This caps the speedup at 2X even though idle resources are available.
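The three trolley scenarios can be checked with a few lines of Python. This is only a sketch of the analogy: the function names are made up for illustration, and it assumes every trip takes exactly one minute and that people sharing a trolley go strictly one after another.

```python
from collections import Counter

MINUTES_PER_TRIP = 1  # from the analogy: one person, one trolley, one minute


def minutes_to_finish(trolley_choices):
    """Each entry names the trolley one person uses; sharers take turns."""
    if not trolley_choices:
        return 0
    queue_lengths = Counter(trolley_choices)
    # Trolleys move in parallel, so the longest queue sets the finish time.
    return max(queue_lengths.values()) * MINUTES_PER_TRIP


def speedup(trolley_choices):
    """Speedup relative to everyone sharing a single trolley."""
    serial_minutes = len(trolley_choices) * MINUTES_PER_TRIP
    return serial_minutes / minutes_to_finish(trolley_choices)


if __name__ == "__main__":
    one_trolley = ["shared"] * 10
    ten_trolleys = [f"trolley-{i}" for i in range(10)]
    red_trolley = ["red"] * 5 + [f"trolley-{i}" for i in range(5)]
    for name, choices in [("one trolley", one_trolley),
                          ("ten trolleys", ten_trolleys),
                          ("red trolley", red_trolley)]:
        print(name, minutes_to_finish(choices), "min,", speedup(choices), "x")
```

Adding more free trolleys to the third scenario changes nothing, since the five-person queue for the red trolley is still the longest one.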

Let's examine this analogy in the world of parallel databases:

Here is an example of three different types of databases processing SQL (picture credit: Daniel Graham). Imagine we have 100TB of data to analyze.

On the left is a serial processing database. It chugs along at the speed of a single server, usually using only a single CPU on that server. It is the slowest method, like having a single trolley for everyone at the airport.
In the middle, we see partial parallel processing, where multiple servers work together to do the table scans, joins, and... oops, not the sorting. Sorting is not done in parallel, and that is a massive bottleneck when terabytes of data are involved. This is like a few people insisting on the red trolley: resources sit idle while the serial step runs. No matter how many nodes you add to the configuration, you cannot speed up that step unless you eliminate the "red trolley."
On the far right is the maximum-efficiency system, which does as much as possible in parallel. This will be the fastest solution. To go faster, we add more servers to the cluster.
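A rough model makes the difference between the three designs concrete. The phase timings below are hypothetical numbers chosen for illustration, not measurements from any real database, and the model assumes a parallel phase scales perfectly with the node count and has no coordination overhead.

```python
# Hypothetical single-node minutes per phase; illustrative only, not measured.
PHASE_MINUTES = {"scan": 40.0, "join": 30.0, "sort": 30.0}


def elapsed_minutes(nodes, parallel_phases):
    """Elapsed time when only the phases in `parallel_phases` are split across `nodes`."""
    total = 0.0
    for phase, minutes in PHASE_MINUTES.items():
        if phase in parallel_phases:
            total += minutes / nodes  # assumed perfect scaling
        else:
            total += minutes  # serial phase: extra nodes sit idle
    return total


if __name__ == "__main__":
    designs = {
        "serial": set(),
        "parallel except sort": {"scan", "join"},
        "fully parallel": set(PHASE_MINUTES),
    }
    for nodes in (10, 100):
        for name, parallel_phases in designs.items():
            print(nodes, "nodes,", name, "->",
                  round(elapsed_minutes(nodes, parallel_phases), 2), "min")
```

In this model the middle design can never finish in under 30 minutes however many nodes are added, because the serial sort is the "red trolley."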

In short, the more parallel-efficient your system is, the better its throughput will be. A simple understanding of this principle can help you write better code. The maximum speedup of your code is always determined by the portion that cannot be parallelized.

If you like, please subscribe to the FAQ on Data newsletter and/or follow Fawad Qureshi on LinkedIn.

Daniel Graham 2y

One of those pictures looks familiar. Good explanations too... Thanks Fawad.

Boris Mogilevsky 2y

Great article, thanks

Marco Ullasci 2y

Thanks Fawad A. Qureshi for the article. For readers less familiar with the subject, it must be noted that some computations are intrinsically serial in nature, or have an intrinsic serial fraction that can't be helped no matter how smart and mature the database optimizer is. I'd also like to point out that there are sometimes approximate algorithms that are more efficient and yet good enough for certain use cases. I recall Vertica, for example, offered both an exact and an approximate "count distinct" with very different performance profiles.
