The almost lost techniques of code performance optimization
In the Cloud epoch, when computing power seems abundant and inexpensive (tip: it is, however, expensive if you misuse Cloud platforms), several golden rules of careful programming and profiling are being lost.
Terms such as "profiling", "memoization", and "vectorization" are getting lost in the AI hangover. Need speed? Fire up an EMR cluster on AWS, or switch on an EC2 machine with a zillion gigabytes of RAM and infinite cores, move to fast, modern languages (Rust, Go), and start the computation. All of this is fantastic and sounds so modern, but it hides untold complexities that will likely generate a punishing invoice at the end of the month.
My software dev career started in the glorious days of Fortran 77 (not even Fortran 90), with low-power computers and a school (a master's degree in Theoretical Physics) that taught us to save even the last drop of computing power to speed up calculations. My professors, who had even used punched cards, would write their own algorithms to compute "SQRT" to a desired level of precision rather than calling the sqrt function from C's math.h library. Terms such as numerical error, precision, and memory size were part of everyday work.
Those times taught me lessons that now, in the days of modern AI abundance, are becoming essential to save time and money and to boost performance efficiently. Like an archaeologist, I want to introduce you to some optimization techniques that, together with modern parallel computing, will save you from the burden (and hidden fees) of firing up a cluster on a cloud platform.
Your modern, fast, and shiny laptop has the same structure as the first personal computers: a centralized fast memory called "RAM" and several CPU cores that execute instructions. The physical disk, even in the modern age of SSD and NVMe, is relatively slow and should be used for storage rather than for active processing.
The first tool is so obvious that it is often neglected: "memoization", the technique of saving "already computed objects" along the way. Let's see how, in combination with another powerful technique called "quantization", it can save resources and accelerate computation.
Suppose we are computing a statistical measure (such as "average" or "standard deviation") on vectors V of 10 numbers in the range [0, 10]:
$ V = [1, 2, 2, 4, 9, 8, 7, 8, 2, 1]
$ print(len(V))
10
Suppose we have 100 such Vs in a list of lists called VS. A naive way to compute the averages looks like this:
import numpy as np

averages = []
for V in VS:
    avg = np.mean(V)
    averages.append(avg)
NumPy users with a minimum of experience would solve this task by applying vectorization first.
import numpy as np

V_matrix = np.vstack(VS)               # create a 100x10 matrix by stacking the vectors
averages = np.mean(V_matrix, axis=1)   # one average per row, i.e. per vector
This is already much faster, as NumPy is optimized for operating on whole arrays. For simple functions such as "mean" or "std" this trick works beautifully, and in 95% of cases it is enough.
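As a rough illustration of the gap, here is a small benchmark (the vector count and random data are made up purely for timing purposes; absolute numbers will vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(42)
# 20,000 vectors of 10 numbers each, values in [0, 10]
VS = rng.integers(0, 11, size=(20_000, 10))

# Naive loop: one np.mean call per vector
t0 = time.perf_counter()
loop_averages = [np.mean(V) for V in VS]
loop_time = time.perf_counter() - t0

# Vectorized: a single call over the whole matrix, one mean per row
t0 = time.perf_counter()
vec_averages = np.mean(VS, axis=1)
vec_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

The two approaches produce identical results; the vectorized version simply pushes the loop down into optimized C code inside NumPy.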
But what if we discover something interesting in our data? Inspecting the vectors, we notice that many of them (~40% of the total) are identical after sorting:
$ V10 = [1, 2, 2, 3, 4, 4, 5, 5, 7, 8]
$ V11 = [8, 2, 2, 3, 4, 4, 5, 5, 1, 7]
$ if sorted(V10) == sorted(V11):
$     print(True)
True
$ if np.mean(V10) == np.mean(V11):
$     print(True)
True
We can take advantage of this: if two vectors are identical after sorting, they are permutations of each other, and since addition is commutative their averages are equal by the very definition of averaging (can you see why?).
Memoization suggests that we avoid repeating the exact same computation by keeping a memory of past results, perhaps with a humble dictionary:
import numpy as np

averages = []
memoization_dict = {}
for V in VS:
    sorted_V = tuple(sorted(V))   # tuples are hashable, lists are not
    if sorted_V not in memoization_dict:
        avg = np.mean(V)
        memoization_dict[sorted_V] = avg
    else:
        avg = memoization_dict[sorted_V]
    averages.append(avg)
This snippet has eliminated duplicate operations (averages are computed only for new vectors) by keeping past results in the dictionary memoization_dict.
The performance gain in a trivial case like this is minor, but if the "average" were instead a slow function taking 1 hour of computing time, we would get a dramatic reduction: the total time shrinks in proportion to the number of genuinely new computations.
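The same pattern ships in Python's standard library as functools.lru_cache. A minimal sketch (the expensive function is simulated here with a call counter, so we can see how many real computations actually happen):

```python
from functools import lru_cache

call_count = 0  # counts how many times the "expensive" body really runs

@lru_cache(maxsize=None)
def expensive_mean(sorted_v):
    """Pretend this takes an hour; the cache key is the sorted tuple."""
    global call_count
    call_count += 1
    return sum(sorted_v) / len(sorted_v)

VS = [[1, 2, 2, 4, 9, 8, 7, 8, 2, 1],
      [2, 1, 2, 8, 7, 8, 9, 4, 1, 2],   # a permutation of the first vector
      [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]

averages = [expensive_mean(tuple(sorted(V))) for V in VS]

print(averages)     # [4.4, 4.4, 1.0]
print(call_count)   # 2, not 3: one computation was skipped via the cache
```

Using tuple(sorted(V)) as the argument is what makes permutations of the same vector hit the same cache entry.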
Wait, this is great! But you may say: "my data are never duplicated; they live in a continuous domain", for example values such as
0.1, 0.123, 1.244, ..., all between 0 and 10.
In a case like this we can first try quantization. What if we convert
1.244 -> 1.2 and 0.123 -> 0.1
by simply rounding to the first decimal place (or even converting to int)? If we are OK with sacrificing a bit of precision in our final results, we can quantize the data, and then we are back on track for memoization, as
1.22, 0.13 --> 1.2, 0.1 which, after sorting, is equivalent to 0.14, 1.233 --> 0.1, 1.2.
We can save computing time with a smart combination of quantization and memoization.
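Putting the two together, here is a minimal sketch of the combined idea (helper names like quantized_key are my own, not part of any library):

```python
import numpy as np

def quantized_key(V, decimals=1):
    """Round to the chosen precision, then sort: permutations and
    near-duplicate vectors collapse onto the same hashable key."""
    return tuple(sorted(round(x, decimals) for x in V))

def cached_means(VS, decimals=1):
    cache = {}
    averages = []
    for V in VS:
        key = quantized_key(V, decimals)
        if key not in cache:
            # the mean is computed on the quantized values, so we trade
            # a little precision for the ability to reuse results
            cache[key] = float(np.mean(key))
        averages.append(cache[key])
    return averages, len(cache)

VS = [[1.22, 0.13, 3.01],
      [0.14, 1.233, 2.98],   # quantizes and sorts to the same key as above
      [5.55, 6.66, 7.77]]

averages, real_computations = cached_means(VS)
print(real_computations)  # 2: the first two vectors share one computation
```

The decimals parameter is the knob: coarser quantization means more cache hits but larger numerical error, so it should be tuned against the precision your final results actually need.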
In conclusion: before firing up a cluster and risking bankruptcy from the invoice, check your data, try to understand their inner structure, and consider old-but-gold techniques such as memoization and quantization. Work smart, spend less, learn more.