The Sum of All Inference
Generative AI [1] is the breakthrough of our time -- or maybe it's hype, depending on your perspective and expectations. But let's assume the former. The largest AI models today have more than 50 layers and over 200 billion parameters [2], requiring hundreds of GPUs and days or even weeks to train. That's tolerable, since language and source code aren't changing that fast. After all, humans take years to train, and at what expense, lord only knows!
But training is only half of an AI model's life. Like any computer program, an AI model has to run - this is called inference. To give some perspective on large-model inference, consider the following:
Clearly, something is wrong with this picture. The human brain runs on about 40 watts, not thousands of watts, and a liter of water lasts it a day, not five minutes. On our current path, even if our best AI data scientists keep making breakthroughs for the next 10 years, they won't come close to even a fraction of human intelligence.
So how will we satisfy society's need for massive inference? Here are some predictions. Mark it down - you heard these here first:
- 1000x more capacity than what we have today
- based on variations of content-addressable (CAM [7]) addressing, instead of a numeric address space, with huge address ranges on the order of petabytes or higher
- slow access time, on the order of milliseconds - roughly three orders of magnitude slower than the memory we use today
- no EDAC [8] circuitry; it will be replaced by computational neural-net connections, i.e. smart connections instead of "dumb weights". Errors will be considered unimportant, possibly even beneficial
- extremely low cost - no exotic materials
Does this start to sound vaguely familiar? Yes - something like the billions of neurons and trillions of synapses in the human brain.
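To make the CAM idea concrete, here is a minimal sketch of content-addressable lookup: stored entries are retrieved by how well their content matches a probe, not by a numeric address. The function and data names are hypothetical illustrations, not a hardware design.

```python
# Illustrative content-addressable lookup: find the stored entry whose
# key bits best overlap the probe, instead of indexing by address.

def cam_lookup(memory, probe):
    """Return the value whose key has the most bits in common with the probe."""
    def overlap(key):
        return sum(1 for a, b in zip(key, probe) if a == b)
    return max(memory, key=lambda entry: overlap(entry[0]))[1]

memory = [
    ((1, 0, 1, 1), "cat"),
    ((0, 1, 0, 1), "dog"),
    ((1, 1, 0, 0), "bird"),
]

# A noisy probe (last bit flipped) still retrieves "cat" - in this
# addressing scheme, small errors are tolerated rather than corrected.
print(cam_lookup(memory, (1, 0, 1, 0)))  # cat
```

Note how this matches the "no EDAC" prediction above: a partially wrong probe still lands on the right entry, so bit-perfect storage is unnecessary.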
4. Training will not require "gradient descent", "simulated annealing", or other mathematically complex techniques. Nothing like this happens in the brain - indeed, the brain has nowhere near the required level of power or error-free calculation. Instead, training will be based on structural data relationships - organization, proximity, pathways - along with persistence (repetition and forgetfulness) that supports CAM methods. Unfortunately for Nvidia, complex math calculations will not be needed.
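A gradient-free scheme built on "repetition and forgetfulness" could look something like a Hebbian update with decay. This is a hedged sketch of that general idea - the rule, the constants, and the pair-based representation are my assumptions, not a specification from the article.

```python
# Hebbian-style learning sketch: connections that fire together get
# stronger (repetition); all connections decay over time (forgetfulness).
# No gradients, no annealing - just local structural updates.

STRENGTHEN = 0.5   # increment for a co-active pair (hypothetical constant)
DECAY = 0.9        # per-step multiplicative decay (hypothetical constant)

def step(weights, active_pairs):
    """Decay every connection, then strengthen the co-active pairs."""
    new = {pair: w * DECAY for pair, w in weights.items()}
    for pair in active_pairs:
        new[pair] = new.get(pair, 0.0) + STRENGTHEN
    return new

weights = {}
# "cat"/"fur" co-occur three times; "cat"/"car" only once.
for pair in [("cat", "fur"), ("cat", "fur"), ("cat", "car"), ("cat", "fur")]:
    weights = step(weights, [pair])

# The repeated association ends up stronger; the one-off fades.
print(weights[("cat", "fur")] > weights[("cat", "car")])  # True
```

The point of the sketch: strength here encodes organization and proximity in the data, and "training" is just accumulation and decay - operations cheap enough to live inside a memory device.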
5. Established semiconductor companies other than the Big 3 will be incentivized by CHIPS Act funding (and future government legislation) to develop combined CPU and neural memory devices. For example, Texas Instruments possesses archived technology that still outperforms Nvidia and Intel in "processing density" (the ratio of performance to chip package size and power consumption), even 8 years after they shelved it. In addition to funding, the government could deny TI waivers to sell analog and discrete devices to China unless it partners with a memory manufacturer and re-enters the inference device market. Other candidates for government intervention include Qualcomm and Analog Devices.
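The "processing density" metric above is just a ratio, which a few lines make concrete. All the numbers below are hypothetical placeholders, not actual TI or Nvidia specifications.

```python
# Processing density as defined in the text: performance divided by
# (package size x power). Higher is better. Figures are invented
# for illustration only.

def processing_density(tops, package_mm2, power_w):
    """Tera-operations per second, per mm^2 of package, per watt."""
    return tops / (package_mm2 * power_w)

# Hypothetical big GPU: huge throughput, but huge package and power draw.
gpu = processing_density(tops=1000.0, package_mm2=5000.0, power_w=700.0)

# Hypothetical small analog inference part: modest throughput, tiny footprint.
analog = processing_density(tops=25.0, package_mm2=100.0, power_w=2.0)

print(analog > gpu)  # True - the small part wins on this metric
```

This is why raw TOPS leaderboards can mislead: a device with a fraction of the throughput can dominate once size and power enter the denominator.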
We can't say when and we can't say who, but what we can say is this: the semiconductor entity - startup, established player, or government-sponsored consortium - that figures out neural memory will be the first "big AI", this century's success story.
[1] Generative Pre-Trained Transformer (GPT) Large Language Models (LLMs) and Large Programming Models (LPMs)
[2] In AI, extremely high numbers of layers and parameters underlie the term "deep", as in deep learning or deep neural networks (DNNs)
[3] https://gizmodo.com/chatgpt-ai-water-185000-gallons-training-nuclear-1850324249
[4] Meltdown - literally, to avoid melting the solder attaching CPU and GPU cores to their circuit board
[5] ASIC - application specific integrated circuit
[6] https://www.wsj.com/articles/rising-data-center-costs-linked-to-ai-demands-fc6adc0e
[7] Generally known as CAM - content addressable memory
[8] EDAC - error detection and correction