Beyond the GPU: Why Neuromorphic Computing Chips May Be the Next Imperative for Physical AI

Neuromorphic computing, long an academic curiosity, is finally beginning to cross the chasm into real AI infrastructure. It is the leading model that merges memory and compute to overcome the "Von Neumann bottleneck," making it a fundamental enabler of real-time "Physical AI." Neuromorphic chips mimic the human brain's architecture, where processing and memory are inextricably linked and computation is event-driven (spiking only when necessary). This allows for milliwatt-level operation, always-on sensory processing, and real-time adaptation for high-speed robotics and autonomous mobility.

We are currently in an early deployment phase, with lab-proven prototypes and first development kits in customers' hands. With market valuations reflecting rapid growth, the technology is moving beyond the "experimental" stage and promises to become a staple of energy-efficient AI, particularly for edge applications.

In terms of implementation, neuromorphic systems are not intended to replace CPUs or GPUs entirely. Instead, they are being integrated as specialized co-processors. This architectural split lets the system offload inference-heavy, low-latency tasks to the neuromorphic chip while reserving the host CPU/GPU for higher-level logic.

The Industry Landscape
The ecosystem is currently bifurcated between established semiconductor giants and specialized startups delivering edge silicon.
• Intel: Remains a dominant force, maintaining leadership with the Loihi series, which continues to serve as a benchmark for Spiking Neural Network (SNN) development.
• BrainChip: A leader in early commercialization, delivering the Akida architecture, which is specifically optimized for production-ready, ultra-low-power edge AI acceleration.
• SynSense: Capturing significant market share by specializing in vision-based neuromorphic processors, highly optimized for robotics and dynamic vision sensing (DVS).
• Emerging Innovators: Startups such as Innatera (spiking neural processors for sensors), Grayscale AI (neuromorphic-powered robotics), and Polyn Technology are rapidly filling niche market gaps, particularly in sensor-driven and autonomous edge applications.

The Bottom Line: By 2030, neuromorphic computing could transition from a specialized "edge co-processor" to the default substrate for autonomous and mobile AI systems. Within the next five years, we will see the emergence of "heterogeneous brain-on-a-chip" architectures in which neuromorphic cores are integrated into standard SoC designs. This shift would make persistent, real-time "Physical AI" ubiquitous for autonomous devices without requiring a data center to power them.
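To make the "event-driven" idea concrete, here is a minimal Python sketch of a leaky integrate-and-fire (LIF) neuron, the textbook building block of spiking neural networks. It is a generic illustration, not the programming model of Loihi or Akida; the constants are invented for the example.

```python
import numpy as np

def lif_run(current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire neuron: integrate input, spike on threshold."""
    v, spike_times = 0.0, []
    for t, i_t in enumerate(current):
        v += (dt / tau) * (-v + i_t)   # membrane potential leaks toward rest, integrates input
        if v >= v_thresh:              # event: emit a spike only when threshold is crossed
            spike_times.append(t)
            v = v_reset                # reset after the spike
    return spike_times

rng = np.random.default_rng(0)
stimulus = rng.uniform(0.0, 2.0, size=200)  # arbitrary input current trace
print(lif_run(stimulus))  # sparse spike times: no input, no events, (almost) no work
```

The point of the sketch is the sparsity: between spikes the neuron does almost nothing, which is the property that lets neuromorphic silicon idle at milliwatt power.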
Emerging Processor Architectures
Summary
Emerging processor architectures refer to the rapidly evolving designs of computer chips that power artificial intelligence and other complex workloads, focusing on specialized roles, energy efficiency, and novel ways to handle data. These new architectures combine traditional CPUs with GPUs, NPUs, neuromorphic chips, and advanced packaging to meet the diverse needs of modern computing.
- Prioritize workload matching: Select the processor architecture best suited for each specific task, like using GPUs for deep learning or NPUs for energy-efficient on-device AI.
- Embrace integration: Explore systems that combine CPUs, GPUs, and specialized accelerators to improve performance and handle multiple workloads efficiently.
- Monitor new technologies: Stay updated on advances like chiplets, 3D stacking, and neuromorphic chips, as they are transforming how AI and data processing are achieved.
-
CPU shipments are larger than GPGPU shipments even amid the datacenter AI boom. Over ten million CPUs are shipped annually, compared to about half that number for GPUs, and of course the installed CPU base is vastly larger. But GPGPUs are ramping up fast, and it is natural to assume they will be harnessed to do more work. The recent investments and implied partnership between NVIDIA and Intel remind us of the opportunities for innovation to optimize CPU-GPU platforms for mutual benefit.

Memory management is likely to be an early opportunity. Traditionally, CPUs functioned as hosts and treated GPGPUs as I/O peripherals with their own memory space: the CPU copies kernels and their data to GPU memory and copies the results back after the kernel finishes execution. These transfers are carefully managed to overlap with kernel execution and keep the GPU as busy as possible. More recently, enhancements allowed the CPU to access device-side memory with a common pointer on a zero-copy basis. Now, in the Grace Hopper system, the CPU and GPU can share a single page table and virtual address space. An Address Translation Service Translation Buffer Unit (ATS-TBU) provides fast translations and supports interaction between all MMUs and TLBs on the system, so all CPU cores and GPU processors function as peers in one multiprocessor system.

These refinements are useful for programmer productivity, but their performance varies with memory access patterns. Explicitly managed memory does better when there is a lot of data reuse within the GPU kernel; ATS shows better performance when data is accessed frequently on the CPU and GPU reuse is minimal. So, if the GPU workload is restricted to large accelerator kernels, the traditional paradigm shines. ATS benefits workloads whose access patterns are more complex and granular than sequential reads and writes, which could well be the case for emerging CPU-GPU cooperative workloads. Optimizations in both hardware and drivers, such as batching of page faults and intelligent prefetching, can improve performance further.

This is just one example of co-optimization that can enhance the performance and productivity of tightly coupled CPU-GPU workloads, and of how CPUs might best evolve. CPUs might reconsider the plethora of vector instructions and hardware that replicate what GPGPUs can do. Heretical as it may sound, even power efficiency may be less important than improving the peak performance of latency-sensitive code sections that GPUs cannot handle (better branch predictors, more speculation ...). Evolution selects not the strongest or fittest but those best adapted to their environment, and CPUs of the future will have a lot of GPUs around them. Chip-to-chip interconnects like NVLink open myriad possibilities for heterogeneous applications mixing CPU and GPU computation. It might be a competitive advantage to recognize this early and to research and optimize aggressively for it.
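As a concrete taste of the "explicitly managed" side of this trade-off, here is a minimal PyTorch sketch of the classic pattern the post describes: page-locked host memory plus a separate stream so the host-to-device copy can overlap with kernel execution. It is illustrative only; it shows the traditional copy-based paradigm, not Grace Hopper's ATS path, and the tensor names are invented for the example.

```python
import torch

assert torch.cuda.is_available()  # sketch assumes a CUDA-capable GPU

# Page-locked (pinned) host buffer: required for truly asynchronous copies.
host_batch = torch.randn(4096, 4096).pin_memory()
copy_stream = torch.cuda.Stream()

# Issue the host-to-device copy on its own stream...
with torch.cuda.stream(copy_stream):
    device_batch = host_batch.to("cuda", non_blocking=True)

# ...so independent work on the default stream can overlap with the transfer.
unrelated = torch.randn(2048, 2048, device="cuda")
warmup = unrelated @ unrelated  # runs while the copy is (potentially) in flight

# Synchronize before consuming the transferred data, then reuse it on-device.
torch.cuda.current_stream().wait_stream(copy_stream)
result = device_batch @ device_batch.T  # heavy on-GPU reuse favors this explicit paradigm
```

The last line is the key design point: when the kernel reuses the transferred data heavily, the one-time copy cost amortizes well, which is exactly the regime where the post says explicit management beats ATS.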
-
🔵 Tokyo Electron: The future of AI hardware will be defined by the convergence of physical scaling and heterogeneous integration. Transistor innovation alone is no longer enough; system performance now comes from co-optimizing logic, memory, interconnect, and advanced packaging as a unified architecture.

GAA and CFET push logic scaling forward. Backside PDN improves power delivery. 4F² VCT and 3D DRAM continue density scaling. Yet the real breakthrough comes when everything is integrated: GPU/CPU cores surrounded by HBM, connected through 3DIC structures, and supported by ultra-flat wafers, known-good dies, and high-efficiency heat spreaders.

This is the new era of AI semiconductors. The bottleneck has shifted from transistor count to how fast we can move data, stack memory, reduce thermal resistance, and pack heterogeneous functions into one compute engine. The next performance leaps won't come from one domain; they will come from cross-domain integration. SemiVision
-
New AI Processor Architectures Balance Speed With Efficiency

Leading AI system designs are migrating away from building the fastest #AI processor possible, adopting a more balanced approach that involves highly specialized, heterogeneous compute elements, faster data movement, and significantly lower power. Part of this shift revolves around the adoption of #chiplets in 2.5D/3.5D packages. These new designs also take aim at #NVIDIA's near monopoly in the AI world, built on the proliferation of inexpensive GPUs and the #CUDA-based models atop them. NVIDIA is well aware of these competitive threats, of course, and the company certainly is not standing still. Its new #Blackwell chip combines GPUs with a CPU and DPU, and its quantization scheme opens the door to low-precision AI in addition to the blazing-fast training capabilities needed to handle much larger data models.

#IBM's new Telum processor includes a data processing unit (DPU) for I/O acceleration (basically, funneling data to where it will be processed and stored) as well as innovative caching. In all, it contains 8 cores running at 5.5 GHz, 10 36-megabyte L2 caches, and a new accelerator #chiplet. #Intel likewise introduced its next-gen accelerator chip for AI training, the Gaudi 3, which features 4 deep learning cores (DCOREs), 8 HBM2e stacks, and a matrix multiplication engine that is configurable as opposed to programmable. #AMD's MI300X chip is based on a distributed AI architecture comprising 12 chiplets, with 4 I/O dies and 8 accelerator dies. With its fourth-generation Infinity Fabric, PCI Express Gen 5, HBM3, and CDNA 3 architecture, MI300X provides balanced scaling across compute, memory, and I/O subsystems.

AI is just becoming useful, but there are still challenges ahead, and chips are only one piece of the solution. Sustainability also requires more efficient software, improvements in micro-architectures so that large language model queries occur less frequently, and increasingly precise responses so that LLM outputs can be trusted. In addition, it will require tighter integration of specialized processing elements in the form of chiplets, which are capable of processing different data types faster and more efficiently. https://lnkd.in/giUWDMwH
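To ground the "low-precision AI" point, here is a minimal Python sketch of symmetric per-tensor int8 quantization, the simplest form of the idea. It is a generic textbook scheme, not the proprietary format of any particular chip; names and sizes are invented for the example.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto [-127, 127] ints."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(weights)

# 4x smaller than float32, with a small, bounded rounding error:
print(q.nbytes, weights.nbytes)                      # 65536 vs 262144 bytes
print(np.abs(weights - dequantize(q, scale)).max())  # worst case ~ scale / 2
```

The hardware payoff follows directly: a quarter of the bytes means a quarter of the memory traffic, which is why low-precision formats matter more as data movement becomes the bottleneck.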
-
AI is not powered by one chip. Different workloads need different architectures. Training a model, running inference, handling edge devices, or generating tokens all demand different strengths. That is why understanding compute matters more than ever. Here are 5 AI compute architectures explained 👇

𝗖𝗣𝗨: Best for general-purpose computing, control logic, sequential tasks, and low-latency operations across diverse workloads.
𝗚𝗣𝗨: Built for parallel processing and massive matrix calculations. Ideal for deep learning training and high-throughput AI workloads.
𝗧𝗣𝗨: Specialized for tensor operations and large-scale model training. Optimized for efficient AI acceleration at scale.
𝗡𝗣𝗨: Designed for energy-efficient on-device inference. Common in smartphones, laptops, cameras, and embedded systems.
𝗟𝗣𝗨: Focused on fast LLM inference with deterministic, low-latency token generation for language model execution.

What This Means: The future of AI infrastructure is not one winner. It is choosing the right processor for the right job. Smart AI teams optimize workloads across architectures instead of forcing every task onto the same hardware (a minimal sketch of that idea follows below). Which architecture do you think will matter most over the next 3 years?
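As a trivial illustration of "right processor for the right job", here is a hedged PyTorch sketch that routes a workload to whatever backend is available. The workload labels are invented for the example, and Apple's MPS backend stands in here for an NPU-style on-device path, since PyTorch has no generic "NPU" device.

```python
import torch

def pick_device(workload: str) -> torch.device:
    """Route a workload label to the best available backend (illustrative only)."""
    if workload in {"training", "batch_inference"} and torch.cuda.is_available():
        return torch.device("cuda")   # GPU: parallel matrix math, high throughput
    if workload == "on_device_inference" and torch.backends.mps.is_available():
        return torch.device("mps")    # stand-in for an NPU-style efficient backend
    return torch.device("cpu")        # CPU: control logic and sequential fallback

model = torch.nn.Linear(512, 512).to(pick_device("training"))
```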
-
I have always been fascinated by the way the brain manages complexity with such elegance. Its ability to balance speed, energy efficiency, and adaptability still surpasses what current machines can deliver—but neuromorphic computing is narrowing that gap. Inspired by neural dynamics, this emerging class of hardware processes information in parallel, consumes exceptionally low energy, and adapts in real time. The shift goes beyond performance gains. It suggests a new paradigm for computation—moving away from rigid sequential logic toward fluid, brain-like interaction. These systems are still in development, but the potential is considerable. From edge devices that make autonomous decisions to AI models that evolve on the fly without retraining, their applications could extend across fields like healthcare, robotics, and beyond. We often track progress by looking at teraflops or model sizes. But perhaps now is the moment to also reflect on how biologically inspired architectures may guide us toward more intelligent and sustainable digital systems. #NeuromorphicComputing #AI #EdgeAI #FutureOfComputing
-
Just finished reading this open-access survey paper—a well-organized overview on the current and emerging landscape of DL hardware accelerators. Title: A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms By Cristina Silvano and her team. Published in: ACM Computing Surveys, June 2025 It covers a wide spectrum: ◯ GPUs, TPUs, ASICs, FPGAs ◯ RISC-V-based AI chips ◯ In-memory & near-memory computing (IMC/PIM) ◯ Sparse matrix accelerators ◯ Neuromorphic, photonic, and quantum architectures It gives a solid snapshot of how DL hardware is evolving—focused on energy efficiency, sparsity, and memory-centric architectures. 📘 Read the full paper (open access): https://lnkd.in/gQEvqCt8 Highly recommended for anyone in AI hardware, VLSI design, or HPC research. #AIHardware #DeepLearning #HPC #RISCV #InMemoryComputing #Accelerators #VLSI
-
RISC-V: Will It Become the Next Big Thing in Processors?

Today, many newspapers and blogs talk about RISC-V. It is an open-source CPU architecture that anyone can use without paying license fees. In fact, many microcontrollers (MCUs) already use RISC-V, especially in devices that are small, low-cost, or do not need very complex software. But the main question is: can RISC-V replace Arm processors such as Cortex-A53 or Cortex-A57, which are very common in application processors? For now, the answer is no. High-performance Arm processors like the Cortex-A7x or Cortex-X2 are still the main choice in smartphones and automotive systems. RISC-V is growing fast, but it is not yet strong in these advanced markets.

1. Misunderstanding About RISC-V: "It's Free"
It is true that RISC-V is open source and royalty-free. But this does not mean it is completely free for companies. Companies must hire RISC-V experts and invest in R&D. With Arm, companies get support and documentation; with RISC-V, they must solve problems on their own. Because of this, many chipset vendors only use RISC-V for prototypes or demo projects, not for final commercial products.

2. RISC-V Ecosystem Compared to Arm
The RISC-V ecosystem is still very young compared to Arm, which has been used for decades. SiFive is doing good work, especially in adding support to the Linux kernel. However, many developer tools and libraries are not fully ready yet. It will take more years before RISC-V reaches the same level of maturity as Arm.

3. Why Many Embedded Developers Do Not Learn RISC-V
I often meet embedded developers who already know the Arm architecture very well. When I give seminars about RISC-V, they usually listen but do not study it deeply. The common reasons are: "We don't have enough time to learn a new architecture" and "We can already finish our projects using Arm knowledge." In short, they think learning RISC-V is extra work without clear benefit.

Final Thoughts
Most of my career has been with Arm processors, but I believe the future will need developers who understand both Arm and RISC-V. Arm is still powerful in mobile and automotive; RISC-V is growing fast in IoT, MCUs, and experimental products. If you want to grow as an embedded developer, you should start learning RISC-V now. Even if it is not the industry standard yet, having skills in both Arm and RISC-V will make you more competitive in the global job market.
-
Most chips still rely on instruction sets controlled by two companies. That bottleneck may be why RISC-V is gaining global momentum.

For decades, building a processor meant licensing an architecture, usually from Intel or Arm. That model works, but it also limits who can experiment with new designs. This is where RISC-V changes the conversation: because the architecture is open, universities and labs can design processors without asking permission first.

Switzerland has quietly become one of the most interesting places to watch this shift. Researchers at ETH Zurich have built 75 RISC-V chips over the past decade, and some of those designs show up to 100× efficiency gains for AI and machine learning workloads. That matters more than ever: AI systems and data centers are pushing power demand higher every year, and the industry is now chasing something simple but difficult: more compute with far less energy.

A few signals worth paying attention to:
• 75 experimental chips developed in Switzerland
• 100× efficiency gains in some AI workloads
• 4,500+ companies and institutions now involved in the RISC-V ecosystem

That scale is why some researchers compare RISC-V to "the CERN of semiconductors": an open environment where new computing ideas can actually be tested. And a bigger question is starting to emerge: if chip architecture becomes open infrastructure, who shapes the next generation of processors?

Curious what others think. Do you see RISC-V becoming a serious challenger to ARM and x86 over the next decade? #Semiconductor #RiscV #ChipDesign #AIInfrastructure #SupplyChain #DataCenters #EdgeComputing #OpenSourceHardware #AIChips
-
🔴 Georgia Institute of Technology and Samsung Semiconductor show the path beyond #SRAM, in #NatureReviews #ElectricalEngineering.

The #AI #memory #wall is real, and SRAM is hitting the brakes. We are building AI accelerators with massive compute power (#TPUs, #NPUs), but we are starving them of data. The traditional approach, relying on #6T-SRAM for on-chip buffers, is hitting a physical limit: it is too big, it leaks too much power, and it cannot scale with the exploding size of AI models. It is time to look at high-speed emerging memories that can actually keep up with systolic array architectures.

🔴 1. The #Density Problem
SRAM cells are huge (~150-200 F²). If we want more on-chip memory without making the chip size (and cost) impossible, we need a new architecture. The fix: gain-cell embedded DRAM (2T/3T). It ditches the capacitor and uses transistors to store charge. The result? Much higher density and logic compatibility; the smart density we need for global buffers (see the back-of-envelope sketch at the end of this post).

🔴 2. The #Leakage Nightmare
AI inference involves a lot of standby time for weights, and SRAM leaks power constantly. #Magnetic random-access memory (MRAM) is emerging as a game-changer. The fix: non-volatility. Technologies like Spin-Transfer Torque (#STT)-#MRAM and Spin-Orbit Torque (#SOT)-#MRAM offer near-zero leakage. SOT-MRAM, in particular, has the endurance and speed to replace L1/L2 caches, not just storage.

🔴 3. The #SystemLevel Win
The study benchmarks these memories on a TPU-like architecture. The conclusion? Replacing SRAM with gain-cell embedded DRAM or SOT-MRAM significantly improves energy efficiency and area efficiency. Less off-chip access to #HBM means faster inference and lower power bills.

👇 Link in the comments

#AIHardware #Semiconductors #EmergingMemory #MRAM #eDRAM #NatureReviews #ComputerArchitecture #NANDFlash #FeNAND #3DNAND #OxideSemiconductor #Ferroelectrics #AIInfrastructure #AIPower #EnergyEfficiency #Semiconductor #MemoryTechnology #HighK #LowK #Fabrication #SamsungElectronics #TechInnovation #HBM #Engineer #Ferro #Dielectric #HZO #Oxide #Gate #Cap #FET #nm #Moore #Depo #CVD #ALD #PEALD #PECVD #Furnace #MemoryWindow
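To see why the ~150-200 F² figure matters, here is a back-of-envelope Python sketch comparing buffer area for 6T-SRAM against a denser cell. The 50 F² gain-cell footprint and the 5 nm feature size are purely illustrative assumptions for the arithmetic, not numbers from the paper.

```python
# Back-of-envelope buffer area: bits * cell_size_in_F^2 * F^2, converted to mm^2.
F_NM = 5.0          # assumed feature size in nm (illustrative, not a marketing node name)
BUFFER_MIB = 64     # hypothetical on-chip global buffer

SRAM_F2 = 175       # mid-range of the ~150-200 F^2 figure cited in the post
GAINCELL_F2 = 50    # hypothetical gain-cell eDRAM footprint (assumption)

bits = BUFFER_MIB * 8 * 2**20

def area_mm2(cell_f2: float) -> float:
    return bits * cell_f2 * F_NM**2 / 1e12  # nm^2 -> mm^2

print(f"6T-SRAM:   {area_mm2(SRAM_F2):.2f} mm^2")     # ~2.35 mm^2
print(f"Gain cell: {area_mm2(GAINCELL_F2):.2f} mm^2") # ~0.67 mm^2, ~3.5x less silicon
```

Under these assumed numbers, the same 64 MiB buffer takes roughly 3.5× less area, which is the density argument in one line of arithmetic.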