Machine Learning Hardware

“If you were plowing a field, which would you rather use: two strong oxen or 1,024 chickens?” Seymour Cray's question still holds good in a world where no single hardware architecture dominates all types of workloads. At times the two strong oxen are winning, and at other times the 1,024 chickens are. The same is true of Machine Learning (ML) / Deep Learning (DL) workloads: no single hardware architecture has been able to dominate this field.

The computational requirements of ML / DL are high, since we are trying to identify patterns in a dataset that could be effectively unbounded. The data undergoes transformation prior to pattern identification, followed by running the learned model against data not seen during training. In ML terminology these are, at a high level, preprocessing, model building, and inference. The computational requirements are highest for model building, followed by preprocessing and then inference. Model building is an iterative process and requires many iterations before arriving at a generalized model that performs well on unseen scenarios. The entire ML process may have more steps than the three mentioned, but from the perspective of computational requirements, these three constitute the major chunk.
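The three stages can be sketched with a toy linear model; the data, variable names, and the gradient-descent loop here are purely illustrative, not any particular framework's API.

```python
import numpy as np

# Toy data: y is (roughly) 3 * x plus noise.
rng = np.random.default_rng(42)
X_raw = rng.uniform(0, 100, size=(200, 1))
y = 3.0 * X_raw[:, 0] + rng.normal(0, 1, 200)

# 1. Preprocessing: scale the feature to zero mean / unit variance.
mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
X = (X_raw - mu) / sigma

# 2. Model building: the iterative, compute-heavy stage.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = X[:, 0] * w + b - y
    w -= lr * (err @ X[:, 0]) / len(y)
    b -= lr * err.mean()

# 3. Inference: apply the learned model to an unseen input,
#    reusing the training-set scaling statistics.
x_new = (np.array([[50.0]]) - mu) / sigma
y_hat = x_new[:, 0] * w + b
print(float(y_hat[0]))  # close to the true value 3 * 50 = 150
```

Even on this toy example, stage 2 dominates the compute: it touches all 200 samples on every one of its 500 iterations, while preprocessing and inference are each a single pass.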

There are multiple ways to increase the throughput of an ML pipeline. Software optimization is the first step, and many software frameworks are being adapted to perform better on heterogeneous hardware. Better utilization of cores in a multi-core CPU (Xeon Skylake or Xeon Cascade Lake) is one such example. Another technique is mixed-precision compute: FP32 or FP64 is not required all the time, and we can use lower precision to achieve more throughput in some scenarios. The OpenVINO toolkit is an example of a highly optimized framework for inference on CPUs, integrated graphics, and FPGAs. Hardware acceleration is the other alternative, for example specialized hardware for vector processing such as a GPU or TPU.

The data used for ML can be represented as scalar, vector, matrix, or spatial data, where a matrix can be thought of as a collection of row or column vectors. It is difficult to find a unified hardware architecture that can process all these forms of data equally well. CPUs are strong at scalar processing, with support for vectors through Advanced Vector Extensions (AVX). GPUs are better suited to vector processing than CPUs, which is why they are used as accelerators in some ML / DL pipelines. Spatial data is better processed using FPGAs. The hardware options we have today are:

  • CPU
  • GPU (Integrated Graphics & Discrete Graphics)
  • FPGA
  • ASIC / ASSP / SoC
  • TPU
  • VPU
  • OPU
  • IPU
  • ......
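The scalar-versus-vector distinction above can be shown in a few lines of NumPy: the same dot product written scalar-at-a-time and as a single vectorized call, the latter being the pattern that AVX lanes and GPUs accelerate (the array size is arbitrary).

```python
import numpy as np

# The same dot product expressed scalar-at-a-time and vectorized.
# Vector hardware processes many lanes per instruction in the second
# form; the first form executes one multiply-add at a time.
x = np.arange(1000, dtype=np.float32)
y = np.ones_like(x)

# Scalar view: one multiply-add per loop iteration.
acc = 0.0
for i in range(len(x)):
    acc += float(x[i]) * float(y[i])

# Vector view: the whole reduction in a single call.
dot = float(x @ y)

print(acc, dot)  # both 499500.0 (sum of 0..999, exact at FP32)
```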

The hardware options listed above are mostly used in conjunction with a CPU (with a few exceptions) and act as accelerators that speed up computation. The CPU is the most general purpose among them, and the most common scenario is the CPU offloading certain types of workload to the accelerator. Most software frameworks will use the most appropriate hardware in your system with only minor code changes. It is clear that heterogeneous computing is here to stay, and a better understanding of it is needed to get the best out of the hardware. Some of the hardware listed above is still under research and development.

Model building is the most computationally intensive step, with a high level of iteration involved. Depending on the size of the data and the computational requirements, we can use accelerators on a single system, scale vertically (more accelerators per system), or distribute the workload across multiple systems. There are many examples of HPC / distributed architectures used for ML / DL training, with varied success across different workloads. There are many criteria for choosing the right hardware for a machine learning pipeline. A short list of criteria follows.

  1. Which stage of ML are we dealing with: preprocessing, model building, or inference? Hardware ideal for model building might not be ideal for inference.
  2. How much data are you processing? What happens if the data doesn't fit into memory?
  3. What type of data are you processing? Scalar, vector, matrix, or spatial.
  4. Which ML / DL framework are you using? Some frameworks are more optimized for certain hardware architectures.
  5. Is the solution deployed at the edge or in the cloud? Power usage and latency are great concerns at the edge.
  6. Are you concerned with power consumption, low latency, or heat generated during inference?
  7. Are you willing to compromise accuracy for speed using techniques like mixed- or low-precision compute?
  8. Are you a researcher who wants to experiment rapidly with alternative algorithms, or a user who wants to apply an existing one?
  9. What type of algorithms do you use? Some algorithms perform better on certain hardware.
  10. What are your throughput requirements? For example, 30 frames per second or similar measures.
  11. Is your team knowledgeable enough to work with frameworks like CUDA / OpenCL / ROCm to take further advantage of the hardware if needed?
  12. Do you expect workloads other than ML / DL to run on the same hardware?
  13. How many concurrent workloads are you planning to run on the same hardware?
  14. Are you looking for on-premise computing, or are you fine with the cloud? Some options are available only in the cloud.
  15. What is the cost of processing (time vs. power usage vs. speed)?
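Returning to the idea of distributing model building across systems, the following sketch simulates data parallelism: each of four hypothetical "workers" computes a gradient on its own shard of the data, and the shards' gradients are averaged before each update, the role an all-reduce plays in real distributed training. All names and sizes are illustrative.

```python
import numpy as np

# Simulated data parallelism: 4 "workers", each holding one shard of
# the data, compute local gradients that are then averaged.
rng = np.random.default_rng(1)
X = rng.standard_normal((400, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                                  # noiseless toy targets

def local_gradient(X_shard, y_shard, w):
    """Gradient of the mean-squared error on one worker's shard."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

w = np.zeros(3)
shards = np.array_split(np.arange(len(y)), 4)   # 4 equal shards
for _ in range(300):
    grads = [local_gradient(X[s], y[s], w) for s in shards]
    w -= 0.1 * np.mean(grads, axis=0)           # averaged ("all-reduced") update

print(np.round(w, 3))  # recovers true_w
```

Because the shards are equally sized, the averaged gradient equals the full-batch gradient here; in practice, communication cost of the averaging step is what limits how well this scales across machines.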

Due to the high computational requirements of machine learning / deep learning, a lot of research is going into the development of new hardware and the optimization of software to exploit existing hardware. This hardware research can be classified as evolutionary or revolutionary. The use of photons instead of electrons in an OPU is an example of bridging from evolutionary to revolutionary computing. We are in the early stages of revolutionary computing, with options like neuromorphic computing and quantum computing in the future. A hardware revolution is happening which might allow us to solve problems that were out of bounds for evolutionary computing. It is unclear how the computing landscape will change in the next couple of decades, but we can hope for the better. It is estimated that "Data centers of the world will consume 1/5 of Earth’s power by 2025". It is a must that we produce less power-hungry machines going forward, as compute requirements are constantly increasing.

More articles by Rajeev M A

  • AI: Between Innovation, Hype, and Economic Reality

    We are living through one of the most exciting — and uncertain — phases in the history of technology. AI has gone from…

    8 Comments
  • Application of AI in Media Sector

    Deep Learning has increasingly found its place in the creative side of media, driving innovation across storytelling…

  • Bridging the Gap: Industry and Academia in AI/ML

    1. Intent Over the past decade, I have been actively engaged with academic institutions in a part-time capacity…

    11 Comments
  • Vibe Coding: Where the AI Magic Fizzles Out

    Introduction: There’s a lot of noise lately about how AI will soon write most of our code, leaving developers to merely…

    6 Comments
  • The 7R Model of AI Evolution: From Retrieval to Retroponitic

    Artificial Intelligence (AI) has been on an extraordinary journey of growth, evolving through distinct stages of…

    7 Comments
  • There is No Innovation Without an Invoice

    Introduction Success of technology depends on the value it adds to the business (consumer and enterprise). Some…

    5 Comments
  • Generative AI

    Generative AI, often referred to as GenAI, is a specialized subset of artificial intelligence dedicated to creating…

  • Applications of Artificial Intelligence in the Power Sector

    I had the privilege of speaking at the National Symposium on Emerging Technologies for Green Energy, an event organized…

    6 Comments
  • Stochasticity in Business Process

    Normally business processes are deterministic in nature. Rule based systems are fundamental part of any business…

    4 Comments
  • MLOps

    Why do many Machine Learning (ML) projects fail? Another way to look at it is, why many software projects fail? Can it…

Others also viewed

Explore content categories