NVIDIA is doubling down on AI workloads with the Turing GPU architecture
Yesterday at SIGGRAPH 2018, NVIDIA announced its next-generation GPU architecture, called Turing. Here is my preliminary analysis of the architecture.
Besides the usual improvements in graphics performance and capabilities, the thing to marvel at, if you work with AI, is that NVIDIA is dedicating a large portion of the die to machine learning workloads through its Tensor Cores.
What is amazing is that a general-purpose GPU architecture nearly catches up with dedicated Tensor Processing Units (TPUs): Turing delivers 125 TFLOPS at FP16 precision versus roughly 180 TFLOPS for Google's TPUv2.
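To make the Tensor Core angle concrete, here is a minimal sketch (my own illustration, not NVIDIA sample code) of the CUDA WMMA API that exposes them: one warp cooperatively computes a 16x16x16 matrix multiply-accumulate with FP16 inputs and FP32 accumulation, which is the mixed-precision mode behind the quoted FP16 TFLOPS figures. It assumes CUDA 9 or later and a Volta- or Turing-class GPU (compile with, e.g., nvcc -arch=sm_75).

```cuda
// Minimal Tensor Core sketch via CUDA's WMMA API.
// One warp computes a single 16x16 tile: D = A * B + 0.
#include <mma.h>
#include <cuda_fp16.h>
#include <cstdio>

using namespace nvcuda;

__global__ void wmma_16x16(const half *a, const half *b, float *d) {
    // FP16 input fragments, FP32 accumulator: Turing's mixed-precision path.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);               // leading dim 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // Tensor Core op
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *d;
    cudaMallocManaged(&a, 16 * 16 * sizeof(half));
    cudaMallocManaged(&b, 16 * 16 * sizeof(half));
    cudaMallocManaged(&d, 16 * 16 * sizeof(float));
    for (int i = 0; i < 16 * 16; ++i) {
        a[i] = __float2half(1.0f);   // all-ones inputs ...
        b[i] = __float2half(1.0f);
    }
    wmma_16x16<<<1, 32>>>(a, b, d);  // one warp = 32 threads
    cudaDeviceSynchronize();
    printf("d[0] = %f (expected 16)\n", d[0]);  // ... so every entry is 16
    cudaFree(a); cudaFree(b); cudaFree(d);
    return 0;
}
```

In practice you would call into cuBLAS or cuDNN rather than hand-writing WMMA kernels, but the sketch shows why the die area pays off: a single warp-level instruction performs an entire 16x16x16 matrix multiply-accumulate.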
Furthermore, NVIDIA has added an INT4 format for very-low-precision inference workloads, rated at half a peta-op per second (roughly 500 TOPS). An increasing number of machine learning models cope just fine with very low precision, settling for 'good enough' accuracy while gaining tremendously in performance.
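As a back-of-the-envelope illustration of what very-low-precision inference means, here is a sketch (my own simplification; real INT4 inference runs through NVIDIA's libraries such as TensorRT) of symmetric 4-bit quantization: weights are mapped to 16 signed levels and two values are packed per byte, which is where the memory and throughput wins come from.

```cuda
// Symmetric INT4 quantization sketch (host-side code; illustrative only).
#include <cstdint>
#include <cstdio>
#include <cmath>
#include <algorithm>
#include <vector>

// Map a float to one of 16 signed levels in [-8, 7].
int8_t quantize4(float x, float scale) {
    int q = static_cast<int>(std::round(x / scale));
    return static_cast<int8_t>(std::max(-8, std::min(7, q)));
}

int main() {
    std::vector<float> w = {0.9f, -0.3f, 0.05f, -1.2f};

    // Symmetric scale: the largest magnitude maps to the widest level.
    float max_abs = 0.0f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    float scale = max_abs / 7.0f;

    // Pack two 4-bit values per byte: 4x smaller than FP16 weights.
    std::vector<uint8_t> packed(w.size() / 2);
    for (size_t i = 0; i < w.size(); i += 2) {
        uint8_t lo = quantize4(w[i], scale) & 0x0F;
        uint8_t hi = quantize4(w[i + 1], scale) & 0x0F;
        packed[i / 2] = lo | (hi << 4);
    }

    // Unpack and dequantize to show the 'good enough' reconstruction.
    for (size_t i = 0; i < w.size(); ++i) {
        uint8_t nib = (packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
        int8_t q = (nib & 0x08) ? static_cast<int8_t>(nib | 0xF0)   // sign-extend
                                : static_cast<int8_t>(nib);
        printf("w=%+.2f  ->  q=%+d  ->  %+.2f\n", w[i], q, q * scale);
    }
    return 0;
}
```

Each weight is reconstructed to within the quantization step, which is all that many inference workloads need, and the GPU can execute four INT4 operations in the time and silicon budget of one FP16 operation.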
A top-end Turing GPU with 48 GB of VRAM will likely set you back around $10K, but equivalent CPU-based horsepower would cost you one or two orders of magnitude more.
It will not take long before the likes of Supermicro announce servers with one or two CPU sockets and eight Turing GPUs, offering unheard-of performance for the buck.
Imagine real-time analytics over massive amounts of data, or very large-scale deep learning models, running on a cluster of these compute nodes: the prospect is mind-blowing and ushers in a new dawn for artificial intelligence.
John Fabienke is the founder of arqitekta, an enterprise architecture consulting company specializing in infrastructure strategy and design, big data, and AI.