From the course: NVIDIA Certified Associate AI Infrastructure and Operations (NCA-AIIO) Cert Prep
Unlock this course with a free trial
Join today to access over 25,500 courses taught by industry experts.
NVIDIA technology stack - NVIDIA Tutorial
From the course: NVIDIA Certified Associate AI Infrastructure and Operations (NCA-AIIO) Cert Prep
NVIDIA technology stack
There are so many technologies developed by NVIDIA in terms of hardware and software, it is impossible to cover everything in one single training. And it is impossible to put them together into a layered format. What I tried to do here, I have created this six-layer architectural diagram for you to understand how these components fit together. Obviously, this is not an exhaustive list. I tried to focus on the most important concept to get started with and then also which are important for exam. We would first start by understanding physical layer. All the physical component, those CPUs and systems and node and network architectures and network component. Then we'll talk about data movement, how these devices communicate with each other, what technologies NVIDIA has developed for faster and efficient communication, which are NVLink, RDMA, InfiniBand. Then comes operating system for these nodes, that is DGX-OS, GPU drivers are there, virtualization technologies for GPU exists, so we'll…
Practice while you learn with exercise files
Download the files the instructor uses to teach the course. Follow along and learn by watching, listening and practicing.
Contents
-
-
-
-
-
-
(Locked)
NVIDIA: Powering AI GPU innovation2m 37s
-
(Locked)
NVIDIA technology stack3m 12s
-
(Locked)
Layer 1: Physical layer3m 53s
-
(Locked)
GPU on a graphics card1m 57s
-
(Locked)
DGX platform2m 56s
-
(Locked)
DGX SuperPOD1m 57s
-
(Locked)
ConnectX1m 49s
-
(Locked)
BlueField DPUs2m 32s
-
(Locked)
NVIDIA reference architectures1m 38s
-
(Locked)
Understanding GPU cores5m
-
(Locked)
Comparing GPU cores4m 18s
-
(Locked)
NVIDIA DGX platform: Timeline4m 47s
-
(Locked)
DGX platform: Deployment options3m 38s
-
(Locked)
DGX A100 vs. H1004m 6s
-
(Locked)
Layer 2: Data movement and I/O acceleration59s
-
(Locked)
NVLink8m 5s
-
(Locked)
InfiniBand2m 5s
-
(Locked)
InfiniBand vs. Ethernet1m 43s
-
(Locked)
DMA and RDMA6m 30s
-
(Locked)
GPUDirect RDMA2m 44s
-
(Locked)
GPUDirect storage1m 45s
-
(Locked)
Quick comparison1m 56s
-
(Locked)
Layer 3: OS, driver, and virtualization2m 17s
-
(Locked)
GPU drivers4m 38s
-
(Locked)
GPU virtualization5m 8s
-
(Locked)
vGPU vs. MIG, part 17m 48s
-
(Locked)
vGPU vs. MIG, part 210m 59s
-
(Locked)
Layer 4: Core libraries6m 44s
-
(Locked)
Compute unified device architecture (CUDA)3m 12s
-
(Locked)
Installing CUDA2m 11s
-
(Locked)
NVIDIA collective communications library (NCCL)3m 41s
-
(Locked)
NVLink, NVSwitch, PCIe, RDMA vs. NCCL3m 44s
-
(Locked)
Layer 5: Monitoring and management2m 23s
-
(Locked)
NVIDIA-SMI4m 24s
-
(Locked)
Data Center GPU Manager (DCGM)7m 27s
-
(Locked)
Base Command Manager5m 33s
-
(Locked)
Which one to use?2m 3s
-
(Locked)
Layer 6: Applications and vertical solutions3m 48s
-
(Locked)
Summary2m 26s
-
(Locked)
NVIDIA AI Enterprise3m 2s
-
(Locked)
NVIDIA AI Factory2m 24s
-
(Locked)
-