HPDC 2023 Quick Note

Here are my quick notes on 22 papers from HPDC 2023. I try to outline the problem each paper targets; dig into the papers for their solutions. Overall, AI is shaping HPC, not only through its training requirements and inference capabilities, but also through the new hardware built for them.

1, Floating-point exception trapping is not available on GPUs. This work proposes GPU-FPX, a detection tool that reduces CPU-GPU transfer overhead and binary instrumentation overhead.

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs. Xinyi Li (the University of Utah) ; Ignacio Laguna (Lawrence Livermore National Laboratory) ; Bo Fang, Katarzyna Swirydowicz, Ang Li (Pacific Northwest National Laboratory) ; Ganesh Gopalakrishnan (the University of Utah)
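
To make "floating-point exception" concrete, here is a minimal CPU-side sketch in Python that classifies values by their float32 bit pattern. It is not how GPU-FPX works (per the note above, the tool instruments GPU binaries); the function names `classify_fp32` and `count_exceptions` are hypothetical, and treating a too-large finite value as INF is a simplifying assumption of this sketch.

```python
import struct
from collections import Counter

def classify_fp32(x: float) -> str:
    """Classify x, after rounding to float32, as NaN, INF, SUBNORMAL, or OK."""
    try:
        packed = struct.pack("<f", x)
    except OverflowError:
        # Assumption of this sketch: a finite value too large for float32
        # is reported as an overflow to infinity.
        return "INF"
    bits = struct.unpack("<I", packed)[0]
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        return "NaN" if mantissa else "INF"
    if exponent == 0 and mantissa:
        return "SUBNORMAL"
    return "OK"

def count_exceptions(values):
    """Count exceptional values, ignoring ordinary ones."""
    counts = Counter(classify_fp32(v) for v in values)
    counts.pop("OK", None)
    return dict(counts)
```

A detector built this way only has to report the small summary returned by `count_exceptions`, rather than every computed value.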

2, Reserve part of the GPU's onboard memory (HBM) as a data cache for checkpointing.

GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching. Avinash Maurya (Rochester Institute of Technology) ; Bogdan Nicolae (Argonne National Laboratory) ; M Mustafa Rafique (Rochester Institute of Technology) ; Thierry Tonellot (Exploration and Petroleum Engineering Advanced Research Center, Saudi Aramco) ; Franck Cappello (Argonne National Laboratory) ; Hussain J. AlSalem (Exploration and Petroleum Engineering Advanced Research Center, Saudi Aramco)

3, Compress data before transferring or duplicating it, using MGARD's multilevel compression.

RAPIDS: Reconciling Availability, Accuracy, and Performance in Managing Geo-Distributed Scientific Data, Lipeng Wan (Georgia State University) ; Jieyang Chen (The University of Alabama at Birmingham) ; Xin Liang (University of Kentucky) ; Ana Gainaru, Qian Gong (Oak Ridge National Laboratory) ; Qing Liu (New Jersey Institute of Technology) ; Ben Whitney (Oak Ridge National Laboratory) ; Joy Arulraj (Georgia Tech) ; Zhengchun Liu (Argonne National Laboratory) ; Ian Foster (Argonne Nat Lab and U.Chicago) ; Scott Klasky (Oak Ridge National Laboratory)

4, Process SpGEMM column by column when there are more than 7 nonzeros (NNZ) per column

Efficient Execution of SpGEMM on Long Vector Architectures, Valentin Le Fèvre (Barcelona Supercomputing Center) ; Marc Casas (Barcelona Supercomputing Center, Universitat Politècnica de Catalunya)
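
As a reminder of what column-by-column SpGEMM means, here is a minimal Python sketch of the textbook column-wise formulation: column j of C = A·B is a weighted sum of the columns of A selected by the nonzeros in column j of B. It does not model the paper's vectorization or the 7-NNZ threshold mentioned above; the storage layout (one `{row: value}` dict per column) and the name `spgemm_by_column` are assumptions made for illustration.

```python
def spgemm_by_column(a_cols, b_cols):
    """Compute C = A @ B where each matrix is a list of sparse columns.

    Each column is a dict {row_index: value} holding only the nonzeros.
    """
    c_cols = []
    for b_col in b_cols:
        acc = {}  # sparse accumulator for one output column
        for k, b_kj in b_col.items():
            # Nonzero B[k, j] scales column k of A into column j of C.
            for i, a_ik in a_cols[k].items():
                acc[i] = acc.get(i, 0.0) + a_ik * b_kj
        c_cols.append(acc)
    return c_cols

# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]], stored by column.
A = [{0: 1.0, 1: 2.0}, {1: 3.0}]
B = [{1: 5.0}, {0: 4.0}]
C = spgemm_by_column(A, B)
```

The work per output column is proportional to the nonzeros it touches, which is why the number of nonzeros per column matters when choosing a processing strategy.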

5, An adaptive approach to improve the performance of SpMV on GPUs

Efficient Algorithm Design of Optimizing SpMV on GPU, Chu Genshen, He Yuanjie, Ding Zhezhao, Chen Dandan, Bai He (University of Science and Technology Beijing) ; Wang XueSong (China Institute of Atomic Energy) ; Hu Changjun (University of Science and Technology Beijing)

6, Dual-optimization and bitshuffle to improve lossy compression on GPUs

FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs, Boyuan Zhang, Jiannan Tian (Indiana University) ; Sheng Di, Xiaodong Yu (Argonne National Laboratory) ; Yunhe Feng (University of North Texas) ; Xin Liang (University of Kentucky) ; Dingwen Tao (Indiana University) ; Franck Cappello (Argonne National Laboratory)
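
For readers unfamiliar with bitshuffle, here is a minimal Python sketch of the idea: transpose the bits of a block of integers so that bit b of every value ends up in the same "plane". It is a plain CPU illustration, not FZ-GPU's GPU kernel; the function names and the choice to pack each plane into one Python integer are assumptions.

```python
def bitshuffle(values, width=16):
    """Return `width` bit-planes; plane b holds bit b of every value.

    Value i contributes bit i of each plane.
    """
    planes = []
    for b in range(width):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> b) & 1) << i
        planes.append(plane)
    return planes

def bitunshuffle(planes, count):
    """Invert bitshuffle for a block of `count` values."""
    values = [0] * count
    for b, plane in enumerate(planes):
        for i in range(count):
            values[i] |= ((plane >> i) & 1) << b
    return values

codes = [1, 3, 2, 0]                  # small codes, e.g. after quantization
planes = bitshuffle(codes, width=4)   # planes for the unused high bits are zero
restored = bitunshuffle(planes, len(codes))
```

When the input values are small, the planes for the high bits are all zero, which gives a later encoding stage long runs of zeros to remove.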

7, The serverless control planes of AWS, GCP, and Azure are not open source. Ilúvatar is open source and also supports containers, queue optimization, etc.

Ilúvatar: A Fast Control Plane for Serverless Computing. Alexander Fuerst, Abdul Rehman, Prateek Sharma (Indiana University)

8, A Python tool for analyzing profile data from program runs. The profile data may include the call tree, performance data, and metadata for each run of the program.

Thicket: Seeing the Performance Experiment Forest for the Individual Run Trees. Stephanie Brink (Lawrence Livermore National Laboratory) ; Michael McKinsey (Texas A&M University) ; David Boehme (Lawrence Livermore National Laboratory) ; W. Daryl Hawkins (Texas A&M University) ; Connor Scully-Allison (University of Utah) ; Ian Lumsden, Treece Burgess, Vanessa Lama (University of Tennessee, Knoxville) ; Katherine E. Isaacs (University of Utah) ; Jakob Lüttgau, Michela Taufer (University of Tennessee, Knoxville) ; Olga Pearce (Lawrence Livermore National Laboratory)

9, Parallel decompression of gzip files, which can be remotely hosted

Rapidgzip: Parallel Decompression and Seeking in Gzip Files Using Cache Prefetching. Maximilian Knespel, Holger Brunst (Technische Universität Dresden)

10, LevelDB with B+ tree to organize data on disk, supporting bundle compaction

Closing the Performance Gap between Leveling and Tiering Compaction via Bundle Compaction. Ruicheng Liu, Peiquan Jin, Xiaoliang Wang, Yongping Luo, Zhaole Chu, Yigui Yuan (University of Science and Technology of China)

11, A quad-color algorithm for garbage collection in the Go language.

Let It Go: Relieving Garbage Collection Pain for Latency Critical Applications in Golang. Junxian Zhao (University of Colorado Colorado Springs) ; Xiaobo Zhou (University of Macau and University of Colorado Colorado Springs) ; Sang-Yoon Chang (University of Colorado Colorado Springs) ; ChengZhong Xu (University of Macau)

12, A safe and timely resource harvesting framework for multi-node serverless clusters

Libra: Harvesting Idle Resources Safely and Timely in Serverless Clusters. Hanfei Yu, Christian Fontenot, Hao Wang (Louisiana State University) ; Jian Li (SUNY-Binghamton University) ; Xu Yuan (University of Louisiana at Lafayette) ; Seung-Jong Park (Louisiana State University)

13, How to design a fast storage system with an ordering layer for serverless computing.

FlexLog: A Shared Log for Stateful Serverless Computing. Dimitra Giantsidi (The University of Edinburgh) ; Emmanouil Giortamis, Nathaniel Tornow (Technical University of Munich) ; Florin Dinu (Huawei Research) ; Pramod Bhatotia (Technical University of Munich)

14, How to group lambda functions to reduce execution time in serverless computing

ProPack: Executing Concurrent Serverless Functions Faster and Cheaper. Rohan Basu Roy (Northeastern University) ; Tirthak Patel (Rice University) ; Richmond Liew, Devesh Tiwari (Northeastern University) ; Yadu Nand Babuji, Ryan Chard (Argonne National Laboratory)

15, How to distribute all the arrived queries of an ML inference service to different instances in a heterogeneous cloud?

Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources. Baolin Li (Northeastern University) ; Siddharth Samsi (MIT Lincoln Laboratory) ; Vijay Gadepally (MIT) ; Devesh Tiwari (Northeastern University)

16, How to remove duplicated data when constructing graphs for GNN training

Redundancy-Free High-Performance Dynamic GNN Training with Hierarchical Pipeline Parallelism. Yaqi Xia, Zheng Zhang, Hulin Wang (Wuhan University) ; Donglin Yang (Nvidia Corporation) ; Xiaobo Zhou (University of Macau) ; Dazhao Cheng (Wuhan University)

17, How to use AI models (as surrogate models) to replace expensive computation in simulation codes

Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications. Wenqian Dong (Florida International University) ; Gokcen Kestor (Pacific Northwest National Laboratory) ; Dong Li (University of California, Merced)

18, How to use AI to find the best configuration (e.g., number of threads, scheduling policy and chunk size, runtime devices) for an HPC program

Performance Optimization using Multimodal Modeling and Heterogeneous GNN. Akash Dutta (Iowa State University) ; Jordi Alcaraz Rodriguez (University of Oregon) ; Ali TehraniJamsaz (Iowa State University) ; Anna Sikora, Eduardo Cesar Galobardes (Universitat Autònoma de Barcelona) ; Ali Jannesari (Iowa State University)

19, How to schedule applications on emerging hardware with thousands of CPUs and a fast network-on-chip (proposes canonical task graphs)

Streaming Task Graph Scheduling for Dataflow Architectures. Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler (ETH Zurich)

20, How to handle dynamic graphs for an online PageRank algorithm

Real-Time PageRank on Dynamic Graphs. Scott Sallinen, Juntong Luo, Matei Ripeanu (The University of British Columbia)
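
To show why dynamic graphs are a problem for PageRank, here is a minimal Python sketch of the simplest baseline: after an edge is added, rerun power iteration, optionally warm-started from the previous ranks. This is not the paper's algorithm; the function `pagerank`, its parameters, and the tiny example graph are assumptions made for illustration.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, init=None, max_iter=1000):
    """Power-iteration PageRank on {node: [successors]}.

    Every successor must also appear as a key. Returns (ranks, iterations).
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    rank = dict(init) if init else {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                new[w] += damping * rank[v] / len(out_edges[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            return rank, iteration
    return rank, max_iter

graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
old_ranks, _ = pagerank(graph)

graph["c"] = ["a", "d"]                              # a new edge c -> d arrives
cold_ranks, _ = pagerank(graph)                      # recompute from scratch
warm_ranks, _ = pagerank(graph, init=old_ranks)      # reuse the previous ranks
```

Both runs reach the same ranks, but each update still touches the whole graph, which is the cost a real-time approach has to avoid.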

21, How to understand the pricing strategy of the cloud provider Alibaba

Deconstructing Alibaba Cloud’s Preemptible Instance Pricing, Danielle Movsowitz Davidow (Tel-Aviv University) ; Orna Agmon Ben-Yehuda (CRI, University of Haifa) ; Orr Dunkelman (University of Haifa)

22, How to use regression models + SHAP to find I/O performance bottlenecks on profiling data

AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis, Bin Dong, Jean Luca Bez (Lawrence Berkeley National Laboratory) ; Suren Byna (The Ohio State University)
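
To illustrate the "regression model + SHAP" idea, here is a minimal Python sketch for the simplest case. For a linear model with independent features, the SHAP value of feature i is w_i · (x_i − mean(x_i)), so a job's most negative attribution points at its likely bottleneck. The note does not describe AIIO's actual models or features; the weights, the counter names, and the function `linear_shap` below are hypothetical.

```python
def linear_shap(weights, background, sample):
    """Per-feature attributions for a linear model: w_i * (x_i - mean_i)."""
    means = {
        name: sum(row[name] for row in background) / len(background)
        for name in weights
    }
    return {name: weights[name] * (sample[name] - means[name]) for name in weights}

# Hypothetical fitted weights: effect of each I/O counter on predicted throughput.
weights = {"small_writes": -0.5, "seeks": -0.2, "large_reads": 0.1}

# Hypothetical profiling records of earlier jobs (the background data).
background = [
    {"small_writes": 10.0, "seeks": 5.0, "large_reads": 100.0},
    {"small_writes": 30.0, "seeks": 15.0, "large_reads": 300.0},
]

# The job to diagnose: many more small writes than is typical.
job = {"small_writes": 100.0, "seeks": 10.0, "large_reads": 200.0}

attributions = linear_shap(weights, background, job)
bottleneck = min(attributions, key=attributions.get)
```

The attributions sum to the difference between the job's prediction and the prediction for an average job, which is what makes them usable as a per-job diagnosis. Non-linear regressors need a general SHAP implementation, which is not shown here.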

Bin Dong
