CUDA utilisation: Offload function execution on GPU accelerators

Specialised hardware utilisation for function acceleration

The EDGELESS project, a significant endeavor in advancing serverless architectures, aims to provide a serverless platform capable of executing lambda functions at the edge. This innovative approach allows the serverless platform to fully exploit the resources available at the nodes where application data is generated and consumed. However, this objective presents new challenges, such as the need, in some cases, for edge devices to handle computationally intensive tasks independently. These tasks could involve locally preprocessing video frames to remove sensitive data, classifying objects from camera inputs, or executing other functions related to Machine Learning (ML), Artificial Intelligence (AI), or Computer Vision (CV) [1].

To efficiently perform these calculations, specialised hardware shipped in edge devices could be leveraged. More precisely, the EDGELESS platform will explore the potential of offloading computations to the edge nodes’ Graphical Processing Units (GPUs).

To this end, the Technical University of Crete (TUC), an academic partner of the project, is actively studying techniques to virtualise lambda executors running on GPUs through lightweight abstractions.

Experimentation devices


Figure 1: Jetson Orin

NVIDIA is one of the technology industry's leading GPU manufacturers. Its GPUs are at the forefront of performance and efficiency, making them essential components frequently deployed in high-performance computing applications. For this reason, TUC's starting point for its research activities in GPU accelerators is the NVIDIA platform.

One of NVIDIA's standout product lines is the Jetson family. These all-in-one machines bring GPU capabilities to the edge, delivering enhanced computational power in a small form factor. In the EDGELESS platform, devices like the Jetson Orin or Jetson Xavier will play a key role in accelerating functions on GPUs, and they also serve as experimentation devices in TUC's activities.

To unlock the full power of its GPUs, NVIDIA has developed CUDA, a versatile parallel computing platform and programming model. CUDA is widely adopted by popular AI/ML frameworks and libraries such as PyTorch, which use it as their backend for GPU acceleration.
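As a minimal illustration of how such a framework exposes CUDA, the following PyTorch sketch (tensor sizes are arbitrary, chosen only for illustration) selects the GPU when CUDA is available, for example on a Jetson device, and transparently falls back to the CPU otherwise:

```python
import torch  # PyTorch uses CUDA internally as its GPU backend

# Select the GPU when CUDA is available (e.g. on a Jetson Orin),
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy workload: a matrix multiplication executed on the selected device.
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
c = a @ b

print(f"Computed a {tuple(c.shape)} product on {c.device}")
```

The same code runs unchanged on a GPU-equipped edge node and on a plain CPU host, which is exactly what makes this style of device selection attractive for portable function code.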

GPU offloading in the EDGELESS system

Utilising CUDA from EDGELESS node devices is one of the lines of study that TUC pursues to make GPU offloading of functions available in the EDGELESS system in a simple manner.

Experiments have been conducted in virtualised environments, in which AI/ML frameworks and libraries are installed and functions are implemented. The objective is to leverage these frameworks and libraries, which internally use CUDA as a backend, to activate GPU computations. The first results of our study are shown in Figure 2.


Figure 2: First results of our experiments
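Concretely, a function implemented inside such a virtualised environment might look like the sketch below. The handler name and payload format are purely illustrative, not the EDGELESS function API: it normalises a list of numbers, offloading the arithmetic to the GPU whenever CUDA is available.

```python
import torch

def handler(payload):
    """Illustrative lambda-style function: normalise a list of numbers,
    running the arithmetic on the GPU when CUDA is available."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.tensor(payload, dtype=torch.float32, device=device)
    y = (x - x.mean()) / (x.std() + 1e-8)
    # Move the result back to the host and return plain Python data.
    return y.cpu().tolist()

print(handler([1.0, 2.0, 3.0, 4.0]))
```

Because the device choice is made inside the function body, the same handler can be deployed to both GPU-equipped Jetson nodes and CPU-only nodes without modification.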

The goal of our research activities is to empower developers on the EDGELESS platform to implement and execute their own functions on GPUs. This opens up new possibilities for offloading functions that require GPU processing and makes it easier to run AI/ML models on edge devices in an isolated manner.


For more direct updates, follow us on LinkedIn, Mastodon, and X.

For regular updates, including a collection of news and relevant information, sign up for our newsletter here.


References:

[1] G. Vasiliadis, L. Koromilas, M. Polychronakis, and S. Ioannidis, ‘GASPP: A GPU-Accelerated Stateful Packet Processing Framework’, in 2014 USENIX Annual Technical Conference (USENIX ATC 14), 2014, pp. 321–332.


Blog signed by: TUC Team
