CC-FR’s Post

1,172 followers

In C++, GPU computing evolved dramatically in 2020, when NVIDIA announced that its compiler, nvc++, could compile standard C++17 code to run directly on the GPU, without requiring the developer to use extra libraries such as SYCL, Kokkos, etc. Now, nvc++ can also deal...

Modern C++ GPU Computing with std::algorithm and CUDA

https://www.youtube.com/
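To make "standard C++ on the GPU" concrete, here is a minimal sketch: a SAXPY written with `std::transform` and the C++17 execution policy `std::execution::par_unseq`, with no CUDA-specific code. The function name and the sequential fallback are our own choices, not taken from the video. We assume NVIDIA's `-stdpar=gpu` flag for offload; other compilers simply run the same code on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#if __has_include(<execution>)
#include <execution>
#endif

// y[i] = a * x[i] + y[i], written with a standard algorithm instead of a CUDA kernel.
// Built with nvc++ -stdpar=gpu, the par_unseq call may be offloaded to the GPU;
// with other compilers it runs on the CPU (in parallel if the library supports it).
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // Capture `a` by value: offloaded lambdas should not reference host stack variables.
    auto axpy = [a](float xi, float yi) { return a * xi + yi; };
#if defined(__cpp_lib_execution)
    std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(), y.begin(), axpy);
#else
    // Fallback for standard libraries without execution policies: sequential transform.
    std::transform(x.begin(), x.end(), y.begin(), y.begin(), axpy);
#endif
}
```

A typical build line would be `nvc++ -std=c++17 -stdpar=gpu saxpy.cpp`. The vectors live on the heap, which, as we read NVIDIA's stdpar guidance, is what lets nvc++ make them visible to the GPU without explicit copies.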


More Relevant Posts

HPC Serbia

249 followers
4w
GRAY SCOTT THURSDAYS: Modern C++ GPU computing with std::algorithm and CUDA

This week’s Gray Scott Thursdays webinar will explore modern C++ GPU computing with std::algorithm and CUDA. Scheduled for 9 April 2026, the session will provide a practical introduction to writing GPU-accelerated code using modern C++ standards.

GPU programming in C++ has undergone a major transformation in recent years. Since NVIDIA added support to its nvc++ compiler for offloading standard C++ (starting with C++17) directly to GPUs, developers can leverage familiar language features without relying on additional frameworks such as SYCL or Kokkos. Support has since expanded to C++20, while alternative ecosystems, such as Intel’s DPC++, have further enriched the landscape.

This webinar will demonstrate how to perform GPU computations using modern C++ (C++17 and C++20), with an emphasis on std::algorithm. It will also compare this approach with more traditional GPU programming models such as CUDA, highlighting key similarities, differences, and portability considerations, particularly in relation to the CPU-based implementations discussed in previous sessions.

More information: https://lnkd.in/dmC9fNtq

#eurocc4see #EuroHPC #EuroCC
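The series name presumably refers to the Gray-Scott reaction-diffusion model, which we assume serves as the running example; the post itself gives no code. As a hedged sketch of the index-based style that `std::for_each` with `std::execution::par_unseq` enables, here is one explicit Euler step of a 1D Gray-Scott model with periodic boundaries. Each index `i` does the work one CUDA thread would do in a hand-written kernel, which is where a comparison with CUDA naturally starts. The 1D simplification, the parameter values, and the names `gray_scott_step` and `GrayScottParams` are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>
#if __has_include(<execution>)
#include <execution>
#endif

// Illustrative parameters (assumed, not from the webinar).
struct GrayScottParams {
    float Du = 0.2f;  // diffusion rate of u
    float Dv = 0.1f;  // diffusion rate of v
    float F = 0.04f;  // feed rate
    float k = 0.06f;  // kill rate
    float dt = 1.0f;  // time step (grid spacing taken as 1)
};

// One explicit Euler step of the 1D Gray-Scott model with periodic boundaries:
//   du/dt = Du * lap(u) - u*v*v + F*(1 - u)
//   dv/dt = Dv * lap(v) + u*v*v - (F + k)*v
// `indices` holds 0..n-1; each index is updated independently, like one CUDA thread.
void gray_scott_step(const std::vector<float>& u, const std::vector<float>& v,
                     std::vector<float>& u_next, std::vector<float>& v_next,
                     const std::vector<std::size_t>& indices, GrayScottParams p) {
    assert(u.size() == v.size() && u_next.size() == u.size() && v_next.size() == u.size());
    const std::size_t n = u.size();
    // Raw pointers into heap-allocated vectors, captured by value in the lambda.
    const float* pu = u.data();
    const float* pv = v.data();
    float* pun = u_next.data();
    float* pvn = v_next.data();
    auto update = [=](std::size_t i) {
        const std::size_t left = (i + n - 1) % n;
        const std::size_t right = (i + 1) % n;
        const float lap_u = pu[left] - 2.0f * pu[i] + pu[right];
        const float lap_v = pv[left] - 2.0f * pv[i] + pv[right];
        const float uvv = pu[i] * pv[i] * pv[i];
        pun[i] = pu[i] + p.dt * (p.Du * lap_u - uvv + p.F * (1.0f - pu[i]));
        pvn[i] = pv[i] + p.dt * (p.Dv * lap_v + uvv - (p.F + p.k) * pv[i]);
    };
#if defined(__cpp_lib_execution)
    std::for_each(std::execution::par_unseq, indices.begin(), indices.end(), update);
#else
    std::for_each(indices.begin(), indices.end(), update);
#endif
}
```

The index vector is built once with `std::iota` and reused every step; the caller swaps `u`/`u_next` and `v`/`v_next` between steps. In CUDA the same body would sit in a `__global__` function indexed by `blockIdx.x * blockDim.x + threadIdx.x`.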
Naman Jain
2w
Numbers don't lie. ReactiveChainDB V2 just bypassed the competition. 🚀

By moving our entire consensus and backpressure execution to the GPU and implementing a strict zero-CPU-fallback policy, we've created a literal "CPU Bypass." When you stop context-switching on the CPU and let the GPU batch-process the workloads natively, the performance gains are staggering:

p50: 2 ms (ReactiveChainDB V2) vs. 120 ms (ScyllaDB)
p99: 1,265 ms (ReactiveChainDB V2) vs. 28,873 ms (ScyllaDB)

Even at the 99th percentile, the system remains incredibly stable at about 1.3 seconds, completely outclassing the competition in our test environment. Take a look at the log-scale chart below. The full article will be published soon.

#Coding #Benchmarks #DataEngineering #GPU #ReactiveChainDB #BuildInPublic #OpenSource #SoftwareArchitecture #Java21 #TornadoVM #Innovation #TechMilestone #DatabaseOptimization #Engineering #ScyllaDB #RocksDB #Backend #DatabaseEngineering #SystemDesign #HighPerformanceComputing #GPUComputing #CUDA #namanoncode
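The post doesn't say how its p50 and p99 figures were computed. One common definition is the nearest-rank percentile: sort the latency samples and take the value at 1-based rank ceil(p/100 × N). A minimal sketch, where both the method choice and the function name are our assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// `samples` is taken by value so sorting does not modify the caller's data.
double percentile_nearest_rank(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    // 1-based rank = ceil(p/100 * n), computed as p*n/100 to keep integer inputs exact.
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * n / 100.0));
    if (rank < 1) rank = 1;
    return samples[rank - 1];
}
```

p50 is the median request latency; p99 is the latency that 99% of requests stay at or below, which is why it sits far above p50 whenever a workload has a long tail.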
Dr Neelanjan Manna
3w
Imagine a system that cracks the n component of RSA in O(1) on a classical machine: a Ryzen 9 with 16 GB of RAM, running Ubuntu, no GPU. Honestly, the bit length no longer matters once you feel the flow: 2048, 4096, 8192. A chain is no stronger than its weakest link. Not quantum, but nano-edge classical. Train the initial bits for 6 to 9 months (maybe for 4096 bits), and from then on it takes a week. Classical computing is all about variable space per bit. #learnthebasics
HPC Serbia

249 followers
1w
GRAY SCOTT THURSDAYS: EVE – a C++20 Computing Library on CPU

The Gray Scott Thursdays webinars continue, with this week's session focusing on EVE (Expressive Vector Engine), a modern C++20 library that simplifies and enhances vectorized programming. Join us this Thursday, 23 April 2026.

Modern high-performance computing relies heavily on SIMD (Single Instruction, Multiple Data) instruction sets, which have been a cornerstone of processor architectures since the late 1990s. While SIMD enables significant performance gains through parallel data processing, achieving optimal results in practice is often challenging: manual vectorization can be complex and error-prone, and compiler autovectorization does not always deliver the desired efficiency.

EVE combines the expressiveness of modern C++ idioms with high-performance SIMD capabilities, offering developers a more intuitive and portable way to write vectorized code. With support for major architectures, including Intel, ARM, and PowerPC, it provides a unified approach to performance optimization across platforms.

More information: https://lnkd.in/d24FT3XR

#EuroHPC #EuroCC #EuroCC4SEE #supercomputing
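To show what "manual vectorization" involves, here is a portable sketch of the bookkeeping a library like EVE hides: data processed in fixed-width packs, plus a scalar loop for the leftover elements. It does not use EVE's API. `std::array` stands in for a SIMD register (the role `eve::wide<float>` plays in EVE), the 8-lane width is an assumption (one 256-bit AVX register of floats), and whether the inner lane loops become real SIMD instructions is left to the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed register width: 8 float lanes, as in one 256-bit AVX register.
constexpr std::size_t kLanes = 8;
// Stand-in for a SIMD register type.
using Pack = std::array<float, kLanes>;

// Every lane applies the same operation: the "single instruction, multiple data" idea.
inline Pack muladd(const Pack& a, const Pack& b, const Pack& c) {
    Pack r{};
    for (std::size_t l = 0; l < kLanes; ++l) r[l] = a[l] * b[l] + c[l];
    return r;
}

// out[i] = a[i] * b[i] + c[i]: full packs first, then a scalar tail for leftovers.
void muladd_arrays(const std::vector<float>& a, const std::vector<float>& b,
                   const std::vector<float>& c, std::vector<float>& out) {
    assert(a.size() == b.size() && a.size() == c.size());
    const std::size_t n = a.size();
    out.resize(n);
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        Pack pa{}, pb{}, pc{};
        for (std::size_t l = 0; l < kLanes; ++l) {  // "load" one pack from each input
            pa[l] = a[i + l];
            pb[l] = b[i + l];
            pc[l] = c[i + l];
        }
        const Pack r = muladd(pa, pb, pc);
        for (std::size_t l = 0; l < kLanes; ++l) out[i + l] = r[l];  // "store" the pack
    }
    for (; i < n; ++i) out[i] = a[i] * b[i] + c[i];  // scalar tail (n not a multiple of kLanes)
}
```

Getting the tail, the alignment, and the per-architecture register width right for every kernel is exactly the error-prone part that a SIMD abstraction library takes over.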
Janardhan Srinivasan
4d
Qwen3-0.6B is an awesome dense model. Quantized to Q4_K_M and running in a container, this beast generates 360 tokens/sec. Considering that it is only 400 MB and consumes just 1 GB of RAM on an ordinary CPU, it is extremely responsive for many simple prompts. A 400 MB file holding so much information and producing answers in several languages within milliseconds is an awesome success of math (dot product, cosine similarity, Euclidean distance).
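The three operations named in the post are short to write down. A minimal sketch over plain `float` vectors follows; the function names are ours. In transformer models, attention scores are built from dot products, while cosine similarity and Euclidean distance are more typical for comparing embedding vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Dot product: sum of element-wise products.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0f);
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]; returns 0 if either vector is zero.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    const float norms = std::sqrt(dot(a, a)) * std::sqrt(dot(b, b));
    return norms == 0.0f ? 0.0f : dot(a, b) / norms;
}

// Euclidean distance: square root of the summed squared differences.
float euclidean_distance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

For example, `{3, 4}` and `{0, 0}` are at distance 5, and orthogonal vectors such as `{1, 0}` and `{0, 1}` have cosine similarity 0.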
Institute of Computing for Climate Science (ICCS)

723 followers
1w
📢 Dr Tom Meltzer, one of our Principal Research Software Engineers, is a man on a mission to spread the word about debuggers. Last week Tom was invited to give a talk at AMD about mdb, his debugger:

"I explained how mdb can be used to debug HPC applications that rely on MPI. Then I provided a live demonstration of the core features of my debugger, with a special focus on debugging MPI GPU code on AMD's MI300 system. I showed them how to debug kernels and manipulate memory on the device without manually copying it back to the host CPU.

"The AMD talk was attended by more than 30 of their engineers. AMD are also planning to include mdb in their core software stack on the AMD MI300 nodes. This would make mdb available to all of their users."

👏 Tom received some fabulous feedback from the organisers: “Thank you for an excellent talk and demo. You are solving a problem that is relevant for people like us who work on different codes, that are totally new to us, every few months. We will start using the tool and submit any feedback via GitHub issues into your repo.”

Next week AMD have one of their largest training events of the year at HLRS (High-Performance Computing Center Stuttgart), with more than 200 people signed up, and Tom’s creation, mdb, will be included in the session on debugging.

👉 Find out about other events where you can meet ICCS: https://lnkd.in/eBkGGw2c
Yamil Garcia
3d
"Mastering Compiler Attributes in Embedded C" This presentation explores how compiler attributes enable the precise control required for efficient firmware development. It walks through key concepts from the sources, such as: • Memory Management: Techniques for using sections to place data in specific Flash or EEPROM regions and aligned to ensure data integrity. • Performance and Optimization: How to use always_inline for timing-critical GPIO paths and optimize levels to balance speed and code size. • Firmware Architecture: Utilizing weak linking for HAL customization and packed structures for precise protocol parsing. • Advanced Control: Practical patterns for combining attributes to manage complex tasks like firmware patching #learningbytutorials #learning #embeddedprogramming #embeddedsystems #cprogramming
Kyle Yu
4w Edited
Looking for competitive, performance-based GPU programming? Tensara is a GPU kernel practice platform where you can squeeze out GFLOPs and see where you rank among top kernel engineers. There are currently 80+ problems to solve in CUDA, Triton, Mojo, CuTe DSL, and cuTile across 8 different GPU architectures hosted by Modal.

I've learned a lot from experimenting with different optimization techniques and reverse engineering top submissions. Passing correctness tests is the baseline. Topping runtime leaderboards is the goal!

Optimize, benchmark, repeat 👉 https://tensara.org/
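Achieved GFLOP/s is floating-point operations divided by runtime. For a matrix multiply of an M × K matrix by a K × N matrix, the usual convention counts 2MNK operations (one multiply and one add per inner-product step). The post doesn't say how Tensara scores each problem, so treat this as a hedged illustration with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Achieved GFLOP/s for an (m x k) * (k x n) matrix multiply,
// counting 2*m*n*k floating-point operations.
double matmul_gflops(std::uint64_t m, std::uint64_t n, std::uint64_t k, double seconds) {
    assert(seconds > 0.0);
    const double flops = 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
    return flops / seconds / 1e9;
}
```

For example, a 1024 × 1024 × 1024 multiply that finishes in 1 ms works out to roughly 2,147 GFLOP/s.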