CppCon, The C++ Conference 2025 September 14-19th, Aurora, Colorado Preview: Hui Xie: "Implement Standard Library: Design Decisions, Optimisations and Testing in Implementing Libc++" https://sched.co/27bNw This presentation covers practical examples of design, optimisation, and testing in libc++, an implementation of the C++ standard library. The space-optimisation section presents techniques such as compact types, reusing tail-padding bytes, and reusing spare bits in existing bytes, across standard types including std::stop_token, std::expected, std::optional, std::variant, the std::ranges library, and std::move_only_function. The time-optimisation section shows how the waiting strategy of std::atomic<T>::wait was optimised, how algorithms were optimised for segmented iterators, and how the design deliberately leaves the door open for future optimisations. Compilation time matters too, so the talk also includes examples of avoiding unnecessary template instantiations. Finally, it covers libc++'s unit tests: high coverage of the standard's specification, the technique of sharing tests between runtime and constexpr evaluation, negative testing, and more.
-
Memory Order & Atomics – The Hidden Complexity! Why? Last week we ran a poll, and the winning topic was memory ordering and atomics in C++. I was really happy with this choice! This topic looks simple on the surface but has deep implications for performance, correctness, and multi-threaded design, something we care a lot about here. So, what really is the difference between memory orders in atomics? C++ provides atomic types to safely share data between threads. Atomics are almost trivial to use for basic operations, but subtle differences in memory order can change everything:
memory_order_relaxed → The operation is atomic, but carries no ordering guarantees relative to other memory operations. Best for counters or statistics where only the final value matters.
memory_order_acquire / memory_order_release → Provide pairwise synchronization: when an acquire load reads the value written by a release store, every write made before the release store becomes visible to reads after the acquire load. Ideal for publishing data from one thread to another.
memory_order_seq_cst → The default and strictest ordering: all seq_cst operations appear in a single total order that every thread agrees on. Safest, but can cost performance on weakly ordered hardware.
This might seem like a small thing, but choosing the right memory order communicates your intent and prevents subtle bugs that are nearly impossible to debug. Using atomics correctly lets you avoid locks while keeping your code correct and performant. The poll results were clear:
Relaxed → perfect for lightweight counters or stats where ordering isn't critical.
Acquire/Release → ideal for producer-consumer patterns or synchronizing shared state.
Seq_cst → best when absolute ordering matters, at some performance cost.
This is one of those cases where the right choice improves the maintainability, correctness, and efficiency of your concurrent code. What do you think about this feature?
C++ MasterClass #ModernCpp #CppMasterClass #CppTips #CppCommunity #CleanCode #CppDesign #ProgrammingLanguages #SystemsProgramming #TechExplained #ObjectOriented #CppBestPractices #WriteBetterCode
-
𝗘𝗡𝗚𝗜𝗡𝗘𝗘𝗥𝗜𝗡𝗚 𝗨𝗣𝗗𝗔𝗧𝗘: 𝟮𝟬𝟮𝟱𝗖𝗪𝟯𝟵 The past week's activity centred on two topics. The first is the "codebake" documentation and a manual review of the process on one library. Thanks to the process, the transformers library shrank from 450K LoC to around 1,400. There is still room to merge class definitions that are used only once and to remove unused method definitions; I expect the final size to be around 1,000 lines. That illustrates how much waste is involved in a simple inference. The second is the ObjectStore implementation, where the final design is complete. We are now migrating the code to Rust, as this is also a critical part, covering all the encryption and key generation at the object level. The security reviews on the new components will take some time, but the remaining work is now more development and testing than design. This week, the aim is to continue by integrating a system-defined object that we can use to test the functionality, thereby incorporating the new code into the existing AIOneGuard codebase.
-
Memory-Pool-in-C The code implements a simple memory pool in C, which is a technique for managing memory allocations and deallocations efficiently. This is particularly useful in applications with frequent allocations and deallocations of memory blocks of the same size. Let’s break down the code step by step.
-
Memory pooling is a common technique for managing memory when running Large Language Models (LLMs). Most SOTA LLMs have parameter counts that exceed the usable VRAM of most accelerators. For example, DeepSeek V3 has 671 billion parameters; on disk, the full model requires approximately 715GB. It is impractical (and at times impossible) to load all of it into memory: even the latest and greatest Nvidia H200 has 141GB of HBM3e VRAM. The solution is to allocate fixed-size blocks and store parts of the model weights in them (remember, not all of the weights), then overwrite the blocks with new weights once compute on the previously loaded weights is done. The hard part is managing a memory pool that allocates these blocks and keeps updating them with new weights. Luckily, most LLMs have their weights split into multiple xbin files on HuggingFace using the Xet protocol, so memory blocks can load these shard files on demand without running out of memory.
-
🎯 Day 17 – Placement Prep Focus Areas 🧠 DSA (Leetcode – Arrays & Strings Refresh): Leetcode 56 – Merge Intervals Leetcode 75 – Sort Colors Leetcode 238 – Product of Array Except Self Leetcode 53 – Maximum Subarray (revisit Kadane’s Algorithm for pattern recall) 💾 Core Subject – Operating Systems (Revision): Process Scheduling (FCFS, SJF, Round Robin, Priority) Deadlocks (Conditions, Prevention, Banker's Algorithm) Paging vs Segmentation CPU vs I/O Bound Processes 🧩 Aptitude: Time, Speed, and Distance (2 sets of 10 problems). 🗣️ HR / Soft Skills: Draft a 1-minute self-introduction (update it based on your latest learnings). 💡 Goal: Sharpen consistency and conceptual strength — one small step closer to becoming placement-ready!
-
The most difficult thing for me in settling into modern C++ from a C++98 background has been getting familiar with the STL algorithms. Here's a great talk from Jonathan Boccara in which he organizes them into a fantasy-style map to make them easier to remember. I have been debating whether using these algorithms actually improves your code's readability and maintainability (and I hope to make a post about that soon). Regardless, as long as I continue using C++, it's in my best interest to understand them. https://lnkd.in/ghXgzXsG
CppCon 2018: Jonathan Boccara “105 STL Algorithms in Less Than an Hour”
-
SHA-256 Implementation in Bare-Metal C with Raylib GUI I built a complete SHA-256 cryptographic hash function from scratch in pure C, without relying on any external libraries. The implementation follows the FIPS 180-4 standard and is written in bare-metal style: no standard-library dependencies for the core hashing logic, making it suitable for embedded systems and resource-constrained environments. The code handles all the bit manipulation, rotations, and compression functions manually, using only basic integer types. To demonstrate and test the implementation, I created a simple GUI using Raylib that lets users input text and instantly see the computed SHA-256 hash; the result is verified against OpenSSL's output to ensure correctness. Test the code: https://lnkd.in/gu2Bj45J