CppCon, The C++ Conference 2025 September 14-19th, Aurora, Colorado Preview: Hui Xie: "Implement Standard Library: Design Decisions, Optimisations and Testing in Implementing Libc++" https://sched.co/27bNw This presentation covers practical examples of design, optimisation, and testing in libc++, an implementation of the C++ standard library. The space-optimisation section presents techniques such as compact types, reusing tail-padding bytes, and reusing spare bits in existing bytes, across standard types including std::stop_token, std::expected, std::optional, std::variant, the std::ranges library, and std::move_only_function. The time-optimisation section shows how the waiting strategy of std::atomic<T>::wait was optimised, how algorithms were optimised for segmented iterators, and how the design deliberately leaves the door open for future optimisations. Compilation time matters too, so the talk also includes examples of avoiding unnecessary template instantiations. Finally, it covers libc++'s unit tests: high coverage of the standard's specification, the technique of sharing tests between runtime and constexpr evaluation, negative testing, and more.
-
Memory Order & Atomics – The Hidden Complexity! Why? Last week we ran a poll, and the winning topic was memory ordering and atomics in C++. I was really happy with this choice! This topic looks simple on the surface but has deep implications for performance, correctness, and multi-threaded design, something we care a lot about here. So, what really is the difference between memory orders in atomics? C++ provides atomic types to safely share data between threads. Atomics are almost trivial to use for basic operations, but subtle differences in memory order can change everything:
memory_order_relaxed → The operation is atomic, but carries no ordering guarantees relative to other memory operations. Best for counters or statistics where only the final value matters.
memory_order_acquire / memory_order_release → Provide pairwise synchronization: when an acquire load reads the value written by a release store, every write made before the release store becomes visible to reads after the acquire load. Ideal for publishing data from one thread to another.
memory_order_seq_cst → The default and strictest ordering: all seq_cst operations appear in a single total order that every thread agrees on. Safest, but can cost performance on weakly ordered hardware.
This might seem like a small thing, but choosing the right memory order communicates your intent and prevents subtle bugs that are nearly impossible to debug. Using atomics correctly lets you avoid locks while keeping your code correct and performant. The poll results were clear:
Relaxed → perfect for lightweight counters or stats where ordering isn't critical.
Acquire/Release → ideal for producer-consumer patterns or synchronizing shared state.
Seq_cst → best when absolute ordering matters, at some performance cost.
This is one of those cases where the right choice improves the maintainability, correctness, and efficiency of your concurrent code. What do you think about this feature?
C++ MasterClass #ModernCpp #CppMasterClass #CppTips #CppCommunity #CleanCode #CppDesign #ProgrammingLanguages #SystemsProgramming #TechExplained #ObjectOriented #CppBestPractices #WriteBetterCode
-
𝗘𝗡𝗚𝗜𝗡𝗘𝗘𝗥𝗜𝗡𝗚 𝗨𝗣𝗗𝗔𝗧𝗘: 𝟮𝟬𝟮𝟱𝗖𝗪𝟯𝟵 The past week's activity centred on two topics. The first is the "codebake" documentation and a manual review of the process on one library. Thanks to the process, the transformers library shrank from 450K LoC to around 1,400. There is still room to merge class definitions that are used only once and to remove unused method definitions; I expect the final size to be around 1,000 lines. That illustrates how much waste is involved in a simple inference. The second is the ObjectStore implementation, where the final design is complete. We are now migrating the code to Rust, as this is also a critical part, covering all the encryption and key generation at the object level. The security reviews on the new components will take some time, but the remaining work is now more development and testing than design. This week, the aim is to continue by integrating a system-defined object that we can use to test the functionality, thereby incorporating the new code into the existing AIOneGuard codebase.
-
Memory-Pool-in-C The code implements a simple memory pool in C, which is a technique for managing memory allocations and deallocations efficiently. This is particularly useful in applications with frequent allocations and deallocations of memory blocks of the same size. Let’s break down the code step by step.
-
Memory pooling is a common technique for managing memory when running Large Language Models (LLMs). Most SOTA LLMs have parameter counts that exceed the usable VRAM of most accelerators. For example, DeepSeek V3 has 671 billion parameters; on disk, the full model requires approximately 715GB. It is impractical (and at times impossible) to load all of it into memory: even the latest and greatest Nvidia H200 has 141GB of HBM3e VRAM. The solution is to allocate fixed-size blocks and store parts of the model weights in them (remember, not all of the weights), then overwrite the blocks with new weights once compute on the previously loaded weights is done. The hard part is managing a memory pool that allocates these blocks and keeps updating them with new weights. Luckily, most LLMs have their weights split into multiple xbin files on HuggingFace using the Xet protocol, so memory blocks can load these shard files on demand without running out of memory.
-
🎯 Day 17 – Placement Prep Focus Areas 🧠 DSA (Leetcode – Arrays & Strings Refresh): Leetcode 56 – Merge Intervals Leetcode 75 – Sort Colors Leetcode 238 – Product of Array Except Self Leetcode 53 – Maximum Subarray (revisit Kadane’s Algorithm for pattern recall) 💾 Core Subject – Operating Systems (Revision): Process Scheduling (FCFS, SJF, Round Robin, Priority) Deadlocks (Conditions, Prevention, Banker's Algorithm) Paging vs Segmentation CPU vs I/O Bound Processes 🧩 Aptitude: Time, Speed, and Distance (2 sets of 10 problems). 🗣️ HR / Soft Skills: Draft a 1-minute self-introduction (update it based on your latest learnings). 💡 Goal: Sharpen consistency and conceptual strength — one small step closer to becoming placement-ready!
-
The most difficult thing for me in settling into modern C++ from a C++98 background has been getting familiar with the STL algorithms. Here's a great talk from Jonathan Boccara in which he organizes them into a fantasy-style map to make them easier to remember. I have been debating whether using these algorithms actually improves your code's readability and maintainability (and I hope to make a post about that soon). Regardless, as long as I continue using C++, it's in my best interest to understand them. https://lnkd.in/ghXgzXsG
CppCon 2018: Jonathan Boccara “105 STL Algorithms in Less Than an Hour”
-
SHA-256 Implementation in Bare-Metal C with Raylib GUI I built a complete SHA-256 cryptographic hash function from scratch in pure C, without relying on any external libraries. The implementation follows the FIPS 180-4 standard and is written in bare-metal style: no standard-library dependencies for the core hashing logic, making it suitable for embedded systems and resource-constrained environments. The code handles all the bit manipulation, rotations, and compression functions manually, using only basic integer types. To demonstrate and test the implementation, I created a simple GUI using Raylib that lets users input text and instantly see the computed SHA-256 hash; the result is verified against OpenSSL's output to ensure correctness. Test the code: https://lnkd.in/gu2Bj45J