Custom Code Optimization Practices

Explore top LinkedIn content from expert professionals.

Summary

Custom code optimization practices help improve the speed, memory use, and reliability of software by tailoring code and system settings to better fit specific workloads and needs. This approach involves analyzing how your software runs, then making targeted changes to boost performance and reduce unnecessary resource consumption.

  • Profile your code: Start by measuring where your program spends the most time or uses the most memory so you can focus improvements where they matter most (see the profiling sketch after this list).
  • Streamline data loading: Design your code to fetch or process only the data needed at the moment, rather than loading everything at once, to prevent slowdowns and crashes.
  • Tune system settings: Adjust memory limits, scheduling options, and the way expensive operations are handled, so your software runs smoothly under heavy or varying loads.
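A minimal sketch of that first step, using Python's standard-library profiler (the functions here are illustrative stand-ins for your own workload):

```python
import cProfile
import pstats

def load_rows():
    # Stand-in for an expensive data-loading step.
    return [i * i for i in range(1_000_000)]

def summarize(rows):
    # Stand-in for a cheap aggregation step.
    return sum(rows) / len(rows)

def main():
    return summarize(load_rows())

if __name__ == "__main__":
    # Profile the whole run, then print the five slowest calls by
    # cumulative time so improvements target the real hot spots.
    profiler = cProfile.Profile()
    profiler.runcall(main)
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```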
Summarized by AI based on LinkedIn member posts
  • Sriram Govindan

    ML Systems | GPU Kernel Engineer @Modular | Cofounder @Bench AI

    If you're in the AI performance space, you'll see countless blogs about custom GPU kernels that are 10, 30, or even 100% faster than NVIDIA's cuBLAS. Interestingly, the perf gains aren't the crazy part; it's how simple they are to achieve.

    Performance comes from optimizations, and optimizations fall into two buckets: features and tuning. Feature implementation is what the engineers at Modular (yours truly), NVIDIA, and Hazy Research do. It's coming up with a fusion, memory loading pattern, scheduling technique, etc., and making it accessible. Tuning is what comes after. It's picking the right combination of optimizations and dispatching them based on your workload.

    This is where we have an edge over cuBLAS. You see, cuBLAS is a generic library; algorithms like matmul need to be performant across a huge range of shapes. As Performance Engineers, we're not trying to support the generic case. We have a specific model in mind (Kimi, Deepseek, …), which uses fixed shapes (head_dim, weight matrix dimensions, …), and we know its dominant workloads (prefill, decode). So we can use this info to make a kernel tailored to our needs. And here's the cool part: in tons of cases, the tailored kernel can be made with tuning alone. No custom GPU code required. All you need is a strong kernel library (Mojo Kernels, Cutlass, ThunderKittens) and the knowledge of when to apply an optimization.

    To give you a head start, here are a couple of common scenarios and the appropriate optimization:
    - Matmul, decode: Try SwapAB. The batch size sits in the M dimension, but tensor cores have a fixed M size; this wastes compute. With SwapAB, the A and B matrices switch positions, moving the batch size to the N dimension, where you have finer granularity. The tradeoff is that C needs to be transposed.
    - Prefill: Try a persistent kernel with a CLC scheduler. A large matmul launches lots of blocks across multiple waves; this puts heavy pressure on the block scheduler. Persistent kernels remove that pressure entirely: each SM persists for the full kernel duration, and the CLC scheduler handles the assignment.

    My next article will be an extensive list of optimizations and when to use them, so keep your eyes peeled for that. In the meantime, check out these great resources 👇
    Vishal Padia's excellent blog on Flash Attention: https://lnkd.in/g9rzN939
    Mojo Kernels Matmul (FP4/8/16) config: https://lnkd.in/gPHDV76J
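    To make SwapAB concrete, here's a small NumPy sketch of the identity it relies on. NumPy won't show the tensor-core win (that comes from tile granularity on the GPU); this only illustrates how the batch dimension moves from M to N and why C comes back transposed:

    ```python
    import numpy as np

    # Decode-style GEMM: a tiny batch (M) against a large weight matrix.
    M, K, N = 4, 4096, 4096                        # M = batch size during decode
    A = np.random.randn(M, K).astype(np.float32)   # activations
    B = np.random.randn(K, N).astype(np.float32)   # weights

    # Standard layout: the batch sits in the M dimension.
    C = A @ B

    # SwapAB: compute C^T = B^T @ A^T, moving the batch to the N dimension,
    # where the tiles offer finer granularity. The tradeoff the post mentions:
    # the result comes back transposed and must be flipped.
    C_swapped = (B.T @ A.T).T

    assert np.allclose(C, C_swapped, rtol=1e-3, atol=1e-2)
    ```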

  • Yamil Garcia

    Tech enthusiast, embedded systems engineer, and passionate educator! I specialize in Embedded C, Python, and C++, focusing on microcontrollers, firmware development, and hardware-software integration.

    This article explores practical methods to speed up execution, including inline functions, loop unrolling, bit manipulation, DMA utilization, and data structure optimization. Real-world code examples accompany each technique to illustrate its impact. 
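    The techniques themselves are Embedded C material; as one language-neutral illustration of the bit-manipulation idea, here's a sketch (in Python, with invented names) of replacing a modulo with a mask for a power-of-two ring buffer, a common trick on cores without a hardware divider:

    ```python
    # Bit-manipulation staple: for a power-of-two capacity,
    # `index % capacity` can be replaced by `index & (capacity - 1)`,
    # avoiding a division instruction on the hot path.
    CAPACITY = 64                              # must be a power of two
    assert CAPACITY & (CAPACITY - 1) == 0

    buffer = [None] * CAPACITY
    head = 0

    def push(value):
        global head
        buffer[head] = value
        head = (head + 1) & (CAPACITY - 1)     # wrap without modulo

    for sample in range(100):
        push(sample)
    print(buffer[:4])                          # [64, 65, 66, 67]: oldest slots overwritten
    ```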

  • Ramesh babu Thondepu

    Consultant skilled in SAP ABAP, HANA, and RAP

    Building efficient SAP Fiori apps and services using the ABAP RESTful Application Programming Model (RAP) in S/4HANA or SAP BTP requires adherence to recent best practices and clean core principles. Here are some concise tips to enhance your development process:

    **CDS Views Best Practices**
    - Push filters and aggregations down by defining them directly in CDS interface/projection views to leverage HANA optimization. Avoid complex logic in consumption views.
    - Use annotations strategically, such as @UI.hidden, @Search.defaultSearchElement, and @Analytics, to improve Fiori rendering and search performance.
    - Prefer joins over unions when possible, and minimize unnecessary fields to reduce data transfer.
    - Leverage CDS table entities as active persistence for simpler managed scenarios.

    **Behavior Definitions and Implementation**
    - Group all RAP artifacts (tables, CDS views, behavior definitions, service bindings) in the same package for clean core compliance and easier maintenance.
    - Use managed RAP for boilerplate handling (CRUD, draft, locking) and resort to unmanaged only for complex custom logic.
    - Implement determinations and validations efficiently by triggering them on modify/save, and use ETags for optimistic locking in draft-enabled business objects.
    - Set default values via CDS default functions, field controls in behavior, or create determinations for consistent entity creation.

    **Performance Optimization**
    - Minimize roundtrips by using $expand wisely in OData queries and side-effects annotations to refresh UI elements automatically.
    - Optimize custom entities with aggregations and paging in ABAP code.
    - Profile with SAT/SE30 and ADT tools, focusing on reducing database selects in behavior implementations.
    - Enable draft for large objects to enhance user experience and reduce locking contention.

    **Building Efficient Fiori Apps**
    - Use Fiori Elements (List Report/Object Page) with projection views for rapid development and consistent UI.
    - Bind services via OData V4 for modern features like multi-dimensional reporting and better offline support.
    - Exploit RAP generators in ADT to quickly bootstrap multi

    #SAP #RAP #CDS #FIORI
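    As a sketch of the "$expand wisely" tip above: one OData V4 request that fetches orders together with their items, trimmed with $select and paged with $top. The host, service path, and entity names below are invented for illustration:

    ```python
    import requests

    # One request fetches orders together with their items, instead of one
    # request per order; $select trims both entity sets to the needed fields.
    BASE = "https://my-s4-host/sap/opu/odata4/sap/zsd_salesorder/srvd/sap/zsd_salesorder/0001"
    params = {
        "$select": "SalesOrderID,OverallStatus",
        "$expand": "Items($select=ItemID,NetAmount)",
        "$top": "50",                          # page the list instead of loading everything
    }
    response = requests.get(f"{BASE}/SalesOrder", params=params, timeout=30)
    response.raise_for_status()
    for order in response.json()["value"]:
        print(order["SalesOrderID"], len(order["Items"]))
    ```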

  • Harsha Ch

    Salesforce Developer & Admin | PD II | Copado | Service Cloud | Financial Services Cloud | OmniStudio | LWC | Apex | Flows | MuleSoft | REST/SOAP | CI/CD | Driving Efficiency & Automation in Scalable CRM Solutions

    A few months ago, a user reached out to me with a simple complaint: "Our dashboard isn't loading." What looked like a small issue turned out to be a major performance bottleneck. The dashboard was powered by a data set that fetched every Account along with all its Contacts: thousands of records loaded at once. It worked perfectly in the sandbox with limited data, but in production it was pulling hundreds of thousands of records each time the dashboard refreshed. The system wasn't slow; our query design was. We optimized it by:
    1️⃣ Using Filters: Retrieved only relevant records instead of everything.
    2️⃣ Applying Lazy Loading: Fetched related data only when users actually needed it, not by default.
    3️⃣ Creating Indexes: Added selective indexing on key fields to speed up retrieval.
    After optimization, the same dashboard that once took 40 seconds loaded in less than 3. That day taught me a valuable lesson: "Performance issues rarely come from the platform; they come from how we design on it." Since then, whenever I build or review a Flow, report, or Apex process, I remind myself: don't just make it work. Make it scale.
    #Salesforce #Performance #Optimization #TrailblazerCommunity #Apex #FlowBuilder #SalesforceDeveloper #BestPractices
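    The actual fix lived in SOQL and Apex, but the lazy-loading idea generalizes to any stack. A minimal Python sketch, with a hypothetical fetch_page standing in for a filtered, paged query:

    ```python
    DATABASE = list(range(100_000))            # stand-in for thousands of records

    def fetch_page(offset, limit):
        # Hypothetical data-access call; imagine a filtered, indexed query
        # behind it rather than "select everything".
        return DATABASE[offset:offset + limit]

    def records(page_size=200):
        """Yield records one page at a time, fetching only on demand."""
        offset = 0
        while True:
            page = fetch_page(offset, page_size)
            if not page:
                return
            yield from page
            offset += page_size

    # The dashboard consumes only what it renders; nothing else is loaded.
    first_screen = [row for _, row in zip(range(50), records())]
    print(len(first_screen))                   # 50 rows fetched lazily, not 100,000 up front
    ```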

  • Janhavi Patil

    Data Scientist | Data Engineer | Prior experience at Dentsu | Proficient in SQL, React, Java, Python, and Tableau

    With a background in data engineering and business analysis, I've consistently seen the immense impact of optimized SQL code on improving the performance and efficiency of database operations. It indirectly contributes to cost savings by reducing resource consumption. Here are some techniques that have proven invaluable in my experience:
    1. Index Large Tables: Indexing tables with large datasets (>1,000,000 rows) greatly speeds up searches and enhances query performance. However, be cautious of over-indexing, as excessive indexes can degrade write operations.
    2. Select Specific Fields: Choosing specific fields instead of using SELECT * reduces the amount of data transferred and processed, which improves speed and efficiency.
    3. Replace Subqueries with Joins: Using joins instead of subqueries in the WHERE clause can improve performance.
    4. Use UNION ALL Instead of UNION: UNION ALL is preferable over UNION because it does not involve the overhead of sorting and removing duplicates.
    5. Optimize with WHERE Instead of HAVING: Filtering data with WHERE clauses before aggregation operations reduces the workload and speeds up query processing.
    6. Utilize INNER JOIN Instead of WHERE for Joins: INNER JOINs help the query optimizer make better execution decisions than complex WHERE conditions.
    7. Minimize Use of OR in Joins: Avoiding the OR operator in joins enhances performance by simplifying the conditions and potentially reducing the dataset earlier in the execution process.
    8. Use Views: Define views for frequently reused query logic; where your database supports materialized views, they can store precomputed results that are faster to read than recalculating them each time.
    9. Minimize the Number of Subqueries: Reducing the number of subqueries in your SQL statements can significantly enhance performance by decreasing the complexity of the query execution plan and reducing overhead.
    10. Implement Partitioning: Partitioning large tables can improve query performance and manageability by logically dividing them into discrete segments. This allows SQL queries to process only the relevant portions of data.
    #SQL #DataOptimization #DatabaseManagement #PerformanceTuning #DataEngineering
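    A few of these tips made concrete in one runnable sqlite3 sketch (the table and data are throwaway examples):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, "EU" if i % 2 else "US", i * 1.5) for i in range(1000)],
    )

    # Tip 2: select only the fields you need, never SELECT *.
    rows = conn.execute("SELECT id, amount FROM orders WHERE region = 'EU'").fetchall()

    # Tip 5: filter with WHERE before aggregating, not with HAVING after.
    # Slower pattern: aggregate every group, then discard most of them.
    slow = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region HAVING region = 'EU'"
    ).fetchall()
    # Faster pattern: shrink the input first, then aggregate.
    fast = conn.execute(
        "SELECT region, SUM(amount) FROM orders WHERE region = 'EU' GROUP BY region"
    ).fetchall()
    assert slow == fast

    # Tip 1 (mind the over-indexing caveat): index the column you filter on.
    conn.execute("CREATE INDEX idx_orders_region ON orders (region)")
    ```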

  • Theophilus Gordon

    Software Engineer | Java, Spring Boot, Kafka, Spring AI, Angular, Python | AI Integration & LLM Systems

    Mastering Code Quality: 12 Key Practices for Efficiency and Reliability
    1. Use prettification tools like Prettier to standardize code formatting.
    2. Employ linters like SonarLint to catch code smells and potential bugs.
    3. Configure pre-commit hooks with Husky to automate checks before commits.
    4. Follow SOLID principles for scalable, maintainable code.
    5. Avoid memory leaks by managing resources effectively.
    6. Apply design patterns for reusable, structured code.
    7. Write unit tests to verify code correctness early.
    8. Use dependency injection to reduce tight coupling and improve flexibility.
    9. Follow DRY principles to avoid code duplication.
    10. Perform code reviews for quality control and knowledge sharing.
    11. Optimize code for performance with efficient algorithms and data structures.
    12. Implement continuous integration for regular, automated testing and integration.
    What other practices do you use to ensure clean, efficient, and robust code? Share yours below!
    #SoftwareDevelopment #CodingBestPractices #CleanCode #SoftwareEngineering #CodeQuality #ProgrammingTips #Tech
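    Most of these are process practices, but #8 fits in a few lines. A Python sketch of constructor-based dependency injection, with invented class names:

    ```python
    class SmtpMailer:
        def send(self, to, body):
            print(f"SMTP -> {to}: {body}")

    class FakeMailer:
        """Test double: records messages instead of sending them."""
        def __init__(self):
            self.sent = []
        def send(self, to, body):
            self.sent.append((to, body))

    class SignupService:
        # The dependency is injected rather than constructed inside the
        # class, so production and tests can swap implementations freely.
        def __init__(self, mailer):
            self.mailer = mailer
        def register(self, email):
            self.mailer.send(email, "Welcome!")

    # Production wiring vs. unit-test wiring (practices 7 and 8 together):
    SignupService(SmtpMailer()).register("user@example.com")
    fake = FakeMailer()
    SignupService(fake).register("test@example.com")
    assert fake.sent == [("test@example.com", "Welcome!")]
    ```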

  • Carl-Hugo Marcotte

    Author of Architecting ASP.NET Core Applications: An Atypical Design Patterns Guide for .NET 8, C# 12, and Beyond | Software Craftsman | Principal Architect | .NET/C# | AI

    🔥Optimizing performance using C#🔥
    Welcome to our series' seventh and final post, where we introduce performance optimization techniques for creating and managing variables. When your code demands high performance, understanding how to optimize memory and reduce overhead becomes crucial. This post explores techniques like stack allocation, pointers, and object pooling, which are invaluable when working on performance-critical use cases. These techniques ensure your code runs efficiently without unnecessary memory allocations. Each technique is advanced enough to be the subject of its own post, so consider this an introduction.
    📝 Summary
    Here are key techniques you can use to optimize performance:
    - Stack allocation: Use Span<T> and stackalloc to allocate memory on the stack, improving performance compared to heap-allocated memory.
    - Pointers: Use unsafe pointers to access memory directly for highly optimized code. Be cautious, since unsafe pointers bypass the runtime's memory safety features.
    - Fixed buffers: Inside unsafe code, fixed-size buffers offer control over memory layout by inlining the array with the rest of the struct instead of storing it separately on the heap.
    - Object pooling: Leverage ArrayPool<T> to reuse arrays, almost negating the cost of array creation.
    - ref struct: Creates the struct on the stack instead of the heap, avoiding heap allocations and ensuring better performance.
    - Span<T>: Provides performant access to existing memory blocks (e.g., arrays, stack-allocated, or unmanaged memory) while avoiding unnecessary allocations and garbage collection overhead.
    - Memory<T>: Similar to Span<T> but without its limitations: it can be stored on the heap, making it ideal when memory needs to persist beyond the current frame.
    - in parameters: Pass a struct by reference without copying it, ensuring read-only access.
    💬 Comments
    What's your experience with optimizing performance? Have you used any of those techniques before? Share your thoughts in the comments!
    📣 Any deep-dive post you would like to see? Let me know in the comments if there are subjects you'd like to explore in more depth or hear about (it doesn't have to be related to this post).
    🔑 Important
    To run unsafe code, you must explicitly allow it, for example by adding `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` to your .csproj file.
    🔔 Note
    You may never need any of these techniques, yet it's essential to know they exist for the day you do!
    #CSharp #dotnet #ProgrammingTips #SoftwareDevelopment #CodeOptimization #HighPerformance #PerformanceOptimization #LearnCSharp #AdvancedOptimization
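    Those are C# features, but object pooling in particular translates anywhere. Here's a tiny Python analogue of the idea behind ArrayPool<T> (the BufferPool class is illustrative, not a real library API; the real C# methods are Rent and Return):

    ```python
    class BufferPool:
        """Tiny analogue of ArrayPool<T>: rent buffers instead of allocating."""

        def __init__(self, size, count):
            self._size = size
            self._free = [bytearray(size) for _ in range(count)]

        def rent(self):
            # Reuse a pooled buffer when available; allocate only as a fallback.
            return self._free.pop() if self._free else bytearray(self._size)

        def give_back(self, buf):
            buf[:] = bytes(self._size)     # zero it so stale data never leaks
            self._free.append(buf)

    pool = BufferPool(size=1024, count=4)
    buf = pool.rent()
    buf[:5] = b"hello"                     # use the rented buffer
    pool.give_back(buf)                    # return it instead of discarding it
    ```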

  • Raydelto Hernandez

    Computer Scientist | Engineer | ex-Google

    To all C++ developers interested in high-performance software, I highly recommend reading the paper recently published by Meta Research titled "Automated Hot Text and Huge Pages: An Easy-to-adopt Solution Towards High Performing Services." Key takeaways:
    A. Many of the largest-scale backend infrastructures in the world are written in C/C++ (e.g., Facebook, Google, Microsoft).
    B. In large-scale infrastructures, even small performance improvements are significant. For instance, a service running across 100,000 servers can achieve substantial savings: just a 1% performance optimization could translate to using a thousand fewer servers.
    C. The optimization pipeline proposed in the paper consists of three main steps:
    1. Profiling the binary to identify how frequently each function is called, then sorting the functions by usage frequency, with the most frequently accessed functions first.
    2. Optimizing the function layout during the linking process.
    3. Placing the most frequently used section of the optimized binary (referred to as "hot text" in the paper) onto huge pages of virtual memory.
    Isolating the most frequently executed code sections and placing them on huge pages each provide performance benefits on their own, but combining the two techniques yields the best results. Meta developed a pipeline to automate this entire process, making the solution easy to adopt and virtually maintenance-free.
    You can access the full paper here: https://lnkd.in/enZCFtwj
