Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress: Reflection, Tool use, Planning, and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output.

Here, I'd like to discuss Reflection. It's relatively quick to implement, and I've seen it lead to surprising performance gains.

You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.

Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. Then, we can prompt it to reflect on its own output, perhaps as follows:

"Here's code intended for task X: [previously generated code]. Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it."

Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements.

This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions.
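The generate, critique, rewrite loop described above can be sketched in a few lines of Python. `call_llm` below is a hypothetical placeholder for whatever chat-completion API you use; it is stubbed here with canned responses so the control flow is runnable as-is.

```python
# Sketch of a generate -> critique -> rewrite Reflection loop.
# `call_llm` stands in for a real LLM API call (an assumption of this
# sketch); here it returns canned text so the loop can be executed.

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call your LLM provider here.
    if "constructive criticism" in prompt:
        return "Consider handling the empty-list case."
    return "def mean(xs): return sum(xs) / max(len(xs), 1)"

def reflect(task: str, rounds: int = 2) -> str:
    # First pass: generate a draft directly.
    draft = call_llm(f"Write code for this task: {task}")
    for _ in range(rounds):
        # Ask the model to criticize its own output...
        critique = call_llm(
            f"Here is code intended for task: {task}\n{draft}\n"
            "Check it carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # ...then rewrite using (i) the previous code and (ii) the feedback.
        draft = call_llm(
            f"Task: {task}\nPrevious code:\n{draft}\n"
            f"Feedback:\n{critique}\nRewrite the code using the feedback."
        )
    return draft

print(reflect("compute the mean of a list"))
```

The loop is model-agnostic: swapping the stub for a real API call is the only change needed.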
And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases, or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.

Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.

Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications' results. If you're interested in learning more about Reflection, I recommend:

- "Self-Refine: Iterative Refinement with Self-Feedback," Madaan et al. (2023)
- "Reflexion: Language Agents with Verbal Reinforcement Learning," Shinn et al. (2023)
- "CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing," Gou et al. (2024)

[Original text: https://lnkd.in/g4bTuWtU ]
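As a sketch of the tool-grounded variant, the snippet below (an illustrative design, not taken from the post) runs a candidate code string against a few unit tests and turns the failures into feedback text that could be fed back to the model in the next reflection round.

```python
# Sketch: grounding Reflection with a tool, here a tiny unit-test runner.
# Test failures become the "constructive criticism" fed back to the LLM.

def run_unit_tests(code: str, tests: list) -> list:
    """Exec candidate code, evaluate (expr, expected) pairs, return failures."""
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as e:
        return [f"code failed to run: {e}"]
    failures = []
    for expr, expected in tests:
        try:
            got = eval(expr, namespace)
            if got != expected:
                failures.append(f"{expr} returned {got!r}, expected {expected!r}")
        except Exception as e:
            failures.append(f"{expr} raised {e}")
    return failures

# Hypothetical LLM output with a latent bug (division by zero on []):
candidate = "def mean(xs): return sum(xs) / len(xs)"
feedback = run_unit_tests(candidate, [("mean([2, 4])", 3.0), ("mean([])", 0)])
print(feedback)  # one failure: mean([]) raises ZeroDivisionError
```

An empty `feedback` list would end the loop; a non-empty one becomes the critique for the rewrite prompt.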
Performance Optimization Techniques
-
I've recently suffered a major career setback. Since I teach about high performance and career growth, I want to share how I am addressing it. One day you will need this recipe yourself!

My goal in my current "career" is to reach as many people as I can, and to help them achieve career success and satisfaction. For the last three years, the way to do this has been through LinkedIn. Unfortunately, LinkedIn recently made some unknown changes to their algorithm. Other Top Voices and I have noticed a drop of 70% to 80% in the reach of our posts. Since my goal is to share my knowledge with more people, that means my goal just took an 80% hit.

In general, setbacks in performance are due to either:
A) Something we did, or
B) Something external, outside our direct control

Mistakes, poor decisions, and missed deadlines are examples of A. They are in our control. Things like Covid, high interest rates, and reorganizations at work are examples of B, outside our control. LinkedIn's change is also case B, outside my control.

When a setback comes from something in your control, you know clearly what you did wrong and what you need to change to restore your performance and progress. Fixing your own issues may take time and be difficult, but you know what to do. When the setback is due to something outside your control, you do not know how to fix the issue. So, how can we react when our performance is shattered and we do not know why? Here is my recipe:

1. Allow yourself a fixed amount of time to grieve (and complain if you wish). Emotions are real, and before you can move on you will need to sit with those emotions. But do not get stuck in them. Curse your bad luck, pout for a minute, etc. Then, move to the next step.

2. Refocus on your core value. Whatever happened, go back to how you define high performance to ensure it is still relevant. I admit, I slipped into defining my own performance by how many people viewed my LinkedIn posts. This was a mistake.
My mission is to help others, so getting views is a proxy, not a result. And using LinkedIn is just a method for the mission, not the mission itself.

3. Adapt your core value if you must (if its value has decreased). In my case, the value of what I offer hasn't changed; the external delivery system has.

4. Once you adapt and/or increase your value, find new ways to deliver it if necessary. Luckily, I have other options for reaching people: my Substack newsletter, YouTube, etc. Since Substack has been such a good partner recently, I will start there. I have also refocused how I write on LinkedIn to make every post focused on my goal.

5. Test, measure, adapt, repeat! Really, this step is everything. Once you get past the grief, jump into action in this loop. Nothing can stop you if you keep working to refine, deliver, and showcase your core value.

Comments? Here's my newsletter, which is my next area of investment: https://lnkd.in/gXh2pdK2
-
Many of us write SQL queries daily, but how often do we consider the underlying execution order? Understanding each step can be a game-changer for optimizing query performance and getting accurate results. Here's a detailed walkthrough of SQL's execution flow:

𝟭. 𝗙𝗥𝗢𝗠 𝗖𝗹𝗮𝘂𝘀𝗲: 𝗧𝗵𝗲 𝗦𝘁𝗮𝗿𝘁𝗶𝗻𝗴 𝗟𝗶𝗻𝗲
- Role: Establishes the data sources (tables, views, or joins) your query will work with.
- Why It Matters: The FROM clause is where it all begins. Selecting the right sources and structuring joins here determines the query's foundation and efficiency.

𝟮. 𝗪𝗛𝗘𝗥𝗘 𝗖𝗹𝗮𝘂𝘀𝗲: 𝗧𝗵𝗲 𝗙𝗶𝗹𝘁𝗲𝗿 𝗚𝗮𝘁𝗲
- Role: Applies conditions to remove rows that don't meet specified criteria.
- Why It Matters: Filtering data at this stage reduces the load for subsequent steps, saving processing time and ensuring only relevant data proceeds.

𝟯. 𝗚𝗥𝗢𝗨𝗣 𝗕𝗬 & 𝗔𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗶𝗼𝗻 (𝗢𝗽𝘁𝗶𝗼𝗻𝗮𝗹): 𝗖𝗮𝘁𝗲𝗴𝗼𝗿𝗶𝘇𝗶𝗻𝗴 𝗗𝗮𝘁𝗮
- GROUP BY: Clusters rows by specified columns, transforming raw data into grouped sets.
- Aggregate Functions (e.g., SUM, COUNT): Summarize each group's data, converting details into insights.
- HAVING Clause: Filters these groups based on aggregate results.
- Why It Matters: Using GROUP BY and aggregation effectively is essential for summary reports. This step is powerful for analytics but can be resource-intensive if misused.

𝟰. 𝗦𝗘𝗟𝗘𝗖𝗧 𝗖𝗹𝗮𝘂𝘀𝗲: 𝗖𝗵𝗼𝗼𝘀𝗶𝗻𝗴 𝘁𝗵𝗲 𝗥𝗲𝘀𝘂𝗹𝘁𝘀
- Role: Specifies which columns or expressions appear in the final output.
- Did You Know? The SELECT clause runs after WHERE and GROUP BY, meaning you're selecting columns from an already-filtered and grouped dataset.
- Why It Matters: This ensures that only the necessary columns make it to the final result, making the query efficient and clear.

𝟱. 𝗢𝗥𝗗𝗘𝗥 𝗕𝗬 & 𝗟𝗜𝗠𝗜𝗧: 𝗥𝗲𝗳𝗶𝗻𝗶𝗻𝗴 𝘁𝗵𝗲 𝗢𝘂𝘁𝗽𝘂𝘁
- ORDER BY: Sorts the results based on one or more columns, ideal for ordered reports and prioritized data.
- LIMIT: Caps the number of returned rows, especially useful for large datasets.
- Why It Matters: Ordering and limiting focus the output for user readability and system efficiency, especially when dealing with large datasets.

Why Execution Order is Essential

Knowing SQL's execution sequence helps you:
- 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: Each step can be streamlined to make queries faster and more responsive.
- 𝗧𝗿𝗼𝘂𝗯𝗹𝗲𝘀𝗵𝗼𝗼𝘁 𝗜𝘀𝘀𝘂𝗲𝘀 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁𝗹𝘆: By understanding the order, you can pinpoint issues at specific steps.
- 𝗥𝗲𝗱𝘂𝗰𝗲 𝗥𝗲𝘀𝗼𝘂𝗿𝗰𝗲 𝗨𝘀𝗮𝗴𝗲: Targeted optimization in each clause saves both time and computational power.

𝗣𝗿𝗼 𝗧𝗶𝗽: Different SQL dialects (MySQL, SQL Server, Oracle) can vary in execution quirks, so always refer to your database documentation for precise optimization techniques.

What's your top SQL tip for query performance? 👇
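The logical order above can be observed directly with Python's built-in sqlite3 module. The tiny example below (illustrative table and data) shows WHERE pruning rows before grouping, HAVING filtering after aggregation, and ORDER BY/LIMIT refining the final output.

```python
# Minimal sqlite3 demo of the logical order:
# FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 50), ("ann", 200), ("bob", 300), ("bob", 400), ("cey", 500)],
)

# WHERE removes rows *before* grouping; HAVING filters *after* aggregation.
rows = con.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 100          -- drops ann's 50 row first
    GROUP BY customer
    HAVING COUNT(*) > 1          -- keeps only multi-order customers
    ORDER BY total DESC
    LIMIT 1
    """
).fetchall()
print(rows)  # [('bob', 700)]
```

After WHERE, only bob still has more than one order, so HAVING keeps just his group: the alias `total` is usable in ORDER BY precisely because sorting happens after SELECT.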
-
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we also load openly available pretrained weights into our scratch-built model architecture.

Along with this pretraining tutorial, I also have bonus material on speeding up LLM training. These tips apply not just to LLMs but also to other transformer-based models like vision transformers:

1. Instead of saving the causal mask, create the causal mask on the fly to reduce memory usage (here it has minimal effect, but it can add up in long-context models like Llama 3.2 with 131k-input-token support)
2. Use tensor cores (only works for Ampere GPUs like the A100 and newer)
3. Use the fused CUDA kernels for `AdamW` by setting `fused=True`
4. Pre-allocate and re-use GPU memory via the pinned-memory setting in the data loader
5. Switch from 32-bit float to 16-bit brain float (bfloat16) precision
6. Replace from-scratch implementations of attention mechanisms, layer normalizations, and activation functions with PyTorch counterparts that have optimized CUDA kernels
7. Use FlashAttention for more efficient memory read and write operations
8. Compile the model
9. Optimize the vocabulary size
10. After saving memory with the steps above, increase the batch size

Video tutorial: https://lnkd.in/gDRycWea
PyTorch speed-ups: https://lnkd.in/gChvGCJH
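As a back-of-envelope illustration of tip 1, the cost of materializing a causal mask grows quadratically with context length (assuming one byte per mask element, as for a boolean tensor):

```python
# Why saving a full causal mask hurts at long context lengths:
# the mask is a (context_len x context_len) matrix, so memory is quadratic.
def mask_bytes(context_len: int, bytes_per_elem: int = 1) -> int:
    return context_len * context_len * bytes_per_elem

for ctx in (1024, 8192, 131_072):
    print(f"{ctx:>7} tokens -> {mask_bytes(ctx) / 1e9:8.2f} GB")
# 131,072 tokens -> ~17.18 GB for the mask alone
```

At typical tutorial context lengths the mask is negligible, but at 131k tokens it alone would exceed an 80GB GPU's capacity unless built on the fly (or fused away, as FlashAttention-style kernels do).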
-
E-commerce logistics during peak season is a complex and challenging operation. Here's an overview.

Rule of thumb: fast, safe, and on-time delivery at minimum operating cost is what it takes to meet customer satisfaction in all aspects.

Peak Season Logistics Challenges:
1. Increased volume (millions of packages per day)
2. Time-sensitive delivery demands
3. Higher customer expectations
4. Limited capacity and resources
5. Supply chain disruptions
6. Weather-related issues
7. Labor shortages
8. Technology and infrastructure constraints

Strategies to Meet On-Time Delivery Demands:
1. Scalable Infrastructure: Temporary warehouses, pop-up distribution centers
2. Flexible Workforce: Seasonal hiring, overtime, and flexible scheduling
3. Technology Integration: Automated sorting, tracking, and delivery systems
4. Data Analytics: Predictive modeling, real-time monitoring, and optimization
5. Partnerships and Collaborations: Carrier partnerships, last-mile delivery networks
6. Dynamic Routing: Real-time route optimization, traffic management
7. Inventory Management: Strategic inventory placement, pre-season stocking
8. Customer Communication: Proactive updates, transparent tracking

Best Practices:
1. Pre-Season Planning: Forecasting, capacity planning, and resource allocation
2. Real-Time Visibility: End-to-end tracking, monitoring, and alerts
3. Proactive Issue Resolution: Quick response to delays, exceptions
4. Carrier Diversification: Multiple carrier partnerships for contingency
5. Contingency Planning: Backup plans for unexpected disruptions

Innovative Solutions:
1. Drone Delivery: Last-mile delivery acceleration
2. Autonomous Vehicles: Self-driving delivery trucks
3. Robotics and Automation: Warehouse automation, sorting
4. Artificial Intelligence: Predictive analytics, optimized routing
5. Internet of Things (IoT): Real-time tracking, monitoring

Key Performance Indicators (KPIs):
1. On-time delivery rate
2. Order fulfillment rate
3. Shipping accuracy
4. Customer satisfaction (CSAT)
5. Return rate
6. Cost per shipment
7. Transit time
8. Supply chain visibility

A few major e-commerce logistics players:
1. Amazon Logistics
2. UPS
3. FedEx
4. DHL
5. USPS
6. JD Logistics
7. Alibaba Logistics
8. Shopify Logistics
9. Flipkart Logistics
10. Delhivery

Peak Season Logistics Timeline:
1. Pre-season (July-August): Planning, forecasting, resource allocation
2. Peak season (November-December): Increased volume, expedited shipping
3. Post-peak (January-February): Returns, inventory management

By implementing these strategies, e-commerce companies can ensure timely delivery and meet customer expectations during peak season.
-
𝗘𝘅𝗽𝗹𝗮𝗶𝗻 𝗧𝗵𝗶𝘀: 𝗟𝗹𝗮𝗺𝗮 𝟯 𝗡𝗲𝗲𝗱𝘀 𝟮.𝟰𝗧𝗕. 𝗬𝗼𝘂𝗿 𝗚𝗣𝗨 𝗛𝗮𝘀 𝟴𝟬𝗚𝗕. 𝗜𝘁 𝗦𝘁𝗶𝗹𝗹 𝗧𝗿𝗮𝗶𝗻𝘀.

Training Llama-3 405B needs ~2.4TB with BF16 + 8-bit Adam:
• Weights: 810GB
• Gradients: 810GB
• Optimizer: 810GB (vs 3.24TB with standard Adam!)
• Total: ~2.4TB (illustrative budget, config-dependent: FP32 masters, ZeRO stage, and offload change the totals)

Your H100? 80GB. You'd need 30+ GPUs just to hold everything.

𝗧𝗵𝗿𝗲𝗲 𝗧𝗿𝗶𝗰𝗸𝘀 𝗧𝗵𝗮𝘁 𝗠𝗮𝗸𝗲 𝗜𝘁 𝗪𝗼𝗿𝗸

𝟭. 𝗗𝗮𝘁𝗮 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹: Split the batch. Problem: Each GPU still needs the full 2.4TB of state. Fix: ZeRO shards it across N GPUs.

𝟮. 𝗠𝗼𝗱𝗲𝗹 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹: Split the layers. Problem: Sequential bottleneck. Fix: Pipeline the batches.

𝟯. 𝗦𝗲𝗾𝘂𝗲𝗻𝗰𝗲 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹: Split the tokens. This is the game changer. 8K tokens → 8 GPUs → 1K each. But attention needs every token to see all others.

𝗧𝗵𝗲 𝗠𝗮𝗴𝗶𝗰 𝗠𝗼𝗺𝗲𝗻𝘁: Instead of moving the 2.4TB model, GPUs only exchange attention keys/values (K,V). Each GPU:
• Computes K,V for its 1K tokens (32MB)
• Sends them to the others via all-to-all
• Receives 7×32MB = 224MB total
• Computes attention, deletes the copies

𝟮𝟮𝟰𝗠𝗕 𝗺𝗼𝘃𝗲𝗱 𝗶𝗻𝘀𝘁𝗲𝗮𝗱 𝗼𝗳 𝟮.𝟰𝗧𝗕. That's 10,000x less.

𝗧𝗵𝗲 𝗥𝗲𝘀𝘂𝗹𝘁: Combine these techniques (ZeRO data parallelism, pipeline, and sequence parallelism, plus tensor parallelism within layers). Each GPU holds ~75GB instead of 2.4TB. This exact choreography powers ChatGPT, Claude, and every frontier model. Without it? 10K token limits. With it? Entire books in one context. Not magic. Just brilliant engineering making the impossible routine.
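The memory budget above can be reproduced with simple arithmetic (assumptions: 405e9 parameters, 2 bytes per BF16 value, and two optimizer states per parameter at 1 byte each for 8-bit Adam versus 4 bytes each for standard FP32 Adam):

```python
# Back-of-envelope reproduction of the training-memory budget.
PARAMS = 405e9  # Llama-3 405B parameter count

weights_gb = PARAMS * 2 / 1e9      # BF16 weights: 2 bytes/param
grads_gb   = PARAMS * 2 / 1e9      # BF16 gradients: 2 bytes/param
adam8_gb   = PARAMS * 2 * 1 / 1e9  # 8-bit Adam: m and v states, 1 byte each
adam32_gb  = PARAMS * 2 * 4 / 1e9  # standard Adam: m and v states, FP32

total_tb = (weights_gb + grads_gb + adam8_gb) / 1e3
print(weights_gb, adam32_gb, total_tb)  # 810.0 3240.0 2.43
```

This also shows where the "3.24TB with standard Adam" comes from: swapping the 1-byte optimizer states for 4-byte FP32 states quadruples the optimizer line alone.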
-
DMAIC – KEY TOOLS AND FORMATS

1. DEFINE
Goal: Define the problem, project goals, and scope.
Key Activities:
- Create a Project Charter
- Identify Voice of Customer (VOC)
- Define CTQs (Critical to Quality elements)
- Create SIPOC Diagram (Suppliers, Inputs, Process, Outputs, Customers)
Tools & Formats: SIPOC diagram, Project Charter, Problem Statement, Goal Statement, VOC Analysis, Stakeholder Analysis
Example:
- Problem: Customers unhappy with 5-day delivery time
- Goal: Reduce delivery time to 3 days
- Scope: Only domestic shipping, not international

2. MEASURE
Goal: Understand the current performance and gather baseline data.
Key Activities:
- Identify key performance indicators (KPIs)
- Collect data on process performance
- Validate the measurement system (MSA)
- Develop a data collection plan
Tools & Formats: Data Collection Plan, Control Charts, Process Flow Diagrams, Measurement System Analysis (MSA), Histogram, Run Charts
Example:
- Measured average delivery time = 5 days
- 20% of orders delayed beyond the promised date

3. ANALYZE
Goal: Identify root causes of the problem using data analysis.
Key Activities:
- Analyze collected data
- Identify patterns, variations, and causes
- Validate root causes
Tools & Formats: Root Cause Analysis (5 Whys), Fishbone Diagram (Ishikawa), Pareto Chart (80/20 rule), Regression Analysis, Cause and Effect Matrix, Scatter Plot
Example: Found issues: poor inventory control, manual order entry, departmental miscommunication

4. IMPROVE
Goal: Implement and test solutions to eliminate root causes.
Key Activities:
- Brainstorm improvement ideas
- Conduct pilot tests
- Implement the best solutions
- Assess risk (FMEA)
Tools & Formats: Brainstorming Sessions, FMEA (Failure Mode and Effects Analysis), Poka-Yoke (Error Proofing), DOE (Design of Experiments), Process Simulation, Before & After Comparisons
Example: Actions taken: automated inventory system, integrated order tracking, real-time communication tools. Result: delivery time reduced to 3.5 days

5. CONTROL
Goal: Sustain improvements and monitor long-term performance.
Key Activities:
- Develop control plans
- Standardize improved processes
- Monitor KPIs
- Provide training and documentation
Tools & Formats: Control Charts, Control Plan Document, Standard Operating Procedures (SOPs), Process Audit Checklists, Visual Management Tools (dashboards)
Example: Monthly delivery performance review; dashboard showing real-time shipment status; staff trained on new SOPs
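The Pareto chart mentioned under Analyze rests on a simple computation: rank causes by frequency and keep the smallest set covering roughly 80% of occurrences. Here is a minimal sketch with made-up defect counts:

```python
# Pareto (80/20) analysis sketch: find the "vital few" causes that
# together explain at least `threshold` of all defect occurrences.
def pareto_vital_few(counts: dict, threshold: float = 0.8) -> list:
    total = sum(counts.values())
    running, vital = 0, []
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        vital.append(cause)
        running += n
        if running / total >= threshold:
            break
    return vital

# Hypothetical defect tally for the delivery-time example:
defects = {"inventory errors": 45, "manual entry": 30, "miscommunication": 15,
           "carrier delays": 7, "other": 3}
print(pareto_vital_few(defects))
# ['inventory errors', 'manual entry', 'miscommunication']
```

Here the top three causes account for 90% of defects, so Improve-phase effort should concentrate there first.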
-
💥 Data Engineer Interview Killer: Handling 500GB Daily with PySpark

Data pros — have you ever been asked this in an interview?
👉 "How would you efficiently process a 500 GB dataset in PySpark, and how would you size your cluster?"

It's one of my favorite questions — because it blends architecture, optimization, and cost awareness into one real-world scenario. Here's how I'd break it down 👇

💡 The 5-Step Optimization Blueprint

1️⃣ Format First — The Foundation of Speed 🚀
Action: Convert raw data (CSV/JSON) into Parquet or Delta Lake right away.
Why: Columnar storage, compression, and predicate pushdown drastically cut I/O.
👉 This single step often gives the biggest performance boost.

2️⃣ Partitioning Math — Define Your Parallelism 🧮
Each Spark task should process around 128 MB.
Calculation: 500 GB × 1024 MB/GB ÷ 128 MB/partition ≈ 4,000 partitions
➡️ Spark now has ~4,000 tasks to parallelize — perfect for scaling efficiently.

3️⃣ Cluster Sizing — Predictable Execution 🧠
Let's assume: 10 worker nodes, 8 cores & 32 GB RAM per node.
Parallelism: 10 nodes × 8 cores = 80 cores, and Spark schedules one task per core, so ~80 tasks run concurrently. (The classic "2–3 tasks per core" guideline refers to total partitions per core for load balancing, not concurrency.)
Total time: 4,000 ÷ 80 = 50 waves of execution. At ~30–40 seconds per wave → ~25–35 minutes total runtime.
That's how you explain both scaling and efficiency in an interview.

4️⃣ Memory Management — Avoid the Spill 💾
Plan for roughly 3× data size during joins and shuffles.
Estimate: (500 GB × 3) ÷ 10 nodes = 150 GB per node
With only 32 GB per node, Spark will spill to disk — which is fine if SSD-backed. For critical workloads, upgrade to 64 GB nodes to keep processing smooth.

5️⃣ Performance Tweaks — Fine-Tuning ⚙️
spark.sql.shuffle.partitions = 400
spark.sql.adaptive.enabled = True
✅ Use Broadcast Joins for small lookup tables.
✅ Implement Incremental Loads (Delta Lake makes this easy).
✅ Avoid full reloads — only process what's changed.
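The partitioning and wave arithmetic can be packaged as a small helper. The numbers below assume 128 MB per partition, one concurrent task per core (Spark's default), and a hypothetical per-wave duration; they are interview estimates, not guarantees.

```python
# Back-of-envelope Spark job sizing: partitions, waves, and rough runtime.
def size_job(data_gb, nodes, cores_per_node,
             mb_per_partition=128, secs_per_wave=35):
    partitions = data_gb * 1024 // mb_per_partition  # target ~128 MB each
    concurrent = nodes * cores_per_node              # one task per core
    waves = -(-partitions // concurrent)             # ceiling division
    return partitions, waves, waves * secs_per_wave / 60

parts, waves, minutes = size_job(500, nodes=10, cores_per_node=8)
print(parts, waves, round(minutes))  # 4000 partitions, 50 waves, ~29 minutes
```

Changing any assumption (bigger partitions, more cores, faster waves) flows straight through the estimate, which is exactly the trade-off discussion an interviewer wants to hear.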
🧭 The Real Data Engineering Challenge Optimizing Spark isn’t about adding more compute — it’s about finding the sweet spot between performance, cost, and scalability. 🔥 Question for you: If you got this same question in an interview — how would you size your cluster or optimize it differently? 👇 I’ll be sharing my cost–benefit breakdown in the next post — how to choose between scaling up vs scaling out for real workloads. #PySpark #ApacheSpark #Databricks #BigData #DataEngineering #Optimization #InterviewPrep #Azure
-
At this stage, I believe most businesses are using metrics of some sort. So the biggest problem with metrics today is not that they are not used; it's that the wrong ones are used. Or there are just too many.

Companies are often unaware they are using the wrong metrics. This usually happens when they are either copying what others are doing because it sounds like something they "should" be doing, or they lack clarity about what's really important to their growth.

The other problem I mentioned was the use of too many metrics. It's really not necessary to measure everything! Collecting and analyzing huge amounts of data can create decision paralysis and make it difficult to focus on what really matters. Instead of helping, it can slow down decision-making.

There IS a simple solution. It starts with identifying the areas that matter most to your growth.

1️⃣ Begin by defining your top business goals. Ask, "What do we want to achieve?" Whether it's increasing customer retention or improving operational efficiency, your metrics should directly support these goals.

2️⃣ Avoid overload by choosing only 3–5 core metrics that are critical to your goals. For example, track Net Promoter Score for customer satisfaction, or Cycle Time for operational efficiency.

3️⃣ Implement tools to automate the tracking of these metrics, so you can easily monitor progress without manually crunching numbers. This saves time and ensures real-time data.

4️⃣ Set up a routine to review the data, weekly or monthly. Look for trends and areas of improvement, and adjust your actions based on the insights gained.

5️⃣ Make sure your team understands the importance of these metrics and how they can contribute to improving them. This helps ensure accountability and alignment across the organization.

Do you have any tips for effective metric management? What works in your organization? Leave your comments below 🙏

#measurewhatmatters #metrics #leadership #datamanagement #continuousimprovement
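As a concrete example of one core metric from step 2️⃣, here is the standard Net Promoter Score calculation (promoters score 9–10, detractors 0–6) on made-up survey data:

```python
# Net Promoter Score: percentage of promoters minus percentage of detractors,
# computed from 0-10 "how likely are you to recommend us?" survey responses.
def nps(scores: list) -> float:
    promoters  = sum(s >= 9 for s in scores)  # 9-10
    detractors = sum(s <= 6 for s in scores)  # 0-6; 7-8 are passives
    return 100 * (promoters - detractors) / len(scores)

print(nps([10, 9, 9, 8, 7, 6, 3]))  # 3 promoters, 2 detractors, 7 responses
```

The result ranges from -100 (all detractors) to +100 (all promoters), which makes it easy to track as a single trend line in a weekly or monthly review.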
-
Study: Generators May Provide a Faster Path to Power

A new study by energy researchers suggests that data centers could get faster access to power by adopting load flexibility: agreeing to briefly curtail utility usage and shift to generator power.

In an in-depth analysis of the U.S. power grid, researchers at Duke University estimate that this approach could tap existing headroom in the system to more quickly integrate at least 76 gigawatts of new loads, arguing that even a small reduction in peak demand could reduce the need for new investments in transmission and generation capacity, as well as the need to pass on those investments to ratepayers.

Data centers are all about uptime, and thus have been resistant to innovations that create additional risk around reliability. But current power constraints in key markets, along with growing demand for AI training workloads (which may be more interruptible than cloud or colocation), have prompted the industry to explore load flexibility options. Last year the Electric Power Research Institute (EPRI) launched the DCFlex project to work with utilities and a number of data center operators, including Compass Datacenters, QTS Data Centers, Google, and Meta, on pilot projects for load flexibility.

The Duke study, titled "Rethinking Load Growth," puts some interesting numbers on the upside potential. Their findings:
- 76 gigawatts of new load could be enabled by an annual load curtailment rate of 0.25% of maximum uptime, equivalent to 1.7 hours per year operating on backup generators.
- An annual curtailment rate of 0.5% (2.1 hours annually) could enable 98 GW of new load, while a rate of 1.0% (2.5 hours) could boost that to 126 GW.
- A 0.5% curtailment could enable 18 GW in PJM and 10 GW in ERCOT, the research finds.

At least one hyperscaler seems open to the idea.
“This is a promising tool for managing large new energy loads without adding new generating capacity and should be part of every conversation about load growth,” said Michael Terrell, Senior Director of Clean Energy and Carbon Reduction at Google, in a LinkedIn post. With the acceleration of the AI arms race, speed-to-market is now a top priority, along with a competitive opportunity cost for companies that are unable to deploy new capacity. There are tradeoffs to consider (including more emissions), but the Duke paper will likely advance the conversation. Duke study: https://lnkd.in/eS3s_pvk Background on DCFlex: https://lnkd.in/euK746Zy