I see Finance teams spending days and weeks building Excel forecasts that break the moment business patterns shift. There's a better way.

I just published a walkthrough showing how to implement 𝗠𝗟-𝗯𝗮𝘀𝗲𝗱 𝗳𝗼𝗿𝗲𝗰𝗮𝘀𝘁𝗶𝗻𝗴 𝗶𝗻 𝗠𝗶𝗰𝗿𝗼𝘀𝗼𝗳𝘁 𝗙𝗮𝗯𝗿𝗶𝗰 - achieving >95% accuracy with a setup that takes hours, not the days or weeks Excel requires.

Once configured in Fabric notebooks, forecasts refresh automatically. No more monthly Excel gymnastics. CFOs get conservative/baseline/stretch scenarios from the same model. And it adapts to trend changes without manual recalibration.

The approach works beyond AR (Accounts Receivable) - I've used similar frameworks for sales forecasting, inventory planning, and capacity projections across Telco, Oil & Gas, and Pharma clients.

𝗪𝗵𝗮𝘁 𝘁𝗵𝗲 𝘁𝘂𝘁𝗼𝗿𝗶𝗮𝗹 𝗰𝗼𝘃𝗲𝗿𝘀:
• Prophet framework for automatic seasonality detection
• 12-month cash flow predictions with confidence intervals for scenario planning
• Lakehouse integration for automatic Power BI refresh
• Cross-validation workflow that tunes parameters automatically

𝗥𝗲𝗮𝗹 𝗮𝗰𝗰𝘂𝗿𝗮𝗰𝘆 𝗺𝗲𝘁𝗿𝗶𝗰𝘀: With the sample data I was able to achieve 3% MAPE (Mean Absolute Percentage Error) - that's $50K average variance on $1.5M monthly collections. Industry target is under 5%.

𝗧𝘂𝘁𝗼𝗿𝗶𝗮𝗹 𝗮𝗻𝗱 𝗻𝗼𝘁𝗲𝗯𝗼𝗼𝗸 𝗹𝗶𝗻𝗸 𝗶𝗻 𝗰𝗼𝗺𝗺𝗲𝗻𝘁𝘀 🎥👇
____
#MicrosoftFabric #PowerBI #MachineLearning #DataAnalytics #Forecasting
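For readers who want a feel for what such a notebook looks like, here is a minimal Prophet sketch, assuming a hypothetical monthly collections table with columns month and collected_amount; it illustrates the general approach (forecast with confidence bands plus rolling cross-validation), not the author's actual notebook, which is linked in the comments.

```python
# Minimal sketch: Prophet forecast with confidence intervals and cross-validation.
# File name and column names are hypothetical placeholders.
import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

# Prophet expects two columns: ds (date) and y (value to forecast)
df = (
    pd.read_csv("monthly_collections.csv")                      # hypothetical source
      .rename(columns={"month": "ds", "collected_amount": "y"})
)
df["ds"] = pd.to_datetime(df["ds"])

model = Prophet(
    yearly_seasonality=True,   # pick up annual collection cycles automatically
    interval_width=0.80,       # 80% confidence band for scenario planning
)
model.fit(df)

# 12-month horizon: yhat_lower / yhat / yhat_upper map roughly to
# conservative / baseline / stretch scenarios
future = model.make_future_dataframe(periods=12, freq="MS")
forecast = model.predict(future)
print(forecast[["ds", "yhat_lower", "yhat", "yhat_upper"]].tail(12))

# Rolling cross-validation to estimate out-of-sample error such as MAPE
cv = cross_validation(model, initial="730 days", period="90 days", horizon="365 days")
print(performance_metrics(cv)[["horizon", "mape"]].head())
```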
How to Improve Predictive Accuracy
Explore top LinkedIn content from expert professionals.
Summary
Improving predictive accuracy means using methods and tools to make predictions—like forecasts or recommendations—that closely match real-world outcomes. It involves refining how models are trained, how data is handled, and how results are interpreted so that predictions are more reliable for business decisions.
- Understand your data: Spend time learning where your data comes from and how it's constructed to avoid building predictions on shaky or misunderstood information.
- Use smarter prompts: When working with language models, experiment with prompts that guide the system to focus on relevant details and encourage more accurate responses.
- Try test-time augmentation: In recommendation systems, boost accuracy by modifying inputs during prediction without retraining, using methods like masking or adding controlled noise to keep important patterns intact.
Training LLMs for spam classification: I added 14 experiments comparing different approaches: https://lnkd.in/gTNVvGcj
- which token to train
- which layers to train
- different model sizes
- LoRA
- unmasking
- and more!
Any additional experiments you'd like to see? And here are the takeaways for the table shown in the picture:
1. Training the Last vs. First Output Token (Row 1 vs. 2): Training the last output token results in substantially better performance compared to the first. This improvement is expected due to the causal self-attention mask.
2. Training the Last Transformer Block vs. Last Layer (Row 1 vs. 3): Training the entire last transformer block also results in substantially better performance than training only the last layer.
3. Training All Layers vs. Last Transformer Block (Row 1 vs. 4): Training all layers shows a modest improvement of ~2% over just training the last transformer block, but it requires almost three times longer in terms of training duration.
4. Using Larger Pretrained Models (Row 1 vs. 5, and Row 1 vs. 6 and 7): Employing a 3x larger pretrained model leads to worse results. However, using a 5x larger model improves performance compared to the initial model, as was anticipated. Similarly, the 12x larger model improves the predictive performance even further. (The medium model was perhaps not well pretrained, or the particular finetuning configuration does not work as well for this model.)
5. Using a Model with Random Weights vs. Pretrained Weights (Row 1 vs. 8): Utilizing a model with random weights yields results that are only slightly worse (by 1.3%) compared to using pretrained weights.
6. Using LoRA (Low-Rank Adaptation) vs. Training All Layers (Row 9 vs. 4): Keeping the model frozen and adding trainable LoRA layers (see Appendix E for details) is a viable alternative to training all model parameters and even improves the performance by 1 percentage point. As can be seen from the 1% smaller gap between training and validation accuracy when using LoRA, this is likely due to less overfitting.
7. Padding Input to Full Context Length vs. Longest Training Example (Row 1 vs. 10): Padding the input to the full supported context length results in significantly worse performance.
8. Padding vs. No Padding (Row 1 vs. 11 and 12): The `--no_padding` option disables padding in the dataset, which requires training the model with a batch size of 1 since the inputs have variable lengths. This results in better test accuracy but takes longer to train. In row 12, we additionally enable gradient accumulation with 8 steps to achieve the same batch size as in the other experiments.
9. Disabling the Causal Attention Mask (Row 1 vs. 13): Disables the causal attention mask used in the multi-head attention module. This means all tokens can attend to all other tokens. The model accuracy is slightly improved compared to the GPT model with the causal mask.
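As a rough illustration of the "train on the last output token" setup behind takeaways 1 and 2, here is a minimal PyTorch sketch; GPTBackbone and the layer names are hypothetical stand-ins, not the code from the linked experiments.

```python
# Minimal sketch: classification head on the last output token of a GPT-style model.
import torch
import torch.nn as nn

class SpamClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, emb_dim: int, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone               # pretrained GPT-style model (stand-in)
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, emb_dim) per-token representations
        hidden = self.backbone(input_ids)
        # With a causal attention mask, only the LAST token has seen the whole
        # sequence, which is why training on it beats the first token (row 1 vs. 2).
        last_token = hidden[:, -1, :]
        return self.head(last_token)

# Freezing everything except the last transformer block plus this head corresponds
# to the "last transformer block" configuration (row 3); unfreezing all parameters
# corresponds to "training all layers" (row 4).
```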
-
In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs’ performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google’s Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques.

Yet, while our X and LinkedIn feeds buzz with ‘secret prompting tips’, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y

In this AI Tidbits Deep Dive, I outline six of the best recent prompting methods:
(1) EmotionPrompt - inspired by human psychology, this method utilizes emotional stimuli in prompts to gain performance enhancements
(2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the “Take a deep breath” instruction that improved LLMs’ performance by 9%.
(3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy
(4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM
(5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning
(6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy

Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential. Full blog post: https://lnkd.in/g7_6eP6y
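To give a flavor of how one of these methods looks in code, here is a minimal sketch of a CoVe-style four-step loop; the complete() callable and the prompt wording are hypothetical placeholders, not taken from Meta's paper.

```python
# Minimal sketch: Chain-of-Verification-style prompting with a generic LLM call.
def chain_of_verification(question: str, complete) -> str:
    """`complete` is any function that takes a prompt string and returns model text."""
    # 1. Draft an initial answer
    baseline = complete(f"Answer the question:\n{question}")

    # 2. Plan verification questions that check the facts in the draft
    plan = complete(
        "List short fact-checking questions that would verify this answer:\n"
        f"Question: {question}\nDraft answer: {baseline}"
    )

    # 3. Answer each verification question independently, so errors in the draft
    #    do not carry over
    checks = complete(f"Answer each of these questions one by one:\n{plan}")

    # 4. Produce a final answer that is consistent with the verified facts
    return complete(
        f"Question: {question}\nDraft answer: {baseline}\n"
        f"Verification Q&A:\n{checks}\n"
        "Write a final answer that corrects any inconsistencies."
    )
```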
-
Forecasting is always off. At least a bit. But here’s how you make it less off:

Whether I’m building my own forecast or sitting down with a client, there’s one non-negotiable starting point:
‼️ Show me the data.

Not your “Sheet 1, Sheet 2, Sheet 78,” or a screenshot from your WFM tool. Show me the raw data and where it comes from.

Why does this matter? Because if you don’t understand what you’re working with, you’re building your forecast on guesswork. There should never be situations like:
• “I get this report from the BAs.”
• “I use out-of-the-box reports, so they must be correct.”
• “Let me check with X whether this is included.”

If you don’t know how your data is constructed, you’re already off track. Remember: default reports rely on assumptions that might not fit your reality.

✅ Example: One company’s ‘short abandoned’ calls might be 10 seconds. Another’s might be under 20 (see the sketch after this post).
✅ Example: Telephony structures change constantly (queues get merged, new queues pop up) and often the WFM team is left out of the loop.
✅ Example: Calls in transfer queues: are they counted once or twice? How does that impact AHT? Where does that time get attributed?

These details can make or break your forecast. If you want to be great at forecasting, you need to truly understand your data and your data sources. Get OBSESSED with knowing your data. That’s how you go from “way off” to “close enough to drive results.”

What’s the biggest data blind spot you’ve seen ruin a forecast? 👀

#forecasting #workforcemanagement
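As a small illustration of the ‘short abandoned’ example, here is a pandas sketch showing how the threshold you pick changes the abandon rate your forecast is built on; the column names and sample data are hypothetical.

```python
# Minimal sketch: the short-abandon threshold changes the baseline abandon rate.
import pandas as pd

# Tiny illustrative sample; in practice this comes from the raw call-detail export
calls = pd.DataFrame({
    "outcome":  ["answered", "abandoned", "abandoned", "answered", "abandoned"],
    "wait_sec": [35, 8, 15, 60, 25],
})

def abandon_rate(df: pd.DataFrame, short_abandon_sec: int) -> float:
    # Abandons shorter than the threshold are excluded as "short abandons"
    abandoned = (df["outcome"] == "abandoned") & (df["wait_sec"] >= short_abandon_sec)
    answered = df["outcome"] == "answered"
    return abandoned.sum() / (abandoned.sum() + answered.sum())

# The same raw data gives different baselines at a 10s vs. 20s threshold
print(abandon_rate(calls, 10), abandon_rate(calls, 20))
```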
-
Exciting Research Alert: Test-Time Augmentation for Sequential Recommendation

I just came across a fascinating paper from researchers at Northeastern University titled "Data Augmentation as Free Lunch: Exploring the Test-Time Augmentation for Sequential Recommendation." This innovative approach tackles a common challenge in recommendation systems - deploying data augmentation techniques without the high computational costs of retraining models.

>> What's the innovation?
Unlike traditional data augmentation methods that require retraining models (a time-consuming and costly process), this research explores Test-Time Augmentation (TTA) for sequential recommendation. TTA augments inputs during model inference and aggregates predictions from these augmented versions to improve accuracy - essentially offering performance gains as a "free lunch"!

>> How it works under the hood:
The researchers first evaluated existing augmentation operators (Mask, Substitute, Crop, Reorder, etc.) for TTA effectiveness. They discovered that Mask and Substitute consistently outperform other methods because they introduce appropriate perturbations while preserving original sequential patterns.

Through detailed analysis, they found that effective TTA methods must:
- Maintain the original sequential pattern
- Add appropriate perturbations (not too similar, not too different)
- Avoid disrupting key user interactions

Based on these insights and to overcome limitations of existing methods, they proposed two novel operators:
1. TNoise: Injects uniform noise directly into sequence representations, avoiding the computational overhead of item selection while introducing appropriate perturbations.
2. TMask: Comes in two variants - TMask-B blocks mask tokens from participating in model calculations, while TMask-R removes interactions that would have been replaced with mask tokens.

>> The results speak for themselves:
Comprehensive experiments across multiple datasets (Amazon Beauty, Sports, Home, and Yelp) and various sequential recommendation models (GRU4Rec, SASRec, NARM, NextItNet, LightSANs, FMLP-Rec) demonstrated:
- TMask-R consistently achieved the best performance, with improvements of up to 73% over base models
- Significant efficiency advantages - no retraining required and minimal inference time increase
- Strong generalizability across different model architectures

This approach offers a practical solution for recommendation system engineers looking to improve model performance without the computational burden of retraining or architecture modifications. The research from Northeastern University represents an important step forward in making recommendation systems more efficient and effective. If you're working in this space, this paper is definitely worth checking out!
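To make the TNoise idea concrete, here is a minimal PyTorch sketch of test-time augmentation at inference: perturb the sequence representation with small uniform noise several times and average the item scores. model.encode and model.score_items are hypothetical stand-ins for a sequential recommender's encoder and scoring head, not the paper's actual code.

```python
# Minimal sketch: TNoise-style test-time augmentation for a sequential recommender.
import torch

@torch.no_grad()
def tta_tnoise_scores(model, item_seq: torch.Tensor, n_aug: int = 4, eps: float = 0.01) -> torch.Tensor:
    seq_repr = model.encode(item_seq)              # (batch, hidden) sequence representation
    scores = model.score_items(seq_repr)           # prediction on the original input
    for _ in range(n_aug):
        # Uniform noise in [-eps, eps]: a small perturbation that keeps the
        # original sequential pattern intact, no retraining required
        noise = (torch.rand_like(seq_repr) - 0.5) * 2 * eps
        scores = scores + model.score_items(seq_repr + noise)
    return scores / (n_aug + 1)                    # aggregate original + augmented views
```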
-
Demand forecasting errors silently bleed profits and cash. This document shows 7 red flags in demand forecasting and how to fix them:

1️⃣ Over-reliance on historical data
↳ How to fix: incorporate external data like market trends, competitor activity, and consumer sentiment to enrich forecasts

2️⃣ Ignoring promotions and discounts
↳ How to fix: build a promotions-adjusted forecasting model, considering historical uplift from similar campaigns

3️⃣ Forgetting cannibalization effects
↳ How to fix: model cannibalization effects to adjust forecasts for existing products

4️⃣ One-size-fits-all forecasting method
↳ How to fix: use demand segmentation (for example, high variability vs. stable demand); do not treat all SKUs equally

5️⃣ Not monitoring forecast accuracy
↳ How to fix: track metrics like MAPE, WMAPE, and bias to improve over time (see the sketch after this list)

6️⃣ High forecast error with no accountability
↳ How to fix: tie accountability to S&OP (sales and operations planning) meetings

7️⃣ Past sales (instead of demand) consideration
↳ How to fix: make the initial predictions based on unconstrained demand, not on sales that are impacted by cuts and out-of-stock situations

Any others to add?
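For point 5, here is a minimal sketch of the three accuracy metrics computed from actual vs. forecast values; the sample numbers are purely illustrative.

```python
# Minimal sketch: MAPE, WMAPE, and bias for forecast accuracy tracking.
import numpy as np

def forecast_accuracy(actual, forecast) -> dict:
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    error = forecast - actual
    nonzero = actual != 0
    mape = np.mean(np.abs(error[nonzero] / actual[nonzero]))   # simple average of % errors
    wmape = np.abs(error).sum() / actual.sum()                  # volume-weighted, robust to small SKUs
    bias = error.sum() / actual.sum()                           # >0 = over-forecasting, <0 = under
    return {"MAPE": mape, "WMAPE": wmape, "bias": bias}

print(forecast_accuracy([100, 80, 120], [110, 70, 125]))
```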
-
A few months back, I interviewed a senior demand planner from a global skincare brand. I asked a simple question: "How do you improve your forecast when the system gives you a number that feels... off?"

She replied, "We talk to the right people before we talk to the system."

That line stayed with me.

In Demand Planning, we often focus heavily on historical data, statistical models, and software outputs. But what truly differentiates an average forecast from a high-confidence, actionable one is the process of Demand Enrichment. And no, it’s not just a buzzword. It’s a discipline - a method of adding intelligence beyond what the system predicts.

In fact, according to a McKinsey study, companies that effectively integrate enriched demand signals (like promotions, competitor moves, distribution expansion, influencer campaigns, and even climate effects) can improve forecast accuracy by up to 25%.

When I worked for a consumer brand in North India, we noticed our system forecast underestimated demand by 18% during Q4. Why? Because it didn’t factor in the impact of a regional festival that doubled store footfall across 3 key states. Our statistical model was flawless. But our insights were incomplete.

That’s when we built a cross-functional "Demand Intelligence Loop" - gathering inputs from marketing, sales, trade partners, and retailers - and feeding it back into planning.

The result? Forecast accuracy jumped. Inventory positioning improved. And stockouts during peak weeks were cut in half.

If you're a planner reading this: Don't just accept the forecast. Enrich it. Challenge it. Elevate it. That’s how Demand Planning transforms from reactive to strategic.
-
I’m jealous of AI. Because with a model, you can measure confidence. Imagine if you could do that as a human - measure how close or far off you are?

Here’s how to measure it, for technical and non-technical teams.

For business teams:

Run a ‘known answers’ test. Give the model questions or tasks where you already know the answer. Think of it like a QA test for logic. If it can't pass here, it's not ready to run wild in your stack.

Ask for confidence directly. Prompt it: “How sure are you about that answer on a scale of 1-10?” Then: “Why might this be wrong?” You'll surface uncertainty the model won't reveal unless asked.

Check consistency. Phrase the same request five different ways. Is it giving stable answers? If not, revisit the product strategy for the LLM.

Force reasoning. Use prompts like “Show step-by-step how you got this result.” This lets you audit the logic, not just the output. Great for strategy, legal, and product decisions.

For technical teams:

Classifiers: use the softmax output to get predicted probabilities. Example: the model says “fraud” with 92% probability. Use entropy to spot uncertainty - high entropy = low confidence (Shannon entropy: −∑p log p).

Language models: extract token-level log-likelihoods from the model if you have API or model access. These give you the probability of each word generated. Use sequence likelihood to rank alternate responses - common in RAG and search-ranking setups.

For uncertainty estimates, try:
- Monte Carlo Dropout: run the same input multiple times with dropout on. Compare outputs. High variance = low confidence.
- Ensemble models: aggregate predictions from several models to smooth confidence.
- Calibration testing: use a reliability diagram to check if predicted probabilities match actual outcomes. Use Expected Calibration Error (ECE) as a metric. Good models should show that 80% confident = ~80% correct.

How to improve confidence (and make it trustworthy):
- Label smoothing during training: prevents overconfident predictions and improves generalization.
- Temperature tuning (post-hoc): adjusts the softmax sharpness to better align confidence and accuracy. Temperature < 1 → sharper, more confident; temperature > 1 → more cautious, less spiky predictions.
- Fine-tuning on domain-specific data: shrinks uncertainty and reduces hedging in model output. Especially effective for LLMs that need to be assertive in narrow domains (legal, medicine, strategy).
- Focal loss for noisy or imbalanced datasets: down-weights easy examples and forces the model to pay attention to harder cases, which tightens confidence on the edge cases.
- Reinforcement learning from human feedback (RLHF): aligns the model's reward with correct and confident reasoning.

Bottom line: a confident model isn't just better - it's safer, cheaper, and easier to debug. If you’re building workflows or products that rely on AI but you’re not measuring model confidence, you’re guessing.

#AI #ML #LLM #MachineLearning #AIConfidence #RLHF #ModelCalibration
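To make the technical checks concrete, here is a minimal NumPy sketch of softmax confidence, Shannon entropy, and Expected Calibration Error; the logits and labels are illustrative placeholders, not from any real system.

```python
# Minimal sketch: softmax confidence, entropy, and ECE for a classifier.
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def entropy(probs):
    # High entropy = probability mass spread across classes = low confidence
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

def expected_calibration_error(confidences, correct, n_bins=10):
    # Bin predictions by confidence and compare average confidence to accuracy;
    # a well-calibrated model is ~80% accurate among its 80%-confident predictions.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Illustrative example: two predictions, e.g. "fraud" vs. "not fraud" logits
probs = softmax(np.array([[2.5, 0.3], [0.1, 0.2]]))
print(entropy(probs))                                  # per-example uncertainty
conf = probs.max(axis=-1)                              # predicted-class confidence
correct = probs.argmax(axis=-1) == np.array([0, 1])    # illustrative true labels
print(expected_calibration_error(conf, correct))
```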
-
Not all errors are equal. Some are worth fixing more than others.

Imagine you’re building a model to predict customer churn. A false negative - predicting a customer will stay when they actually leave - can cost thousands of dollars in lost revenue. A false positive - predicting churn when the customer would have stayed - might only cost a small retention offer. Treating these mistakes as equal, like most accuracy metrics do, misses the bigger picture.

This is where 𝐜𝐨𝐬𝐭-𝐬𝐞𝐧𝐬𝐢𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥𝐢𝐧𝐠 comes in. Instead of optimizing for raw accuracy, you can tell your model which mistakes are more costly. In practice, this can be done by:
👉 Weighted loss functions: Modify your training loss to penalize false negatives more than false positives. For example, if using logistic regression or neural networks, you can apply class weights in cross-entropy loss.
👉 Resampling techniques: Oversample the minority “high-cost” class (in this case, churners) or undersample low-cost classes to bias the model towards minimizing high-impact mistakes.

Even a well-trained model needs careful 𝐭𝐡𝐫𝐞𝐬𝐡𝐨𝐥𝐝 𝐭𝐮𝐧𝐢𝐧𝐠. The default 0.5 probability cutoff isn’t always optimal. You can:
👉 Use business-driven thresholds: Choose the cutoff that maximizes expected revenue or minimizes cost based on your confusion matrix.
👉 Perform grid search or optimization over thresholds using your validation set and the monetary cost associated with each type of prediction.

Another way to approach this is through 𝐞𝐱𝐩𝐞𝐜𝐭𝐞𝐝 𝐯𝐚𝐥𝐮𝐞 𝐦𝐨𝐝𝐞𝐥𝐢𝐧𝐠. Assign a real-world cost or gain to each type of prediction, compute the net expected gain over your validation set, and tune the model or threshold to maximize that. This moves the focus from “statistical correctness” to business impact.

𝐔𝐧𝐜𝐞𝐫𝐭𝐚𝐢𝐧𝐭𝐲 also matters. High-confidence predictions are usually reliable, but when the model is unsure - like a probability near 0.5 - you can:
👉 Flag these for human review.
👉 Use ensembles or Bayesian models to quantify uncertainty and guide intervention strategies.

Finally, don’t forget 𝐦𝐨𝐧𝐢𝐭𝐨𝐫𝐢𝐧𝐠 𝐚𝐟𝐭𝐞𝐫 𝐝𝐞𝐩𝐥𝐨𝐲𝐦𝐞𝐧𝐭. The business environment changes, and so do the costs associated with errors. Regularly recalibrating your thresholds and retraining models ensures you continue focusing on the mistakes that matter most.

The key takeaway: chasing perfect accuracy is rarely the goal. By understanding which errors are costly, adjusting your model to focus on them, and incorporating uncertainty into decisions, you build models that not only predict but actually deliver measurable business value.
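As a rough sketch of two of the levers above (class weights plus a business-driven threshold), here is a scikit-learn example; the cost figures, data splits, and variable names are hypothetical assumptions for illustration.

```python
# Minimal sketch: cost-sensitive churn model with weighted classes and
# a decision threshold chosen to minimize total expected cost.
import numpy as np
from sklearn.linear_model import LogisticRegression

COST_FN = 2000   # missed churner: lost revenue (illustrative)
COST_FP = 50     # unnecessary retention offer (illustrative)

def fit_and_tune(X_train, y_train, X_val, y_val):
    # Penalize missing churners (class 1) more heavily than false alarms
    model = LogisticRegression(class_weight={0: 1, 1: COST_FN / COST_FP})
    model.fit(X_train, y_train)

    # Grid-search the decision threshold on the validation set by total cost,
    # instead of defaulting to 0.5
    probs = model.predict_proba(X_val)[:, 1]
    best_t, best_cost = 0.5, np.inf
    for t in np.linspace(0.05, 0.95, 19):
        pred = probs >= t
        cost = COST_FN * ((y_val == 1) & ~pred).sum() + COST_FP * ((y_val == 0) & pred).sum()
        if cost < best_cost:
            best_t, best_cost = t, cost
    return model, best_t
```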