Update: Predicting Leeds Utd's Progress Through the 2025/26 EPL Season
Back in July I wrote about how particle filters can be used to provide a probabilistic estimate of the points Leeds Utd could accumulate through the new season. Based purely on bookmakers' odds, the model predicted a modal outcome of 32 points, significantly below the 40 points widely considered enough to avoid relegation.
Well, we're 17 games into the season (almost halfway), so it's a good time to re-run the model and see if Leeds' fortunes have improved. To make it interesting, I've modified the model so that the team's form in past matches contributes to future outcomes, rather than relying solely on bookmakers' odds. The key idea is to introduce a latent variable into the model. A latent variable is not directly observable; it represents intangible qualities like team quality, current form or managerial effectiveness. So, instead of pretending probabilities are fixed, we say, "There is an underlying hidden state (the latent variable) that generates match results."
My original model assumed that each match Leeds played had a fixed set of outcome probabilities (pW, pD, pL). In that version, past results only added points; they didn't change future probabilities. In the new model, we introduce a latent variable representing Leeds' "strength" or "form."
How Latent Strength Affects Match Odds
The strength, s, of the Leeds team is a continuous variable. When s = 0, Leeds is exactly as good as preseason expectations. If s > 0, Leeds is stronger than expected, and if s < 0, the team is weaker than expected.
We think of the preseason odds (pW, pD, pL) as the prior belief about a match outcome. The latent strength nudges these beliefs: positive values push probability from loss towards win, and negative values push probability from win towards loss.
Without going into detail, we adjust the bookmakers' odds in log-odds (logit) space and then apply a softmax transform, which ensures the final probabilities are valid and sum to 1. This mirrors how bookmakers adjust odds.
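As a rough illustration of that adjustment, here is a minimal Python sketch. The exact parameterisation is an assumption on my part for illustration: s is added to the win logit and subtracted from the loss logit, leaving the draw untouched. The function name adjust_probs and the baseline probabilities are hypothetical.

```python
import math

def adjust_probs(p_win, p_draw, p_loss, s):
    """Shift baseline (win, draw, loss) probabilities by latent strength s."""
    # Assumed parameterisation: +s on the win logit, -s on the loss logit,
    # draw left unchanged. Other choices are possible.
    logits = [math.log(p_win) + s, math.log(p_draw), math.log(p_loss) - s]
    # Softmax: subtract the max for numerical stability, exponentiate, normalise.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)

baseline = (0.25, 0.25, 0.50)  # hypothetical odds-implied probabilities
print(adjust_probs(*baseline, s=0.0))  # s = 0 leaves the baseline unchanged
print(adjust_probs(*baseline, s=0.5))  # s > 0 moves probability from loss to win
```

With s = 0 the softmax simply returns the baseline, which matches the idea that s = 0 means "exactly as good as preseason expectations."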
How the Model Learns from Results
When a match is played:
This is effectively Bayesian updating. After 17 matches, we now have a posterior distribution over s that feeds into all remaining matches.
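One way such an update could look in code is sketched below, assuming a standard bootstrap particle filter: each particle holds a candidate value of s, is weighted by how likely it made the observed result, and the set is then resampled. The link function, the jitter step, the prior spread and the match odds are all illustrative assumptions, not the exact settings of my model.

```python
import math
import random

def outcome_probs(base, s):
    """Assumed link: shift win/loss logits by +s/-s, then softmax."""
    logits = [math.log(base[0]) + s, math.log(base[1]), math.log(base[2]) - s]
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_particles(particles, base, result, rng, jitter=0.05):
    """One filter step. result is an index: 0 = win, 1 = draw, 2 = loss."""
    # Weight each particle by the likelihood it assigns to the observed result.
    weights = [outcome_probs(base, s)[result] for s in particles]
    # Resample in proportion to the weights (sampling with replacement).
    resampled = rng.choices(particles, weights=weights, k=len(particles))
    # Small Gaussian jitter (assumed) lets form drift and keeps particles diverse.
    return [s + rng.gauss(0.0, jitter) for s in resampled]

rng = random.Random(0)
particles = [rng.gauss(0.0, 0.5) for _ in range(5000)]  # prior centred on s = 0
prior_mean = sum(particles) / len(particles)
# Hypothetical match: a win in a game where the baseline odds favoured a loss.
particles = update_particles(particles, (0.25, 0.25, 0.50), 0, rng)
posterior_mean = sum(particles) / len(particles)
print(prior_mean, posterior_mean)
```

An unexpected win shifts the particle cloud towards higher s; an unexpected loss would shift it the other way.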
So when we simulate future fixtures, each particle represents a plausible level of Leeds performance: strong particles win more often, weak particles lose more often, and our baseline odds anchor the difficulty of each future match.
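A minimal sketch of that forward simulation is shown below. Every number in it is a placeholder: the current points total, the fixture odds and the stand-in posterior over s are made up for illustration, and the link between s and match probabilities is the same assumed logit shift followed by a softmax. The only fixed quantity is the 21 remaining fixtures (38 minus 17).

```python
import math
import random

def outcome_probs(base, s):
    """Assumed link: shift win/loss logits by +s/-s, then softmax."""
    logits = [math.log(base[0]) + s, math.log(base[1]), math.log(base[2]) - s]
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_final_points(current_points, fixtures, particles, rng):
    """Play out the remaining fixtures once per particle; return final totals."""
    totals = []
    for s in particles:
        points = current_points
        for base in fixtures:
            # Draw win (3 pts), draw (1 pt) or loss (0 pts) for this fixture.
            points += rng.choices([3, 1, 0], weights=outcome_probs(base, s))[0]
        totals.append(points)
    return totals

rng = random.Random(1)
particles = [rng.gauss(0.2, 0.3) for _ in range(2000)]  # stand-in posterior over s
fixtures = [(0.30, 0.27, 0.43)] * 21  # placeholder odds for the 21 remaining games
current_points = 20  # placeholder, not Leeds' actual total
totals = simulate_final_points(current_points, fixtures, particles, rng)
print(min(totals), max(totals))
```

Each entry in totals is one simulated end-of-season points total, so the list as a whole approximates the predictive distribution.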
So What's Leeds Utd's Predicted Points Total at the End of the Season?
The image at the top of the article shows the predicted percentile bands, which trace how the distribution of points evolves over the season. The lines are quantiles of the distribution of simulated seasons. At each matchday, half the simulated seasons lie above the green line (the median) and half below it.
The most frequent points total (the mode) provides a useful estimate of the final total in this case. The model predicts a modal value of 40 points! This is good news, and is driven by recent improvements in form interacting with the baseline difficulty of the remaining fixtures.
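For anyone wanting to reproduce summaries like these from a list of simulated totals, here is a small sketch using only the Python standard library. The function name summarise and the ten sample totals are hypothetical, chosen purely to show the mechanics; they are not output from my model.

```python
from collections import Counter
import statistics

def summarise(totals):
    """Return the 5th/95th percentiles, median and mode of simulated totals."""
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(totals, n=20, method="inclusive")
    return {
        "p5": cuts[0],
        "median": statistics.median(totals),
        "p95": cuts[-1],
        # Mode: the single most frequent points total.
        "mode": Counter(totals).most_common(1)[0][0],
    }

# Hypothetical simulated totals, for illustration only.
sample_totals = [38, 40, 40, 41, 43, 40, 39, 44, 36, 42]
summary = summarise(sample_totals)
print(summary)
```

Running the same summary at every matchday, rather than only at the end of the season, gives the bands plotted in the chart.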
At present the remaining fixtures still use the preseason bookmaker probabilities as the baseline; updating these to current market odds will further refine the forecast.
If this current run of form continues, then Leeds are on target to reach the "safe" total of 40 points! Time will tell, of course.
Hope you found this interesting. I'll provide an update at the end of the season, to see how close we got to the model prediction.
Until then, Merry Christmas and MOT!