Posteriors vs predictives
Karl Saliba writes with some queries about our football paper, which I have already discussed on the blog here. He says:
Thanks to the code in the appendix I could easily replicate (by using WinBUGS) a similar analysis on any football league of my choice. I am mostly concerned with the output of Table 2 in the Results section. I understand that each team is being assigned an attack and a defence parameter based on MCMC. I also understand that the higher the attack parameter, the higher the propensity to score, whereas the lower the defending parameter, the less likely it is that this team concedes goals.
As shown in Table 1, the last game of the 1991-1992 season [my note: in the Italian Serie A] was between home team Torino and away team Ascoli which ended up in a 5-2 win for Torino. The posterior parameters for the respective teams are:
Home effect: 0.2124
Torino attack effect: 0.0824
Torino defence effect: -0.4141
Ascoli attack effect: -0.2238
Ascoli defence effect: 0.4776
Based on these parameters, Torino are more likely to score than Ascoli. Secondly, Ascoli are more likely to concede than Torino. Furthermore, Torino are playing at home and so they have that extra boost called the ‘home effect’. Hence, according to my intuition, it would make sense that Torino win this game.
I would like to ask the following 2 questions:
Given any two competing teams as well as all five parameter estimates obtained from MCMC, how is it possible to calculate the probability that the home team wins, the probability that the away team wins, or perhaps the probability that the game ends in a draw?
Based on these parameter estimates, is the following a plausible way of estimating the most likely number of goals that each team will score in an encounter?
Expected Torino goals = exp(0.2124+0.0824+0.4776) = 2.16
Expected Ascoli goals = exp(-0.4141-0.2238) = 0.52
Seems quite a bit off. Is there a better way to evaluate how many goals each team is likely to score?
I think that the main confusion here is between the posterior distributions of the parameters and the relevant predictive distributions of the observable variables. Our model is directly concerned with estimating the predictive distribution of the number of goals scored by each team in each game \[ p(y^*_{gj}\mid \mathbf{y}) = \int p(y^*_{gj} \mid \theta)\, p(\theta \mid \mathbf{y})\, d\theta.\] So, for each game \(g\), the main output is a sample from the posterior predictive distribution of the number of goals scored by the team playing at home (\(j=1\)) or away (\(j=2\)). These samples can be used directly to estimate the probabilities Karl asks for.
For example, in the BUGS model one could define additional nodes, e.g.
p.home[g] <- step(ynew[g,1]-ynew[g,2])
p.draw[g] <- equals(ynew[g,1],ynew[g,2])
(in line with the code provided in the appendix of the paper, the node ynew represents the predictive distribution of the number of goals).
The first is an indicator which, because step(e) in BUGS takes value 1 whenever e is non-negative, equals 1 if the predicted number of goals for the home team is at least as large as that for the away team, and 0 otherwise. The second is an indicator equal to 1 if the two predicted scores are the same (a draw) and 0 otherwise. The posterior means of these nodes, computed over the MCMC samples, estimate the corresponding probabilities; since goals are integers, the probability of a strict home win is the mean of p.home minus the mean of p.draw (or one could define p.home as step(ynew[g,1]-ynew[g,2]-1) directly), and the probability of an away win follows as the complement.
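To make the same calculation concrete outside BUGS, here is a minimal sketch in Python for the Torino-Ascoli game. The parameter draws below are simulated stand-ins (independent normals centred near the posterior summaries quoted above), used only to keep the snippet self-contained and runnable; in a real analysis they would be the actual MCMC draws, and names such as theta_home and y_home are purely illustrative.

import numpy as np

rng = np.random.default_rng(42)

# Illustrative stand-ins for S posterior (MCMC) draws of the parameters:
# independent normals centred near the posterior means quoted above.
# In practice these arrays would come straight from the WinBUGS output.
S = 10_000
home       = rng.normal( 0.21, 0.05, S)   # home effect
att_torino = rng.normal( 0.08, 0.10, S)   # Torino attack
def_torino = rng.normal(-0.41, 0.10, S)   # Torino defence
att_ascoli = rng.normal(-0.22, 0.10, S)   # Ascoli attack
def_ascoli = rng.normal( 0.48, 0.10, S)   # Ascoli defence

# Scoring rates for each draw (log-linear predictor, as in the model):
# each team's rate uses its own attack and the opponent's defence.
theta_home = np.exp(home + att_torino + def_ascoli)  # Torino (at home)
theta_away = np.exp(att_ascoli + def_torino)         # Ascoli (away)

# One simulated game per posterior draw = posterior predictive sample
y_home = rng.poisson(theta_home)
y_away = rng.poisson(theta_away)

# Predictive probabilities are Monte Carlo averages of indicator functions
p_home_win = np.mean(y_home > y_away)
p_draw     = np.mean(y_home == y_away)
p_away_win = np.mean(y_home < y_away)
print(f"P(home win)={p_home_win:.3f}  P(draw)={p_draw:.3f}  P(away win)={p_away_win:.3f}")

Monitoring p.home and p.draw in BUGS returns exactly these kinds of Monte Carlo averages.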
The second question is directly related to this point; again, the objective is to estimate the predictive distribution. As for the specific example, the observed result is quite extreme (7 goals scored overall). So, given that the hierarchical model produces some shrinkage in the estimates, it is unlikely that such an extreme result is picked up "exactly"; that's one of the points of the adjustment we propose in Section 4 of the paper.
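To spell out the calculation behind the second question in formulas (a sketch of the standard Monte Carlo approximation, writing \(h(g)\) and \(a(g)\) for the teams playing at home and away in game \(g\), and \(s=1,\dots,S\) for the MCMC draws): since the scores are conditionally Poisson, the predictive mean number of home goals is the posterior mean of the home scoring rate, \[ E(y^*_{g1}\mid \mathbf{y}) = E(\theta_{g1}\mid \mathbf{y}) \approx \frac{1}{S}\sum_{s=1}^{S} \exp\left\{ home^{(s)} + att^{(s)}_{h(g)} + def^{(s)}_{a(g)} \right\}, \] and similarly for the away team (without the home effect). This averages the exponentiated linear predictor over the posterior draws, rather than evaluating it once at the posterior means of the parameters as in Karl's calculation.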
But the general principle stands: the posterior distribution of the model parameters is often the objective of the inference, but it doesn't have to be; in many cases, as here, the real interest lies in the predictive distribution of the observables.