"Football"... I mean "soccer"... I mean "football"...

A couple of weeks ago, I was contacted by Daniel Weitzenfeld $-$ a Chicago freelance data scientist (his own description). Daniel got interested in modelling sports results and googled our football paper $-$ in his post here, he jokes that, because we're Italians, in our paper "football" means "soccer". Of course, my answer is that the real story is that when he says "football" he means "the weird version of rugby that Americans play"…

Anyway, he set out to adapt our model to last year's Premier League data, using pymc. In fact, he slightly modified our model $-$ we exchanged a couple of emails to clarify some issues about our original model (he did make a couple of good points). He then discusses the issue of shrinkage in the model's results $-$ as he says (quoting John Kruschke), shrinkage is not necessarily good or bad; it's just a feature of how the data are modelled.

In our case, however, model fit improved massively with a more complex specification that, by including some prior knowledge about the potential strength of the teams, reduced the amount of shrinkage $-$ in effect, we assumed three different data-generating processes (a form of conditional exchangeability): one for "good" teams (fighting for the title), one for "average" teams and one for "poor" teams (struggling against relegation).
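To make the shrinkage point a bit more concrete, here is a toy sketch in Python. It is not our Poisson model (nor Daniel's pymc version): it assumes a much simpler normal-normal model with known variances, and every number in it (the raw team estimates, the group means, `N`, `SIGMA2`, `TAU2`) is made up purely for illustration. Each team's posterior mean is a precision-weighted average of its own data and the prior mean it is shrunk towards, so the only thing that changes between the two versions is *which* mean each team is pulled towards.

```python
"""Toy illustration of shrinkage under one vs. three prior populations.

Hypothetical numbers throughout; this is a normal-normal model with known
variances, not the Poisson model from the paper.
"""


def posterior_mean(ybar: float, n: int, sigma2: float, mu: float, tau2: float) -> float:
    """Posterior mean of a team effect theta, where ybar is the mean of n
    observations with variance sigma2 and the prior is theta ~ Normal(mu, tau2).
    The result is a precision-weighted average of ybar and mu."""
    w_data = n / sigma2   # precision contributed by the data
    w_prior = 1.0 / tau2  # precision contributed by the prior
    return (w_data * ybar + w_prior * mu) / (w_data + w_prior)


# Hypothetical raw estimates for three teams (e.g. on a log-scoring-rate scale).
raw = {"strong": 1.0, "average": 0.0, "weak": -1.0}

# Hypothetical prior means for the three-population ("good/average/poor") version.
group_mean = {"strong": 0.5, "average": 0.0, "weak": -0.5}

N, SIGMA2, TAU2 = 4, 1.0, 0.25

# One exchangeable population: every team is pulled towards the same mean (0).
one_pop = {t: posterior_mean(y, N, SIGMA2, 0.0, TAU2) for t, y in raw.items()}

# Three populations: each team is pulled towards its own group's mean instead.
three_pop = {t: posterior_mean(y, N, SIGMA2, group_mean[t], TAU2) for t, y in raw.items()}

for team in raw:
    print(f"{team:8s} raw={raw[team]:+.2f} one={one_pop[team]:+.2f} three={three_pop[team]:+.2f}")
```

With these made-up numbers, the "strong" team's estimate is pulled from 1.0 down to 0.5 when all teams share a single population, but only to 0.75 when its prior is centred on the "good teams" mean; the "weak" team behaves symmetrically. That is roughly the intuition behind the three-group structure: teams at the extremes are no longer dragged all the way towards the league average.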

I was quite interested in the pymc implementation $-$ I'll have to take a closer look at some point…