Snap

Bayesian statistics

Miscellanea

Author

Written by Gianluca

Published

April 25, 2017

In the grand tradition of all recent elections, I’ve decided to have a go at building a model that could predict the results of the upcoming snap general election in the UK. I’m sure many more people will be having a go at this, from various perspectives and using different modelling approaches. Also, I will try very hard _not_ to spend all of my time on this, so I have set out to develop a fairly simple (although, hopefully, reasonable) model.

First off: the data. I think that since the announcement of the election, the pollsters have increased the number of surveys; I have already found 5 national polls (two by YouGov, two by ICM and one by Opinium) $-$ there may be more, and I’m not claiming a systematic review/meta-analysis of the polls.

Arguably, this election will be mostly about Brexit: there surely will be other factors, but because this comes almost exactly a year after the referendum, it is a fair bet that how people felt and still feel about its outcome will also massively influence the election. Luckily, all the polls I have found do report data in terms of voting intention, broken down by Remain/Leave. So, I’m considering $P = 8$ main political parties: Conservatives, Labour, UKIP, Liberal Democrats, SNP, Green, Plaid Cymru and “Others”. Also, for simplicity, I’m considering only England, Scotland and Wales $-$ this shouldn’t be a big problem, though, as in Northern Ireland elections are generally a “local affair”, with the mainstream parties not playing a significant role.

I also have data on the results of both the 2015 election and the 2016 EU referendum. The election results are by constituency and, again, I’m only considering the $C = 632$ constituencies in England, Scotland and Wales $-$ this leaves out the 18 Northern Irish constituencies. I had to do some work to align these two datasets, as the referendum results were not reported at the usual geographical resolution. I have mapped the voting areas used in 2016 to the constituencies and have recorded the proportion of votes won by the $P$ parties in 2015, as well as the proportion of the Remain vote in 2016.

For each observed poll $i = 1, \dots, N_{\text{polls}}$, I modelled the observed data among “$L$eavers” as $y_{i 1}^{L}, \dots, y_{i P}^{L} \sim \text{Multinomial} ((π_{1}^{L}, \dots, π_{P}^{L}), n_{i}^{L}).$ Similarly, the data observed for “$R$emainers” are modelled as $y_{i 1}^{R}, \dots, y_{i P}^{R} \sim \text{Multinomial} ((π_{1}^{R}, \dots, π_{P}^{R}), n_{i}^{R}).$ In other words, I’m assuming that within the two groups of voters, there is a vector of underlying probabilities associated with each party ($π_{p}^{L}$ and $π_{p}^{R}$) that are pooled across the polls. $n_{i}^{L}$ and $n_{i}^{R}$ are the sample sizes of each poll for $L$ and $R$.
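As a rough illustration of what “pooled across the polls” means, the sketch below evaluates a multinomial log-likelihood in which every poll shares the same probability vector. The function names and the three-party counts are invented for the example; they are not the actual poll data or the code used for the model.

```python
import math

def multinomial_log_pmf(counts, probs):
    """Log-probability of one poll's party counts under Multinomial(probs, n)."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    return log_coef + sum(y * math.log(p) for y, p in zip(counts, probs) if y > 0)

def pooled_log_likelihood(polls, probs):
    """Sum over polls: every poll shares the same underlying probabilities."""
    return sum(multinomial_log_pmf(counts, probs) for counts in polls)

# Hypothetical counts among Leavers for two polls and three parties (made up).
leave_polls = [[450, 250, 100], [480, 230, 90]]

# Pooled sample proportions, used here only as a plausible value for pi^L.
totals = [sum(column) for column in zip(*leave_polls)]
pooled_props = [t / sum(totals) for t in totals]

print(pooled_log_likelihood(leave_polls, pooled_props))
print(pooled_log_likelihood(leave_polls, [1 / 3, 1 / 3, 1 / 3]))
```

The first value is larger than the second, since the pooled proportions fit the invented counts better than a uniform split does.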

I used a fairly standard formulation and modelled $π_{p}^{L} = \frac{ϕ_{p}^{L}}{\sum_{p = 1}^{P} ϕ_{p}^{L}}$ and $π_{p}^{R} = \frac{ϕ_{p}^{R}}{\sum_{p = 1}^{P} ϕ_{p}^{R}}$ and then $\log ϕ_{p}^{j} = α_{p} + β_{p} j$ with $j = 0, 1$ to indicate $L$ and $R$, respectively. Again, using fairly standard modelling, I fix $α_{1} = β_{1} = 0$ to ensure identifiability and then model $α_{2}, \dots, α_{P} \sim \text{Normal} (0, σ_{α})$ and $β_{2}, \dots, β_{P} \sim \text{Normal} (0, σ_{β})$.
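A minimal sketch of this log-linear parametrisation follows. The coefficient values are invented purely to show the mechanics (they are not estimates from the model), and `party_probabilities` is a hypothetical helper name.

```python
import math

PARTIES = ["Conservative", "Labour", "UKIP", "Lib Dem",
           "SNP", "Green", "Plaid Cymru", "Other"]

def party_probabilities(alpha, beta, j):
    """pi_p^j = phi_p^j / sum_p phi_p^j, with log phi_p^j = alpha_p + beta_p * j."""
    phi = [math.exp(a + b * j) for a, b in zip(alpha, beta)]
    total = sum(phi)
    return [value / total for value in phi]

# Hypothetical coefficients; the first entry is the Conservative baseline,
# fixed at 0 for identifiability.
alpha = [0.0, -0.6, -1.2, -2.0, -2.5, -3.0, -4.0, -3.0]
beta = [0.0, 0.9, -2.0, 1.5, 0.8, 1.2, 0.7, 0.3]

pi_leave = party_probabilities(alpha, beta, j=0)   # j = 0: Leavers
pi_remain = party_probabilities(alpha, beta, j=1)  # j = 1: Remainers

for party, p_l, p_r in zip(PARTIES, pi_leave, pi_remain):
    print(f"{party:12s} leave={p_l:.3f} remain={p_r:.3f}")
```

Because the baseline coefficients are 0, the ratio of any party's probability to the Conservative one is simply $\exp(α_{p})$ among leavers and $\exp(α_{p} + β_{p})$ among remainers.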

This essentially fixes the “Tory effect” to 0 (if only I could really do that!…) and then models the effect of the other parties with respect to this baseline. Negative values for $α_{p}$ indicate that party $p \neq 1$ is less likely to grab votes among leavers than the Tories; similarly, positive values for $β_{p}$ mean that, relative to the Tories, party $p \neq 1$ does better among remainers than among leavers (it is more popular than the Tories among remainers when $α_{p} + β_{p} > 0$). In particular, I have used informative priors by setting the standard deviations $σ_{α} = σ_{β} = \log (1.5)$, which encodes the belief that massive deviations are unlikely (remember that $α_{p}$ and $β_{p}$ are defined on the log scale).
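To get a feel for how informative this prior is, here is a rough worked example. It assumes the second argument of the Normal is the standard deviation, as stated above. Among leavers, $\exp (α_{p})$ is the ratio of party $p$'s probability to the Tory one, and about 95% of the prior mass for $α_{p}$ lies within two standard deviations of 0:

$$
\exp (α_{p}) = \frac{ϕ_{p}^{L}}{ϕ_{1}^{L}} = \frac{π_{p}^{L}}{π_{1}^{L}}, \qquad α_{p} \in [- 2 σ_{α}, 2 σ_{α}] \iff \frac{π_{p}^{L}}{π_{1}^{L}} \in [1.5^{- 2}, 1.5^{2}] \approx [0.44, 2.25] .
$$

So, a priori, each party is expected to poll somewhere between a bit less than half and a bit more than twice the Tory share among leavers. The same reasoning applies to $β_{p}$ for the additional Remain effect.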

I then use the estimated party- and EU result-specific probabilities to compute a “relative risk” with respect to the observed overall vote at the 2015 election $ρ_{p}^{j} = \frac{π_{p}^{j}}{π_{p}^{15}},$ which essentially estimates how much better (or worse) the parties are doing in comparison to the last election, among leavers and remainers. The reason I want these relative risks is that I can then distribute the information from the current polls and the EU referendum to each constituency $c = 1, \dots, C$ by estimating the predicted share of votes at the next election as the mixture $π_{c p}^{17} = (1 - γ_{c}) π_{p}^{15} ρ_{p}^{L} + γ_{c} π_{p}^{15} ρ_{p}^{R},$ where $γ_{c}$ is the observed proportion of remain voters in constituency $c$.
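The two steps can be sketched as follows for a single constituency and three parties. All numbers are invented. The formula writes $π_{p}^{15}$ in both places; this sketch assumes (and this is only one possible reading) that the relative risks use the national 2015 shares while the mixture uses the 2015 shares recorded for the constituency. The function names are hypothetical.

```python
def relative_risks(pi_now, pi_2015):
    """rho_p^j = pi_p^j / pi_p^15 for every party p."""
    return [now / past for now, past in zip(pi_now, pi_2015)]

def predicted_shares(pi_2015_c, rho_leave, rho_remain, gamma_c):
    """(1 - gamma_c) * pi15 * rho^L + gamma_c * pi15 * rho^R, party by party."""
    return [
        (1 - gamma_c) * p15 * r_l + gamma_c * p15 * r_r
        for p15, r_l, r_r in zip(pi_2015_c, rho_leave, rho_remain)
    ]

# Invented three-party toy inputs.
national_2015 = [0.40, 0.35, 0.25]   # overall 2015 vote shares
pi_leave = [0.50, 0.30, 0.20]        # current probabilities among Leavers
pi_remain = [0.30, 0.40, 0.30]       # current probabilities among Remainers

rho_leave = relative_risks(pi_leave, national_2015)
rho_remain = relative_risks(pi_remain, national_2015)

# ASSUMPTION: constituency-level 2015 shares enter the mixture.
constituency_2015 = [0.50, 0.30, 0.20]
gamma_c = 0.60                        # invented Remain proportion

shares_2017 = predicted_shares(constituency_2015, rho_leave, rho_remain, gamma_c)
print([round(s, 3) for s in shares_2017], round(sum(shares_2017), 3))
```

Under this reading the predicted shares do not have to add up to exactly 1 within a constituency, which is why a further normalising step is needed.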

Finally, I can simulate the next election while ensuring that in each constituency the $π_{c p}^{17}$ sum to 1. I do this by drawing the vote shares as $({\hat{π}}_{c 1}^{17}, \dots, {\hat{π}}_{c P}^{17}) \sim \text{Dirichlet} (π_{c 1}^{17}, \dots, π_{c P}^{17})$.
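A minimal sketch of this simulation step for one constituency, using only the Python standard library. It draws a Dirichlet vector by normalising independent Gamma draws (a standard construction) and records which party comes first in each simulation. The shares, the number of simulations and the function names are assumptions made for illustration.

```python
import random

def draw_dirichlet(alpha, rng):
    """One Dirichlet(alpha) draw: normalise independent Gamma(alpha_p, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]  # requires every a > 0
    total = sum(gammas)
    return [g / total for g in gammas]

def win_probabilities(shares, n_sims=2000, seed=1):
    """Proportion of simulated elections in which each party has the top share."""
    rng = random.Random(seed)
    wins = [0] * len(shares)
    for _ in range(n_sims):
        draw = draw_dirichlet(shares, rng)
        wins[draw.index(max(draw))] += 1
    return [w / n_sims for w in wins]

# Invented predicted shares for one constituency (strictly positive).
shares_c = [0.475, 0.309, 0.208]

one_draw = draw_dirichlet(shares_c, random.Random(0))
print(one_draw, sum(one_draw))
print(win_probabilities(shares_c))
```

Each simulated vector sums to 1 by construction. Repeating the draw for every constituency and counting the winners gives a distribution of seats per party.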

In the end, for each constituency I have a distribution of election results, which I can use to determine the average outcome, as well as various measures of uncertainty. So in a nutshell, this model is all about i) re-proportioning the 2015 and 2016 votes based on the polls; and ii) propagating uncertainty in the various inputs.

I’ll update this model as more polls become available $-$ one extra issue then will be about discounting older polls (something like what Roberto did here and here, but I think I’ll keep things easy for this). For now, I’ve run my model for the 5 polls I mentioned earlier and this is the (rather depressing) result.

From the current data and the modelling assumptions, it looks like the Tories are indeed on course for a landslide victory $-$ my results are also roughly in line with other predictions (eg here). The model may be flattering the Lib Dems $-$ the polls seem to indicate almost unanimously that they will do very well in areas of a strong Remain persuasion, which means that the model predicts they will gain many seats, particularly where the 2015 election was won by a small margin (and they often leapfrog Labour into first place).

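The following table shows the predicted “swings” $-$ who’s taking seats from whom. Rows are the party that won the seat in 2015 and columns are the party predicted to win it now:

2015 winner	Conservative	Green	Labour	Lib Dem	Plaid Cymru	SNP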
Conservative	325	0	0	5	0	0
Green	0	1	0	0	0	0
Labour	64	0	160	6	1	1
Liberal Democrat	0	0	0	9	0	0
Plaid Cymru	0	0	0	0	3	0
Scottish National Party	1	0	0	5	0	50
UKIP	1	0	0	0	0	0
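As a quick check on the table, the sketch below adds up its rows and columns. It reads rows as the 2015 winner and columns as the predicted winner, which is how the commentary on Labour's losses treats them; the variable names are just for illustration.

```python
PREDICTED = ["Conservative", "Green", "Labour", "Lib Dem", "Plaid Cymru", "SNP"]

# Rows copied from the table: 2015 winner -> seats by predicted winner.
SWINGS = {
    "Conservative": [325, 0, 0, 5, 0, 0],
    "Green": [0, 1, 0, 0, 0, 0],
    "Labour": [64, 0, 160, 6, 1, 1],
    "Liberal Democrat": [0, 0, 0, 9, 0, 0],
    "Plaid Cymru": [0, 0, 0, 0, 3, 0],
    "Scottish National Party": [1, 0, 0, 5, 0, 50],
    "UKIP": [1, 0, 0, 0, 0, 0],
}

# Row totals: seats held after 2015. Column totals: predicted seats.
seats_2015 = {party: sum(row) for party, row in SWINGS.items()}
seats_predicted = {
    name: sum(row[k] for row in SWINGS.values())
    for k, name in enumerate(PREDICTED)
}

print(seats_2015)
print(seats_predicted)
print(sum(seats_predicted.values()))
```

The column totals give 391 seats for the Conservatives, 160 for Labour, 51 for the SNP, 25 for the Lib Dems, 4 for Plaid Cymru and 1 for the Greens, which add up to the 632 constituencies considered.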

Again, at the moment, it’s a bad day at the office for Labour, who fail to win a single new seat while losing over 60 to the Tories, 6 to the Lib Dems, 1 to Plaid Cymru in Wales and 1 to the SNP (which would mean Labour being completely erased from Scotland). UKIP are also predicted to lose their only seat $-$ but again, this seems a likely outcome.