I will survive!
Here’s a very long post, to make up for the recent silence on the blog… Lately, I’ve been working on a new project involving the use of survival analysis data and results, specifically for health economic evaluation (cue Cake’s rendition below).
I have to say I’m not really a massive expert on survival analysis, in the sense that it’s never been my main area of interest/research. But I think the particular case of cost-effectiveness modelling is actually very interesting
Anyway, one of the main implications to this is that typically the practitioners are left with the task of fitting a (range of) parametric survival model(s) to their data. Nick Latimer among others have done excellent work in suggesting suitable guidelines. (In fact, both Chris and Nick did come to talk to one of our workshops/seminars, last summer).
Over and above the necessary choice of models, I think there are other interesting issues/challenges for the health economic modeller:
- (Parametric) Survival models are often tricky because there are many different parameterisations, leading to different results presentations. This can be very confusing and without extra care lead to disastrous consequences (because the economic model extrapolates on the wrong survival curves!).
- Even when the parameterisation is taken care of, we are normally interested in characterising the full uncertainty in the joint distribution of the survival model parameters
we need to do this to perform Probabilistic Sensitivity Analysis (PSA), so even in a non-Bayesian model, this is a required output of the analysis. Pragmatically, this means computing a survival curve for a large combination of parameter values and feeding each to the economic model to assess the impact of uncertainty on the final decision-making process. - Much to my frustration (and, I realise, to the frustration of the people I keep nagging about this!), the economic models are (too) often performed in Excel. This means that while the survival analysis is done externally in a proper statistical software, then the results (usually in tabular form) are copied over the spreadsheet and used to then construct the survival curves (eg via VBA macros).
I think the process is complex enough and after talking to some health economist colleagues, I’ve started to work on a R package to try and standardise and simplify it as much as possible. In fact, this will be more of a wrapper for several other R packages, but I’m planning on including some nice (I would say that, wouldn’t I?) features.
Firstly, the idea is to allow the user to fit survival models using MLE as well as a Bayesian approach. In the former case (which I will reluctantly set as the default
For the Bayesian analysis, I’m allowing the user to either use the (not-so-wide) range of in-built models in INLA, or to go fully MCMC and use OpenBUGS (which offers the same range as flexsurv). Of course, one of the crucial features of going fully Bayesian on this is that the posterior joint distribution of the model parameters can be fully characterised (using the new inla.posterior.sample function in INLA, or directly from the MCMC simulations in OpenBUGS) and so the uncertainty in the survival curves can be propagated directly in the economic model.
In terms of running time, INLA is basically just as fast as the MLE, while MCMC is generally slower (and, it is known to possibly run into convergence problems for some parameterisations).
The typical call to survHE will be something like this
<- fit.models(formula, data, distr, method, ...) fit
where
formula
can be specified using standard R notation, something likeSurv(time,event)~as.factor(arm)
, for MLE analysis orinla.surv(time,event)~as.factor(arm)
, for INLA. I think I’ve managed to make the function clever enough to recognise which formula should be used depending on what method of inference is specified and also to figure out how to translate this into BUGS language.data
is, shockingly, the dataset to be used.distr
is a (vector) of string(s) indicating which parametric distribution(s) should be fitted to the data, something likedistr=c("exponential","weibull")
. Again, to make the modellers’ lives easier, I’ve kind of made mine miserable and tried to be very clever in accounting for differences in terminology across the three packages/approaches I cater for.method
is a string specifying what kind of analysis should be done, with values at"mle"
,"inla"
or"mcmc"
.
The other options are mainly to do with INLA & BUGS
Alongside the main fit.models function, I have (or I am) prepared (preparing) a set of other functions for plotting, printing and, most importantly, exporting the results (eg of the PSA) for instance to an Excel file. The idea is that in this way, all the funny business of doing part of the survival analysis in a statistical software and then completing the computation of the survival curves for different values of the parameters in Excel can be avoided
I’ll post more on this when I’m closer to a beta-relase