The problem is to get some idea of how long it takes to get pregnant and, if I understand correctly, what Richie has done is to base his model on some point estimates of the _monthly fecundity rates_ (MFR). Reasonably enough, he has allowed them to vary with the woman's age.
He then used a negative binomial model for the probability of getting pregnant after trying for a given number of months. As far as I can see, he estimates the probability of pregnancy after $0,1,\ldots,60$ months for different age groups. These estimates come from the negative binomial model which, as he realises, implies that for any positive probability $p$ the cumulative probability of pregnancy will eventually tend to 1 (if a long enough “follow up” is considered). Here’s his result.
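For concreteness, here is a minimal Python sketch of this kind of calculation. It assumes the model reduces to a geometric distribution (a negative binomial with a single "success") and that months are independent, so that with a fixed MFR $p$ the probability of having conceived within $n$ months is $1-(1-p)^n$. The MFR values in `HYPOTHETICAL_MFR` are made up for illustration; they are not Richie's estimates, and the function names are my own.

```python
# Cumulative probability of pregnancy within n months under a fixed monthly
# fecundity rate p, assuming independent months (geometric model, i.e. a
# negative binomial with a single "success").


def prob_pregnant_by(n_months: int, p: float) -> float:
    """P(conception within n_months) = 1 - (1 - p)^n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** n_months


# Hypothetical MFRs by age group (illustrative values only, not estimates).
HYPOTHETICAL_MFR = {"20-24": 0.25, "25-29": 0.20, "30-34": 0.15, "35-39": 0.10}

if __name__ == "__main__":
    for group, p in HYPOTHETICAL_MFR.items():
        curve = [prob_pregnant_by(n, p) for n in range(0, 61)]
        # Print the curve at a few follow-up times (0, 6, 12 and 60 months).
        print(group, [round(curve[n], 3) for n in (0, 6, 12, 60)])
```

With any $p > 0$ the term $(1-p)^n$ shrinks towards 0 as $n$ grows, which is exactly why every curve eventually approaches 1.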
I like the graph and the underlying R code, but I think there are several limitations to this model (NB: I think he’s done this in the “spirit” of blogging, i.e. as a toy example showing how a real-life problem can be discussed in statistical or mathematical terms. So I’m not criticising him $-$ just thinking about it with my Bayesian statistician hat on).
First, from what I can gather (disclaimer: because I should be packing rather than blogging, I have not read all the comments on his original post, so some of these points may already have been picked up), the curves above assume a fixed MFR for each “cohort”. In other words, all the points in red are computed from a negative binomial model in which the number of months $r$ before a pregnancy occurs varies from 0 to 60, but the parameter $p$ (representing the underlying MFR) does not. I don’t think this is correct, because the MFR should change, albeit perhaps minimally, with every year of age. Thus, a woman aged 26 probably has a (slightly) lower rate than one aged 25. This may or may not be negligible.
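A minimal sketch of what letting the MFR change with age could look like, assuming (purely for illustration) that the MFR declines linearly by a fixed amount for each year of age. The starting rate of 0.20 at age 25 and the decline of 0.005 per year are hypothetical numbers, not estimates, and `mfr_at_age` is a made-up helper. With month-specific rates $p_t$, the probability of having conceived within $n$ months becomes $1-\prod_{t=1}^{n}(1-p_t)$, which can be compared directly with the fixed-rate version to see whether the difference is negligible.

```python
def mfr_at_age(age_years: float, base_age: float = 25.0,
               base_rate: float = 0.20, decline_per_year: float = 0.005) -> float:
    """Hypothetical MFR: linear decline per year of age, clipped to [0, 1]."""
    rate = base_rate - decline_per_year * (age_years - base_age)
    return min(max(rate, 0.0), 1.0)


def prob_pregnant_by_age_varying(n_months: int, start_age: float) -> float:
    """1 - prod_{t=1..n} (1 - p_t), with p_t evaluated at the age reached in month t."""
    prob_not_yet = 1.0
    for t in range(n_months):
        age = start_age + t / 12.0  # age in years during month t + 1
        prob_not_yet *= 1.0 - mfr_at_age(age)
    return 1.0 - prob_not_yet


def prob_pregnant_by_fixed(n_months: int, start_age: float) -> float:
    """Same model, but with the MFR frozen at the starting age."""
    return 1.0 - (1.0 - mfr_at_age(start_age)) ** n_months


if __name__ == "__main__":
    # Compare the fixed-rate and age-varying curves for a woman starting at 25.
    for n in (12, 36, 60):
        fixed = prob_pregnant_by_fixed(n, 25.0)
        varying = prob_pregnant_by_age_varying(n, 25.0)
        print(n, round(fixed, 4), round(varying, 4))
```

Because the hypothetical rate only ever decreases, the age-varying curve can never lie above the fixed-rate one; how far below it falls depends entirely on how steep the real decline is.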
Moreover, as pointed out on Richie’s blog by other commenters and by Richie himself, it takes two to play this game, so considering only the female MFR tells just one side of the story and does not account for the interaction between partners or for possible problems on either side.
From my point of view, the strongest limitation of the model is that it relies on point estimates, with no account of uncertainty whatsoever. As we saw in the telomere paper, there is very large variability in success rates, even within age groups. Thus, for a woman aged 25 the rate may vary between 0.05 and 0.4 (I’m making these numbers up, but I think the range could be reasonable). This obviously has implications for the overall estimate of the probability of success.
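One standard way of propagating this uncertainty (my suggestion, not something in Richie's post) is to give $p$ a $\text{Beta}(\alpha,\beta)$ distribution and average over it. For the geometric model this has a closed form, $\Pr(\text{pregnant within } n \text{ months}) = 1 - \mathrm{E}[(1-p)^n] = 1 - B(\alpha,\beta+n)/B(\alpha,\beta)$. The minimal sketch below assumes a hypothetical $\text{Beta}(4,16)$ for one age group, chosen only so that its mean is 0.2 and most of its mass falls roughly within the made-up 0.05 to 0.4 range above; a Monte Carlo version is included as a check.

```python
import math
import random


def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_pregnant_by_beta(n_months: int, alpha: float, beta: float) -> float:
    """P(conception within n months) when p ~ Beta(alpha, beta):
    1 - E[(1 - p)^n] = 1 - B(alpha, beta + n) / B(alpha, beta)."""
    return 1.0 - math.exp(log_beta(alpha, beta + n_months) - log_beta(alpha, beta))


def prob_pregnant_by_beta_mc(n_months: int, alpha: float, beta: float,
                             draws: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check: draw p from the Beta and average 1 - (1 - p)^n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        p = rng.betavariate(alpha, beta)
        total += 1.0 - (1.0 - p) ** n_months
    return total / draws


# Hypothetical prior for the MFR of a single age group (illustrative only).
ALPHA, BETA = 4.0, 16.0

if __name__ == "__main__":
    p_mean = ALPHA / (ALPHA + BETA)
    for n in (6, 12, 24, 60):
        plug_in = 1.0 - (1.0 - p_mean) ** n  # point estimate, as in the original model
        print(n, round(plug_in, 4), round(prob_pregnant_by_beta(n, ALPHA, BETA), 4))
```

Since $(1-p)^n$ is convex in $p$, Jensen's inequality guarantees that the averaged curve never lies above the one obtained by plugging in the mean MFR, so ignoring the uncertainty tends to overstate the chances of success.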
Off the top of my head, informative mixture priors could be used to account for underlying sub-populations within the overall population of couples (for example, women with one blocked fallopian tube, or similar cases).
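As a rough sketch of the mixture idea, assume two hypothetical sub-populations: a majority of couples with a "typical" MFR and a minority with a reduced one. The prior for $p$ is then $w\,\text{Beta}(\alpha_1,\beta_1) + (1-w)\,\text{Beta}(\alpha_2,\beta_2)$, and because expectations are linear, the probability of conception within $n$ months is simply the weighted average of the two Beta-geometric curves. The weights and Beta parameters in `HYPOTHETICAL_MIXTURE` are made up for illustration and do not come from any data.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_geometric_cdf(n_months: int, alpha: float, beta: float) -> float:
    """P(conception within n months) for p ~ Beta(alpha, beta)."""
    return 1.0 - math.exp(log_beta(alpha, beta + n_months) - log_beta(alpha, beta))


def mixture_cdf(n_months: int, components: list[tuple[float, float, float]]) -> float:
    """components: list of (weight, alpha, beta); the weights must sum to 1."""
    total_weight = sum(w for w, _, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    # Linearity of expectation: the mixture CDF is the weighted sum of component CDFs.
    return sum(w * beta_geometric_cdf(n_months, a, b) for w, a, b in components)


# Hypothetical sub-populations (illustrative numbers only).
HYPOTHETICAL_MIXTURE = [
    (0.9, 4.0, 16.0),  # "typical" couples, MFR centred around 0.2
    (0.1, 1.0, 19.0),  # reduced-fertility subgroup, MFR centred around 0.05
]

if __name__ == "__main__":
    for n in (12, 24, 60):
        print(n, round(mixture_cdf(n, HYPOTHETICAL_MIXTURE), 4))
```

In this made-up example, the small reduced-fertility component accounts for most of the couples still not pregnant after several years, something a single point estimate of the MFR cannot represent.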
Nice post, though $-$ the kind of things we really like!