This vignette is still ongoing work, so if you are looking for something you cannot find, please alert me and I will do something about it.

1 Background

Despite stated otherwise by some, the Gompertz distribution can be parameterized both as a PH model and as an AFT one.

The Gompertz families of distributions are defined in essentially two ways in the R package eha: The rate and the canonical representations. The reason for this is that the families need to be differently represented depending on whether proportional hazards or accelerated failure time models are under consideration.

In the proportional hazards case, the rate formulation is used, and it is characterized by an exponentially increasing hazard function with fixed rate r:

\[\begin{equation} h(t; p, r) = p e^{r t}, \quad p, t > 0; -\infty < r < \infty. \end{equation}\]

As noted earlier, when \(r < 0\), the hazard function \(h\) is decreasing “too fast” to define a proper survival function, and \(r = 0\) gives the exponential distribution as a special case. And for each fixed \(r\), the family of distributions indexed by \(p > 0\) constitutes a proportional hazards family of distributions, and the corresponding regression model is written as

\[\begin{equation} h(t; x, p, r, \beta) = p e^{r t} e^{\beta x}, \quad t > 0. \end{equation}\]

2 Proportional hazards model

In Figure 2.1 a Gompertz fit is compared to ordinary Cox regression.

fit.c <- coxreg(Surv(enter - 60, exit - 60, event) ~ sex + region,
                data = oldmort)
fit.g <- phreg(Surv(enter - 60, exit - 60, event) ~ sex + region, 
             dist = "gompertz", param = "rate", data = oldmort)
plot(fit.c, xlab = "Years above age 60.")
haz.g <- hazards(fit.g, cum = TRUE)
lines(haz.g$x, haz.g$y, lty = 2)
legend("topleft", legend = c("Cox regression", "Gompertz regression"), lty = 1:2)
Baseline cumulative hazards for Cox and Gompertz regressions.

Figure 2.1: Baseline cumulative hazards for Cox and Gompertz regressions.

The Gompertz model fits the baseline hazard very well up until duration 30 (age 90), but after that the exponential growth slows down.

The result of fitting the Gompertz model is shown here,

summary(fit.g)
## Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
## sex                                                          0.000 
##             male      0.406     0         1 (reference)
##           female      0.594    -0.190     0.827     0.046
## region                                                       0.002 
##             town      0.111     0         1 (reference)
##         industry      0.326     0.207     1.230     0.085
##            rural      0.563     0.047     1.048     0.083
## 
## Events                    1971 
## Total time at risk         37824 
## Max. log. likelihood      -7280.9 
## LR test statistic         31.18 
## Degrees of freedom        3 
## Overall p-value           7.79564e-07

to be compared to the Cox regression results.

summary(fit.c)
## Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
## sex                                                          0.000 
##             male      0.406     0         1 (reference)
##           female      0.594    -0.187     0.830     0.046
## region                                                       0.001 
##             town      0.111     0         1 (reference)
##         industry      0.326     0.212     1.236     0.085
##            rural      0.563     0.052     1.053     0.083
## 
## Events                    1971 
## Total time at risk         37824 
## Max. log. likelihood      -13563 
## LR test statistic         30.96 
## Degrees of freedom        3 
## Overall p-value           8.68365e-07

3 Accelerated failure time model

The Gompertz distribution is special in that it can be fit into both the AFT and the PH framework. Of course, as we have seen, this also holds for the Weibull distribution in a trivial way, the AFT and the PH models are the same, but for the Gompertz distribution, the AFT and PH approaches yield different models.

For the AFT framework to be in place in the Gompertz case, it needs to be formulated with a rather unfamiliar parametrization, which is called the canonical parametrization in the package eha. It works as follows. The standard definition of the Gompertz hazard function is

\[\begin{equation*} h_r(t; (\alpha, \beta)) = \alpha \exp(\beta t), \quad t > 0; \; \alpha > 0, -\infty < \sigma < \infty. \end{equation*}\]

and it is called the rate parametrization in eha As noted earlier, in order for \(h_G\) to determine a proper survival distribution, it must be required that \(\beta \ge 0\). It was also noted that when \(\beta = 0\), the resulting distribution is exponential.

The canonical definition of the Gompertz hazard function is given by

\[\begin{equation*} h_c(t; (\tau, \sigma)) = \frac{\tau}{\sigma} \exp\biggl(\frac{t}{\sigma}\biggr), \quad t > 0; \; \tau, \sigma > 0. \end{equation*}\]

The transition from \(h_r\) to \(h_c\) is given by \(\sigma = 1 / \beta, \, \tau = \alpha / \beta\), and note that this implies that the rate in the canonical form must be strictly positive. Furthermore, the exponential special case now appears in the limit as \(\sigma \rightarrow \infty\). In practice this means that when the baseline hazard is only weakly increasing, \(\sigma\) is very large and numerical problems in the estimation process is likely to occur.

The conclusion of all this is that the AFT Gompertz model is suitable in situations where the intensity of an event is clearly increasing with time. A good example is adult mortality.

We repeat the PH analysis but with the AFT model.

fit.gaft <- aftreg(Surv(enter - 60, exit - 60, event) ~ sex + region, 
                   id = id, param = "lifeExp", dist = "gompertz", 
                   data = oldmort)
summary(fit.gaft)
## Covariate            W.mean      Coef Life-Expn  se(Coef)    LR p
## sex                                                          0.001 
##             male      0.406     0         1   (reference)
##           female      0.594     0.065     1.067     0.019
## region                                                       0.007 
##             town      0.111     0         1   (reference)
##         industry      0.326    -0.096     0.908     0.039
##            rural      0.563    -0.046     0.955     0.039
## 
## Events                    1971 
## Total time at risk         37824 
## Max. log. likelihood      -7288.3 
## LR test statistic         21.87 
## Degrees of freedom        3 
## Overall p-value           6.92831e-05