Cox Proportional Hazards Model

Introduction

Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h₀(t). The baseline hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the effect of the predictor variables is modeled by a loglinear regression. For a baseline relative to 0, this model corresponds to

$h (X_{i}, t) = h_{0} (t) \exp [\sum_{j = 1}^{p} x_{i j} b_{j}],$

where $X_{i} = (x_{i 1}, x_{i 2}, \dots, x_{i p})$ is the vector of predictor variable values for the ith subject, h(X_i,t) is the hazard rate at time t for X_i, and h₀(t) is the baseline hazard rate function.
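The model equation can be evaluated directly once a baseline hazard and coefficients are available. The following is a minimal Python sketch, not the coxphfit or fitcox implementation; the function name `cox_hazard`, the baseline hazard `h0`, and the coefficient values are hypothetical and chosen only for illustration.

```python
import math

def cox_hazard(x, b, baseline_hazard, t):
    """Hazard h(X, t) = h0(t) * exp(sum_j x_j * b_j)."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, b))
    return baseline_hazard(t) * math.exp(linear_predictor)

# Hypothetical baseline hazard and coefficients, for illustration only.
h0 = lambda t: 0.1 * t
b = [0.5, -0.2]

print(cox_hazard([1.0, 2.0], b, h0, t=3.0))  # hazard for a subject with X = (1, 2)
print(cox_hazard([0.0, 0.0], b, h0, t=3.0))  # at X = 0 the hazard equals h0(3.0)
```

With all predictors equal to 0, the exponential factor is 1 and the hazard reduces to the baseline hazard, which is what "a baseline relative to 0" means here.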

To fit a Cox proportional hazards model to data, use either the coxphfit function or the fitcox function. The fitcox function is more modern, and returns a CoxModel object containing detailed information about the model. For an example using fitcox, see Cox Proportional Hazards Model Object.

Hazard Ratio

The Cox proportional hazards model relates the hazard rate for individuals or items at the value X_i to the hazard rate for individuals or items at the baseline value. It produces an estimate for the hazard ratio:

$H R (X_{i}) = \frac{h (X_{i}, t)}{h_{0} (t)} = \exp [\sum_{j = 1}^{p} x_{i j} b_{j}] .$

The model is based on the assumption that the baseline hazard function depends on time, t, but the predictor variables do not. This assumption is also called the proportional hazards assumption, which states that the hazard ratio does not change over time for any individual.

The hazard ratio represents the relative risk of instantaneous failure for individuals or items having the predictor variable value X_i compared to those having the baseline values. For example, if the predictor variable is smoking status, where nonsmoking is the baseline category, the hazard ratio shows the instantaneous failure rate of smokers relative to that of the baseline category, that is, nonsmokers. For a baseline relative to X^* and the predictor variable value X_i, the hazard ratio is

$H R (X_{i}) = \frac{h (X_{i}, t)}{h (X^{*}, t)} = \exp [\sum_{j = 1}^{p} (x_{i j} - x_{j}^{*}) b_{j}] .$

For example, if the baseline is the mean values of the predictor variables (mean(X)), then the hazard ratio becomes

$H R (X_{i}) = \frac{h (X_{i}, t)}{h (\bar{X}, t)} = \exp [\sum_{j = 1}^{p} (x_{i j} - {\bar{x}}_{j}) b_{j}] .$
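The effect of the baseline choice can be checked numerically. The sketch below assumes a small hypothetical data set (a smoking flag and age) and hypothetical coefficients; `hazard_ratio` is an illustrative helper, not a toolbox function.

```python
import math

def hazard_ratio(x, b, baseline=None):
    """HR(X) = exp(sum_j (x_j - x*_j) * b_j); the baseline X* defaults to 0."""
    if baseline is None:
        baseline = [0.0] * len(x)
    return math.exp(sum((xj - sj) * bj for xj, sj, bj in zip(x, baseline, b)))

# Hypothetical predictors: smoking flag and age; hypothetical coefficients.
X = [[1.0, 50.0], [0.0, 60.0], [1.0, 70.0]]
b = [0.7, 0.03]

# Baseline at the column means, mean(X).
x_bar = [sum(col) / len(X) for col in zip(*X)]

for row in X:
    print(row, hazard_ratio(row, b), hazard_ratio(row, b, baseline=x_bar))
```

Changing the baseline rescales every hazard ratio by the same constant, so the ratio between any two subjects is unaffected.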

Hazard rates are related to survival rates, such that the survival rate at time t for an individual with the explanatory variable value X_i is

$S_{X_{i}} (t) = S_{0} {(t)}^{H R (X_{i})},$

where S₀(t) is the survivor function with the baseline hazard rate function h₀(t), and HR(X_i) is the hazard ratio of the predictor variable value X_i relative to the baseline value.
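As a quick illustration of this relationship, the following sketch raises a baseline survival probability to the power of a hazard ratio. The constant baseline hazard of 0.1 and the coefficient 0.7 are assumptions made for the example only.

```python
import math

def survival_probability(s0_t, hazard_ratio):
    """S_X(t) = S_0(t) ** HR(X)."""
    return s0_t ** hazard_ratio

# Hypothetical baseline survivor function with constant baseline hazard 0.1.
s0 = lambda t: math.exp(-0.1 * t)

# Hypothetical coefficient for a binary predictor (for example, smoking status).
b = [0.7]
hr_smoker = math.exp(1.0 * b[0])

for t in (1.0, 5.0, 10.0):
    print(t, s0(t), survival_probability(s0(t), hr_smoker))
```

A hazard ratio greater than 1 lowers the survival probability at every time point, and a hazard ratio of 1 leaves the baseline survivor function unchanged.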

Extension of Cox Proportional Hazards Model

When you have variables that do not satisfy the proportional hazards (PH) assumption, you can consider using two extensions of the Cox proportional hazards model: the stratified Cox model and the Cox model with time-dependent variables.

If the variables that do not satisfy the PH assumption are categorical, or can be grouped into categories, use the stratified Cox model:

$h_{s} (X_{i}, t) = h_{0 s} (t) \exp [\sum_{j = 1}^{p} x_{i j} b_{j}],$

where the subscript s indicates the sth stratum. The stratified Cox model has a different baseline hazard rate function for each stratum but shares coefficients. Therefore, it has the same hazard ratio across all strata if the predictor variable values are the same. You can include stratification variables in coxphfit by using the name-value pair 'Strata'. For an example using a stratified Cox model with a Cox model object, see Cox Proportional Hazards Model Object.
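The shared-coefficient property can be illustrated with a short sketch. The stratum names, the per-stratum baseline hazards, and the coefficients below are hypothetical, and `stratified_hazard` is an illustrative helper rather than part of coxphfit.

```python
import math

def stratified_hazard(x, b, stratum, baseline_hazards, t):
    """h_s(X, t) = h_0s(t) * exp(sum_j x_j * b_j) for stratum s."""
    linear_predictor = sum(xj * bj for xj, bj in zip(x, b))
    return baseline_hazards[stratum](t) * math.exp(linear_predictor)

# Hypothetical baseline hazards, one per stratum, and shared coefficients.
baseline_hazards = {
    "clinic_A": lambda t: 0.05 * t,
    "clinic_B": lambda t: 0.2,
}
b = [0.4, -0.1]
x_treated, x_control = [1.0, 3.0], [0.0, 3.0]

ratios = {}
for s in baseline_hazards:
    ratios[s] = (stratified_hazard(x_treated, b, s, baseline_hazards, 2.0)
                 / stratified_hazard(x_control, b, s, baseline_hazards, 2.0))
    print(s, ratios[s])
```

The hazards themselves differ between strata, but the baseline cancels within each stratum, so both printed ratios equal exp(0.4).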

If the variables that do not satisfy the PH assumption are time-dependent variables, use the Cox model with time-dependent variables:

$h (X_{i}, t) = h_{0} (t) \exp [\sum_{j = 1}^{p_{1}} x_{i j} b_{j} + \sum_{k = 1}^{p_{2}} x_{i k} (t) c_{k}],$

where x_ij is an element of a time-independent predictor and x_ik(t) is an element of a time-dependent predictor. For an example of how to include time-dependent variables in coxphfit, see Cox Proportional Hazards Model with Time-Dependent Covariates.

Partial Likelihood Function

The point estimate of the effect of an explanatory variable is the estimated hazard ratio exp(b), where b is the coefficient estimate for that variable. This value is the hazard ratio for a one-unit increase in the variable, given that all other variables are held constant. The coefficient estimates are found by maximizing the partial likelihood function of the model. The partial likelihood function for the proportional hazards regression model is based on the observed order of events. It is the product of the partial likelihoods of the individual failures, one for each failure time. If there are n failures at n distinct failure times, $t_{1} < t_{2} < \dots < t_{n}$ , then the partial likelihood is

$L = \frac{H R (X_{1})}{\sum_{j = 1}^{n} H R (X_{j})} \times \frac{H R (X_{2})}{\sum_{j = 2}^{n} H R (X_{j})} \times \cdot \cdot \cdot \times \frac{H R (X_{n})}{H R (X_{n})} = \prod_{i = 1}^{n} \frac{H R (X_{i})}{\sum_{j = i}^{n} H R (X_{j})} .$

You can rewrite the partial likelihood by using a risk set R_i:

$L = \prod_{i = 1}^{n} \frac{H R (X_{i})}{\sum_{j \in R_{i}} H R (X_{j})},$

where R_i is the risk set at the ith failure time, that is, the index set of subjects who are still under study and have not experienced the event before the ith failure time.
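The risk-set form translates into a short computation on the log scale. The sketch below assumes no tied event times and takes the risk set at a failure time to be every subject whose observed time is at least that failure time, so censored subjects enter only through the denominators. The data are hypothetical, and the one-dimensional search is a crude stand-in for a real optimizer; it is not how coxphfit maximizes the likelihood.

```python
import math

def log_partial_likelihood(b, X, times, events):
    """Log partial likelihood, assuming no tied event times.

    X: list of predictor rows; times: observed times;
    events: 1 if the failure was observed, 0 if censored.
    """
    hr = [math.exp(sum(xj * bj for xj, bj in zip(row, b))) for row in X]
    total = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored subjects contribute only through risk sets
        # Risk set R_i: everyone still under observation at time t_i.
        risk_sum = sum(hr[k] for k, t_k in enumerate(times) if t_k >= t_i)
        total += math.log(hr[i]) - math.log(risk_sum)
    return total

def maximize_1d(f, lo=-5.0, hi=5.0, tol=1e-8):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical data: one binary predictor; the subject at time 3 is censored.
X = [[1.0], [0.0], [1.0], [0.0], [1.0]]
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]

b_hat = maximize_1d(lambda b: log_partial_likelihood([b], X, times, events))
print("coefficient:", b_hat, "hazard ratio:", math.exp(b_hat))
```

The search relies on the log partial likelihood being concave in the coefficient. A multivariable fit would need a proper optimizer such as Newton's method.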

You can use a likelihood ratio test to assess the significance of adding a term or terms to a model. Consider two nested models, where the first model has p predictor variables and the second model has p + r predictor variables, with maximized partial likelihoods L₁ and L₂, respectively. Then, under the null hypothesis that the r additional coefficients are zero, the statistic -2 log(L₁/L₂) = -2(log L₁ - log L₂) has an approximate chi-square distribution with r degrees of freedom (the number of terms being tested).
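A likelihood ratio test needs only the two maximized log partial likelihoods. The sketch below uses the standard statistic, -2 times the difference of the log likelihoods, and computes the chi-square tail probability with the Python standard library. The log-likelihood values are hypothetical, and `chi2_sf` and `likelihood_ratio_test` are illustrative helpers.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2:   # odd df: start from the closed form for df = 1
        q, k = math.erfc(math.sqrt(half)), 1
    else:        # even df: start from the closed form for df = 2
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def likelihood_ratio_test(logl_reduced, logl_full, r):
    """Return the statistic -2 * (log L1 - log L2) and its chi-square p-value."""
    stat = -2.0 * (logl_reduced - logl_full)
    return stat, chi2_sf(stat, r)

# Hypothetical maximized log partial likelihoods: p predictors vs. p + 2 predictors.
stat, p_value = likelihood_ratio_test(-105.3, -101.1, r=2)
print("statistic:", stat, "p-value:", p_value)
```

A small p-value suggests that the added terms improve the fit by more than chance alone would explain.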

Partial Likelihood Function for Tied Events

When you have tied events, coxphfit approximates the partial likelihood of the model by either Breslow’s method (default) or Efron’s method, instead of computing the exact partial likelihood. Computing the exact partial likelihood is expensive because it requires considering every possible ordering of the tied events within the risk sets.

The simplest approximation method is Breslow’s method. This method uses the same denominator for each tied set.

$L = \prod_{i = 1}^{d} \prod_{j \in D_{i}} \frac{H R (X_{j})}{\sum_{k \in R_{i}} H R (X_{k})},$

where d is the number of distinct event times, and D_i is the index set of all subjects whose event time is equal to the ith event time.

Efron’s method is more accurate than Breslow’s method and is still simple to compute. This method adjusts the denominator of the tied events as follows:

$L = \prod_{i = 1}^{d} \prod_{j \in D_{i}} \frac{H R (X_{j})}{\sum_{k \in R_{i}} H R (X_{k}) - \frac{j - 1}{d_{i}} \sum_{k \in D_{i}} H R (X_{k})},$

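where d_i is the number of indexes in D_i. In the factor (j - 1)/d_i, j denotes the position (1, 2, ..., d_i) of the subject within the tied set D_i rather than the subject index.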

For example, assume that the first two events are tied, that is, t₁ = t₂ and $t_{2} < t_{3} < \dots < t_{n}$ . In Breslow’s method, the denominators of the first two terms are the same:

$L = \frac{H R (X_{1})}{\sum_{j = 1}^{n} H R (X_{j})} \times \frac{H R (X_{2})}{\sum_{j = 1}^{n} H R (X_{j})} \times \frac{H R (X_{3})}{\sum_{j = 3}^{n} H R (X_{j})} \times \frac{H R (X_{4})}{\sum_{j = 4}^{n} H R (X_{j})} \times \cdot \cdot \cdot \times \frac{H R (X_{n})}{H R (X_{n})} .$

Efron’s method adjusts the denominator of the second term:

$L = \frac{H R (X_{1})}{\sum_{j = 1}^{n} H R (X_{j})} \times \frac{H R (X_{2})}{0.5 H R (X_{1}) + 0.5 H R (X_{2}) + \sum_{j = 3}^{n} H R (X_{j})} \times \frac{H R (X_{3})}{\sum_{j = 3}^{n} H R (X_{j})} \times \frac{H R (X_{4})}{\sum_{j = 4}^{n} H R (X_{j})} \times \cdot \cdot \cdot \times \frac{H R (X_{n})}{H R (X_{n})} .$

You can specify an approximation method by using the name-value pair 'Ties' in coxphfit.
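Both approximations can be sketched in a few lines. The function below is an illustrative Python helper, not the coxphfit implementation; its `ties` argument simply mirrors the two method names, and the data are hypothetical. Within each tied set, the loop position plays the role of j - 1 in Efron's denominator.

```python
import math
from collections import defaultdict

def tied_log_partial_likelihood(b, X, times, events, ties="breslow"):
    """Log partial likelihood with Breslow's or Efron's handling of ties."""
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")
    hr = [math.exp(sum(xj * bj for xj, bj in zip(row, b))) for row in X]

    # D_i: indices of the subjects that fail at each distinct event time.
    tied = defaultdict(list)
    for k, (t, e) in enumerate(zip(times, events)):
        if e:
            tied[t].append(k)

    total = 0.0
    for t_i, d_set in tied.items():
        risk_sum = sum(hr[k] for k, t_k in enumerate(times) if t_k >= t_i)
        tied_sum = sum(hr[k] for k in d_set)
        d_i = len(d_set)
        for pos, j in enumerate(d_set):  # pos = 0, 1, ..., d_i - 1
            denom = risk_sum
            if ties == "efron":
                # Remove a fraction pos / d_i of the tied subjects' contribution.
                denom -= pos / d_i * tied_sum
            total += math.log(hr[j]) - math.log(denom)
    return total

# Hypothetical data in which the first two events are tied.
X = [[0.5], [-0.3], [1.0], [0.0]]
times = [1, 1, 2, 3]
events = [1, 1, 1, 1]
b = [0.8]

print("Breslow:", tied_log_partial_likelihood(b, X, times, events, "breslow"))
print("Efron:  ", tied_log_partial_likelihood(b, X, times, events, "efron"))
```

Efron's denominators are never larger than Breslow's, so Efron's value is at least as large. When no event times are tied, the two methods give the same result.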

Frequency or Weights of Observations

The Cox proportional hazards model can incorporate the frequency or weights of observations. Let w_i be the weight of the ith observation. Then, the partial likelihoods of the Cox model with weights are as follows:

Partial likelihood with weights

$L = \prod_{i = 1}^{n} \frac{H R_{w} (X_{i})}{\sum_{j \in R_{i}} w_{j} H R (X_{j})},$

where

$H R_{w} (X_{i}) = \exp [w_{i} \sum_{j = 1}^{p} x_{i j} b_{j}] .$

Partial likelihood with weights and Breslow’s method

$L = \prod_{i = 1}^{d} \prod_{j \in D_{i}} \frac{H R_{w} (X_{j})}{{[\sum_{k \in R_{i}} w_{k} H R (X_{k})]}^{\frac{1}{d_{i}} \sum_{j \in D_{i}} w_{j}}}$

Partial likelihood with weights and Efron’s method

$L = \prod_{i = 1}^{d} \prod_{j \in D_{i}} \frac{H R_{w} (X_{j})}{{[\sum_{k \in R_{i}} w_{k} H R (X_{k}) - \frac{j - 1}{d_{i}} \sum_{k \in D_{i}} w_{k} H R (X_{k})]}^{\frac{1}{d_{i}} \sum_{j \in D_{i}} w_{j}}}$

You can specify the frequency or weights of observations by using the name-value pair 'Frequency' in coxphfit.
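The weighted Breslow form can be sketched as follows. This example assumes that the weight applies per observation, so that HR_w(X_j) equals HR(X_j) raised to the power w_j. The helper name and the data are hypothetical, and this is not the coxphfit implementation. Under this reading, integer weights behave like replicated observations, which the usage lines check.

```python
import math
from collections import defaultdict

def weighted_breslow_logpl(b, X, times, events, weights):
    """Weighted log partial likelihood with Breslow's handling of ties."""
    eta = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]

    # D_i: indices of the subjects that fail at each distinct event time.
    tied = defaultdict(list)
    for k, (t, e) in enumerate(zip(times, events)):
        if e:
            tied[t].append(k)

    total = 0.0
    for t_i, d_set in tied.items():
        # Weighted risk-set sum: sum over R_i of w_k * HR(X_k).
        risk_sum = sum(weights[k] * math.exp(eta[k])
                       for k, t_k in enumerate(times) if t_k >= t_i)
        # Numerators: log HR_w(X_j) = w_j * eta_j (assumed per-observation weight).
        total += sum(weights[j] * eta[j] for j in d_set)
        # Denominators: the d_i exponents add up to the sum of weights in D_i.
        total -= sum(weights[j] for j in d_set) * math.log(risk_sum)
    return total

# Hypothetical data with integer frequencies 2, 1, and 3.
X = [[0.2], [1.0], [-0.5]]
times = [1, 2, 2]
events = [1, 1, 0]
freq = [2, 1, 3]
b = [0.3]
weighted = weighted_breslow_logpl(b, X, times, events, freq)

# The same data with each observation repeated according to its frequency.
X_rep = [row for row, f in zip(X, freq) for _ in range(f)]
t_rep = [t for t, f in zip(times, freq) for _ in range(f)]
e_rep = [e for e, f in zip(events, freq) for _ in range(f)]
replicated = weighted_breslow_logpl(b, X_rep, t_rep, e_rep, [1] * len(X_rep))

print(weighted, replicated)
```

The two printed values agree up to floating-point rounding, which is the sense in which a frequency of 2 counts an observation twice.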
