[1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

from matplotlib import pyplot as plt
from lifelines import CoxPHFitter
import numpy as np
import pandas as pd
from lifelines.datasets import load_rossi

plt.style.use('bmh')

Assessing Cox model fit using residuals (work in progress)

This tutorial covers some common use cases of the (many) residuals of the Cox model. We can use residuals to diagnose a model’s poor fit to a dataset and to improve an existing model’s fit.

[2]:
df = load_rossi()

df['age_strata'] = pd.cut(df['age'], np.arange(0, 80, 5))
df = df.drop('age', axis=1)

cph = CoxPHFitter()
cph.fit(df, 'week', 'arrest', strata=['age_strata', 'wexp'])
[2]:
<lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
[3]:
cph.print_summary()
cph.plot();
<lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
      duration col = 'week'
         event col = 'arrest'
            strata = ['age_strata', 'wexp']
number of subjects = 432
  number of events = 114
    log-likelihood = -434.50
  time fit was run = 2019-01-11 21:46:51 UTC

---
      coef  exp(coef)  se(coef)     z      p  log(p)  lower 0.95  upper 0.95
fin  -0.41       0.67      0.19 -2.10   0.04   -3.34       -0.79       -0.03  .
race  0.29       1.33      0.31  0.93   0.35   -1.04       -0.32        0.90
mar  -0.34       0.71      0.39 -0.87   0.38   -0.96       -1.10        0.42
paro -0.10       0.91      0.20 -0.50   0.62   -0.48       -0.48        0.29
prio  0.08       1.08      0.03  2.83 <0.005   -5.36        0.02        0.14  *
---
Signif. codes: 0 '***' 0.0001 '**' 0.001 '*' 0.01 '.' 0.05 ' ' 1

Concordance = 0.61
Likelihood ratio test = 481.75 on 5 df, log(p)=-232.93
../_images/jupyter_notebooks_Cox_residuals_3_1.png

Martingale residuals

Defined as:

\[\begin{split}\delta_i - \Lambda(T_i) \\ = \delta_i - \Lambda_0(T_i)\exp(\beta^T x_i)\end{split}\]

where \(T_i\) is the total observation time of subject \(i\), \(\delta_i\) denotes whether they died under observation or not (event_observed in lifelines), and \(\Lambda_0\) is the baseline cumulative hazard.
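To make the definition concrete, the sketch below evaluates the formula for a single subject. The function name `martingale_residual` and the input numbers are hypothetical, chosen only for illustration; in practice the baseline cumulative hazard and the coefficients come from the fitted model, and `cph.compute_residuals(df, 'martingale')` does this work for every row.

```python
import math


def martingale_residual(event_observed, baseline_cumulative_hazard, linear_predictor):
    """Return delta_i - Lambda_0(T_i) * exp(beta^T x_i) for one subject.

    event_observed: 1 (or True) if the event was seen, 0 (or False) if censored.
    baseline_cumulative_hazard: baseline cumulative hazard evaluated at T_i.
    linear_predictor: beta^T x_i for this subject.
    """
    expected_events = baseline_cumulative_hazard * math.exp(linear_predictor)
    return float(event_observed) - expected_events


# Hypothetical inputs, not taken from the Rossi fit.
early_event = martingale_residual(True, 0.02, 0.5)    # event with little accumulated hazard
late_censor = martingale_residual(False, 0.40, 0.5)   # censored after accumulating hazard
print(round(early_event, 4), round(late_censor, 4))
```

Because the subtracted term is never negative, the residual cannot exceed 1 for an observed event or 0 for a censored subject.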

From [1]:

Martingale residuals take a value in \((-\infty, 1]\) for uncensored observations and \((-\infty, 0]\) for censored observations. Martingale residuals can be used to assess the true functional form of a particular covariate (Therneau et al. (1990)). It is often useful to overlay a LOESS curve on this plot, as the residuals can be noisy in plots with many observations. Martingale residuals can also be used to assess outliers in the data set, where the survivor function predicts an event either too early or too late; however, it is often better to use the deviance residual for this.

From [2]:

Positive values mean that the patient died sooner than expected (according to the model); negative values mean that the patient lived longer than expected (or was censored).
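On the smoothing overlay suggested in the quote from [1]: the sketch below uses a plain centered running mean as a simplified stand-in for LOESS, not LOESS itself, to show how a trend line can be drawn through noisy residuals. The function name `running_mean` and its `window` parameter are hypothetical; a proper LOESS fit would come from a statistics library.

```python
def running_mean(x, y, window=5):
    """Smooth y against x with a centered running mean (a stand-in for LOESS).

    Points are first sorted by x; each smoothed value is the mean of up to
    `window` neighbouring y values, with shorter windows at the two ends.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    half = window // 2
    smoothed = []
    for k in range(len(ys)):
        lo = max(0, k - half)
        hi = min(len(ys), k + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return xs, smoothed


# Tiny hypothetical example: unsorted x values with their y values.
xs, smooth = running_mean([3, 1, 2], [30, 10, 20], window=3)
print(xs, smooth)
```

Once the residual frame `r` has been computed as in the next cell, something like `running_mean(list(r['week']), list(r['martingale']), window=25)` followed by `plt.plot(xs, smooth)` would draw the trend over the scatter plot; the window of 25 is an arbitrary choice.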
[4]:
r = cph.compute_residuals(df, 'martingale')
r.head()
[4]:
     arrest  martingale  week
313    True    0.989383   1.0
79     True    0.972812   5.0
60     True    0.947726   6.0
225    True    0.976976   7.0
138    True    0.920272   8.0
[5]:
r.plot.scatter(
    x='week', y='martingale', c=np.where(r['arrest'], '#008fd5', '#fc4f30'),
    alpha=0.75
)
[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1e2851b160>
../_images/jupyter_notebooks_Cox_residuals_6_1.png

Deviance residuals

One problem with martingale residuals is that they are not symmetric around 0. Deviance residuals are a transform of martingale residuals that makes them symmetric.

  • Roughly symmetric around zero, with approximate standard deviation equal to 1.
  • Positive values mean that the patient died sooner than expected.
  • Negative values mean that the patient lived longer than expected (or were censored).
  • Very large or small values are likely outliers.
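The transform is commonly written as \(d_i = \operatorname{sign}(M_i)\sqrt{-2\,[M_i + \delta_i \log(\delta_i - M_i)]}\), where \(M_i\) is the martingale residual. The sketch below applies that textbook formula to single values. The function name `deviance_residual` is hypothetical, and lifelines' own implementation may differ in detail, although the formula reproduces the first row shown in this tutorial (martingale 0.989383, deviance 2.666812) up to rounding.

```python
import math


def deviance_residual(martingale, event_observed):
    """Textbook deviance transform of a martingale residual (a sketch).

    Assumes martingale < 1 for observed events, i.e. a strictly positive
    cumulative hazard, so that the logarithm is defined.
    """
    delta = 1.0 if event_observed else 0.0
    # delta * log(delta - M) is taken to be 0 for censored subjects (delta = 0).
    log_term = math.log(delta - martingale) if event_observed else 0.0
    # max(...) guards against tiny negative values caused by rounding.
    magnitude = math.sqrt(max(0.0, -2.0 * (martingale + log_term)))
    return math.copysign(magnitude, martingale)


# First row of the martingale output in this tutorial: an arrest in week 1.
print(round(deviance_residual(0.989383, True), 3))
# A hypothetical censored subject with martingale residual -0.5.
print(round(deviance_residual(-0.5, False), 3))
```

Observed events with a martingale residual close to 1 are stretched out to large positive values, which is why outliers are easier to see on the deviance scale.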
[6]:
r = cph.compute_residuals(df, 'deviance')
r.head()
[6]:
     arrest  week  deviance
313    True   1.0  2.666812
79     True   5.0  2.294414
60     True   6.0  2.001769
225    True   7.0  2.364001
138    True   8.0  1.793799
[7]:
r.plot.scatter(
    x='week', y='deviance', c=np.where(r['arrest'], '#008fd5', '#fc4f30'),
    alpha=0.75
)
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1e28706f28>
../_images/jupyter_notebooks_Cox_residuals_9_1.png
[8]:
r = r.join(df.drop(['week', 'arrest'], axis=1))
[9]:
plt.scatter(r['prio'], r['deviance'], color=np.where(r['arrest'], '#008fd5', '#fc4f30'))
[9]:
<matplotlib.collections.PathCollection at 0x7f1e28785b70>
../_images/jupyter_notebooks_Cox_residuals_11_1.png
[10]:
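# delta-beta residuals: approximate change in each coefficient if a subject were removed
r = cph.compute_residuals(df, 'delta_beta')
# attach the duration and event columns for plotting
r = r.join(df[['week', 'arrest']])
r.head()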
[10]:
          fin      race       mar      paro      prio  week  arrest
313 -0.005651 -0.011591  0.012141 -0.027451 -0.020482     1       1
79  -0.005761 -0.005809  0.007687 -0.020926 -0.013370     5       1
60  -0.005783 -0.000146  0.003277 -0.014325 -0.006314     6       1
225  0.014998 -0.041566  0.004854 -0.002255 -0.015722     7       1
138  0.011573  0.005330 -0.004240  0.013036  0.004404     8       1
[11]:
plt.scatter(r['week'], r['prio'], color=np.where(r['arrest'], '#008fd5', '#fc4f30'))
[11]:
<matplotlib.collections.PathCollection at 0x7f1e286b2fd0>
../_images/jupyter_notebooks_Cox_residuals_14_1.png

[1] https://stats.stackexchange.com/questions/297740/what-is-the-difference-between-the-different-residuals-in-survival-analysis-cox

[2] http://myweb.uiowa.edu/pbreheny/7210/f15/notes/11-10.pdf