CoxTimeVaryingFitter¶

class
lifelines.fitters.cox_time_varying_fitter.
CoxTimeVaryingFitter
(alpha=0.05, penalizer=0.0, l1_ratio: float = 0.0, strata=None)¶ Bases:
lifelines.fitters.SemiParametricRegressionFitter
,lifelines.fitters.mixins.ProportionalHazardMixin
This class implements fitting Cox’s timevarying proportional hazard model:
\[h(tx(t)) = h_0(t)\exp((x(t)\overline{x})'\beta)\]Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 penalizer (float, optional) – the coefficient of an L2 penalizer in the regression

params_
¶ The estimated coefficients. Changed in version 0.22.0: use to be
.hazards_
Type: Series

hazard_ratios_
¶ The exp(coefficients)
Type: Series

confidence_intervals_
¶ The lower and upper confidence intervals for the hazard coefficients
Type: DataFrame

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: DataFrame

strata
¶ the strata provided
Type: list

standard_errors_
¶ the standard errors of the estimates
Type: Series

baseline_cumulative_hazard_
¶ Type: DataFrame

baseline_survival_
¶ Type: DataFrame

AIC_partial_
¶ “partial” because the loglikelihood is partial

check_assumptions
(training_df: pandas.core.frame.DataFrame, advice: bool = True, show_plots: bool = False, p_value_threshold: float = 0.01, plot_n_bootstraps: int = 10, columns: Optional[List[str]] = None) → None¶ Use this function to test the proportional hazards assumption. See usage example at https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
Parameters:  training_df (DataFrame) – the original DataFrame used in the call to
fit(...)
or a subsampled version.  advice (bool, optional) – display advice as output to the user’s screen
 show_plots (bool, optional) – display plots of the scaled schoenfeld residuals and loess curves. This is an eyeball test for violations. This will slow down the function significantly.
 p_value_threshold (float, optional) – the threshold to use to alert the user of violations. See note below.
 plot_n_bootstraps – in the plots displayed, also display plot_n_bootstraps bootstrapped loess curves. This will slow down the function significantly.
 columns (list, optional) – specify a subset of columns to test.
Examples
from lifelines.datasets import load_rossi from lifelines import CoxPHFitter rossi = load_rossi() cph = CoxPHFitter().fit(rossi, 'week', 'arrest') cph.check_assumptions(rossi)
Notes
The
p_value_threshold
is arbitrarily set at 0.01. Under the null, some covariates will be below the threshold (i.e. by chance). This is compounded when there are many covariates.Similarly, when there are lots of observations, even minor deviances from the proportional hazard assumption will be flagged.
With that in mind, it’s best to use a combination of statistical tests and eyeball tests to determine the most serious violations.
References
section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/AppendixCoxRegression.pdf, http://www.mwsug.org/proceedings/2006/stats/MWSUG2006SD08.pdf, http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf
 training_df (DataFrame) – the original DataFrame used in the call to

compute_followup_hazard_ratios
(training_df: pandas.core.frame.DataFrame, followup_times: Iterable[T_co]) → pandas.core.frame.DataFrame¶ Recompute the hazard ratio at different followup times (lifelines handles accounting for updated censoring and updated durations). This is useful because we need to remember that the hazard ratio is actually a weightedaverage of periodspecific hazard ratios.
Parameters:  training_df (pd.DataFrame) – The same dataframe used to train the model
 followup_times (Iterable) – a list/array of followup times to recompute the hazard ratio at.

compute_residuals
(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame¶ Compute the residuals the model.
Parameters:  training_dataframe (DataFrame) – the same training DataFrame given in fit
 kind (string) – One of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}
Notes
'scaled_schoenfeld'
: lifelines does not add the coefficients to the final results, but R does when you callresiduals(c, "scaledsch")

fit
(df, event_col, start_col='start', stop_col='stop', weights_col=None, id_col=None, show_progress=False, step_size=None, robust=False, strata=None, initial_point=None, formula: str = None)¶ Fit the Cox Proportional Hazard model to a time varying dataset. Tied survival times are handled using Efron’s tiemethod.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col, plus other covariates. duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 event_col (string) – the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are noncensored.
 start_col (string) – the column that contains the start of a subject’s time period.
 stop_col (string) – the column that contains the end of a subject’s time period.
 weights_col (string, optional) – the column that contains (possibly timevarying) weight of each subjectperiod row.
 id_col (string, optional) – A subject could have multiple rows in the DataFrame. This column contains the unique identifier per subject. If not provided, it’s up to the user to make sure that there are no violations.
 show_progress (since the fitter is iterative, show convergence) – diagnostics.
 robust (bool, optional (default: True)) – Compute the robust errors using the Huber sandwich estimator, aka WeiLin estimate. This does not handle ties, so if there are high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074 1078
 step_size (float, optional) – set an initial step size for the fitting algorithm.
 strata (list or string, optional) – specify a column or list of columns n to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 formula (str, optional) – A Rlike formula for transforming the covariates
Returns: self – self, with additional properties like
hazards_
andprint_summary
Return type:

hazard_ratios_

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the Cox model. We compare the existing model (with all the covariates) to the trivial model of no covariates.
Conveniently, we can actually use CoxPHFitter class to do most of the work.

plot
(columns=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(**kwargs)¶ Deprecated as of v0.25.0. Use
plot_partial_effects_on_outcome
instead.

predict_log_partial_hazard
(X) → pandas.core.series.Series¶ This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \((x  \bar{x})'\beta\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Returns: Return type: DataFrame Note
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_partial_hazard
(X) → pandas.core.series.Series¶ Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{(x  \bar{x})'\beta }\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Returns: Return type: DataFrame Note
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

print_summary
(decimals=2, style=None, columns=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 columns – only display a subset of
summary
columns. Default all.  kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Summary statistics describing the fit.
Returns: df – Contains columns coef, np.exp(coef), se(coef), z, p, lower, upper Return type: DataFrame