# lifelines.fitters

class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)

Bases: lifelines.fitters.BaseFitter

This class fits the regression model:

$h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N$

that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see TODO: name
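The model above can be read as evaluating a linear function of the covariates at each time point. The following is a minimal sketch with a single covariate; the coefficient values `b0` and `b1` are made up for illustration and are not output from AalenAdditiveFitter.

```python
# Hypothetical time-varying coefficients on a grid of three time points.
# These numbers are illustrative only, not fitted values.
times = [1.0, 2.0, 3.0]
b0 = [0.10, 0.12, 0.15]  # baseline hazard b_0(t)
b1 = [0.02, 0.01, 0.03]  # coefficient b_1(t) for covariate x_1


def additive_hazard(x1):
    """Evaluate h(t|x) = b_0(t) + b_1(t) * x_1 at each time point."""
    return [b0_t + b1_t * x1 for b0_t, b1_t in zip(b0, b1)]


hazard = additive_hazard(x1=2.0)
```

Because the coefficients vary with time, the same covariate value can contribute differently to the hazard at different time points.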

Note

This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.

Parameters:

- fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, $$b_0(t)$$, acts as a baseline hazard.
- alpha (float, optional (default=0.05)) – the level in the confidence intervals.
- coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of $$c_{i,t}$$.
- smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between adjacent (over time) coefficients. For example, this shrinks the absolute value of $$c_{i,t} - c_{i,t+1}$$.

cumulative_hazards_

The estimated cumulative hazard

Type: DataFrame
hazards_

The estimated hazards

Type: DataFrame
confidence_intervals_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
weights

The weights variable provided

Type: array
fit(df, duration_col, event_col=None, weights_col=None, show_progress=False)

Fit the Aalen Additive model to a dataset.

Parameters:

- df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
- duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
- event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
- weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is removed from the covariates and used as a weight in the final regression. Default weight is 1. This can be used for case-weights. For example, a weight of 2 means there were two subjects with identical observations. This can also be used for sampling weights.
- show_progress (boolean, optional (default=False)) – Since the fitter is iterative, show iteration number.

Returns: self (AalenAdditiveFitter) – self with additional new properties: cumulative_hazards_, etc.

Examples

>>> import pandas as pd
>>> from lifelines import AalenAdditiveFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> aaf = AalenAdditiveFitter()
>>> aaf.fit(df, 'T', 'E')
>>> aaf.predict_median(df)
>>> aaf.print_summary()

plot(columns=None, loc=None, iloc=None, **kwargs)

A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:

Parameters:

- columns (string or list-like, optional) – If not empty, plot a subset of columns from cumulative_hazards_. Default: all.
- loc (slice, optional) – specify a time-based subsection of the curves to plot.
- iloc (slice, optional) – specify a location-based subsection of the curves to plot, ex: .plot(iloc=slice(0,10)) will plot the first 10 time points.

predict_cumulative_hazard(X)

Returns the cumulative hazard for the individuals.

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
predict_expectation(X)

Compute the expected lifetime, E[T], using covariates X.

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the expected lifetimes for the individuals
predict_median(X)

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the median lifetimes for the individuals
predict_percentile(X, p=0.5)

Returns the percentile lifetimes for the individuals; with the default p=0.5 these are the median lifetimes. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. p (float) – default: 0.5
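
As a rough illustration of what a percentile lifetime means, the sketch below reads it off a discrete survival curve as the first time the curve is at or below p. This rule and the helper name `percentile_time` are assumptions for illustration; the library's handling of ties or interpolation may differ.

```python
def percentile_time(times, survival, p=0.5):
    """Return the first time at which the survival curve is <= p.

    Returns float('inf') when the curve never drops to p.
    """
    for t, s in zip(times, survival):
        if s <= p:
            return t
    return float("inf")


times = [1, 2, 3, 4]
survival = [0.9, 0.7, 0.4, 0.2]  # illustrative values, not a fitted curve

median = percentile_time(times, survival, p=0.5)
never = percentile_time(times, survival, p=0.1)
```

With p=0.5 this reduces to the median lifetime; a curve that never reaches p yields infinity.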
predict_survival_function(X)

Returns the survival functions for the individuals

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:

- decimals (int, optional (default=2)) – specify the number of decimal places to show
- kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censorships.

For this purpose, the score_ is a measure of the predictive accuracy of the fitted model onto the training dataset. It’s analogous to the R^2 in linear models.
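
To make the c-index concrete, here is a simplified pure-Python sketch of Harrell's concordance for predicted lifetimes. It is an illustration only: the function below is written for this example, and lifelines' own `lifelines.utils.concordance_index` treats ties in the durations more carefully.

```python
def concordance_index(durations, predicted, observed):
    """Simplified c-index; a higher prediction should mean a longer lifetime."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's (event or censoring) time.
            if observed[i] and durations[i] < durations[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


durations = [2, 4, 6, 8]
observed = [1, 1, 0, 1]   # the third subject is censored
predicted = [1, 3, 2, 4]  # made-up predicted lifetimes

c = concordance_index(durations, predicted, observed)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs.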

smoothed_hazards_(bandwidth=1)

Smooths the hazard function using the Epanechnikov kernel with the given bandwidth.
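
The idea behind kernel smoothing can be sketched as a weighted sum of hazard increments, with weights from the Epanechnikov kernel K(u) = 0.75 (1 - u^2) for |u| <= 1. This is the textbook form of the estimator, written under the assumption that hazard increments at event times are available; it is not taken from the library's internals.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(t, event_times, hazard_increments, bandwidth):
    """Kernel-weighted sum of hazard increments around time t."""
    weighted = sum(
        epanechnikov((t - ti) / bandwidth) * dh
        for ti, dh in zip(event_times, hazard_increments)
    )
    return weighted / bandwidth


event_times = [1.0, 2.0, 3.0]
hazard_increments = [0.1, 0.2, 0.1]  # illustrative values

at_peak = smoothed_hazard(2.0, event_times, hazard_increments, bandwidth=1.0)
far_away = smoothed_hazard(10.0, event_times, hazard_increments, bandwidth=1.0)
```

A larger bandwidth averages over more neighbouring time points and gives a smoother, flatter curve.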

summary

Summary statistics describing the fit.

Returns: df DataFrame

## lifelines.fitters.aalen_johansen_fitter module

class lifelines.fitters.aalen_johansen_fitter.AalenJohansenFitter(jitter_level=0.0001, seed=None, alpha=0.05, calculate_variance=True)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Aalen-Johansen estimate for the cumulative incidence function in a competing risks framework. Treating competing risks as censoring can result in over-estimated cumulative density functions. Using the Kaplan Meier estimator with competing risks as censored is akin to estimating the cumulative density if all competing risks had been prevented.

Aalen-Johansen cannot deal with tied times. We can get around this by randomly jittering the event times slightly. This will be done automatically and generates a warning.
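
The jittering idea can be sketched as follows. The helper `jitter_ties` and the choice of a uniform perturbation are assumptions made for illustration; the text above does not say which distribution the library uses.

```python
import random


def jitter_ties(times, jitter_level=0.0001, seed=None):
    """Perturb repeated event times slightly so that all times are distinct."""
    rng = random.Random(seed)
    seen = set()
    jittered = []
    for t in times:
        # Keep nudging a tied value until it no longer collides.
        while t in seen:
            t = t + rng.uniform(-jitter_level, jitter_level)
        seen.add(t)
        jittered.append(t)
    return jittered


jittered = jitter_ties([5, 3, 5, 3, 7], seed=0)
```

Fixing the seed makes the perturbation, and therefore the fitted estimate, reproducible across runs.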

Parameters:

- alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
- jitter_level (float, optional (default=0.0001)) – If tied event times are detected, event times are randomly changed by this factor.
- seed (int, optional (default=None)) – To produce replicable results with tied event times, the numpy.random.seed can be specified in the function.
- calculate_variance (bool, optional (default=True)) – By default, AalenJohansenFitter calculates the variance and corresponding confidence intervals. Due to how the variance is calculated, the variance must be calculated for each event time individually. This is computationally intensive. For some procedures, like bootstrapping, the variance is not necessary. To reduce computation time during these procedures, calculate_variance can be set to False to skip the variance calculation.

Example

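>>> from lifelines import AalenJohansenFitter
>>> from lifelines.datasets import load_waltons
>>> T, E = load_waltons()['T'], load_waltons()['E']
>>> ajf = AalenJohansenFitter(calculate_variance=True)
>>> ajf.fit(T, E, event_of_interest=1)
>>> ajf.cumulative_density_
>>> ajf.plot()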


References

If you are interested in learning more, we recommend the following open-access paper; Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285-296.

conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
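
A small sketch of the idea, assuming a discrete survival curve: given survival to time t, the conditional median is the first later time at which the curve has fallen to half of its value at t. The helper `median_remaining` and the data are hypothetical.

```python
def median_remaining(times, survival, t_index):
    """Median remaining lifetime, given survival up to times[t_index]."""
    target = survival[t_index] / 2.0
    for u, s in zip(times[t_index:], survival[t_index:]):
        if s <= target:
            return u - times[t_index]
    return float("inf")  # the curve never halves within the observed range


times = [0, 1, 2, 3, 4]
survival = [1.0, 0.8, 0.5, 0.4, 0.2]  # illustrative values

remaining = [median_remaining(times, survival, i) for i in range(len(times))]
```
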
cumulative_density_at_times(times, label=None)
cumulative_hazard_at_times(times, label=None)
divide(other)

Divide the {0} of two {1} objects.

Parameters: other (an {1} fitted instance.)
fit(durations, event_observed, event_of_interest, timeline=None, entry=None, label='AJ_estimate', alpha=None, ci_labels=None, weights=None)

Parameters:

- durations (an array or pd.Series of length n) – duration the subject was observed for.
- event_observed (an array or pd.Series of length n) – integer indicator of distinct events. Must contain only non-negative integers, where 0 indicates censoring.
- event_of_interest (integer) – indicator for the event of interest. All other integers are considered competing events. For example, if event_observed contains 0, 1, 2 where 0: censored, 1: lung cancer, and 2: death, and event_of_interest=1, then death (2) is considered a competing event. The returned cumulative incidence function corresponds to the risk of lung cancer.
- timeline – return the best estimate at the values in timelines (positively increasing).
- entry (an array or pd.Series of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population were born at time 0.
- label (string) – a string to name the column of the estimate.
- alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
- ci_labels – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:

hazard_at_times(times, label=None)
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in inside the kwargs, plus

Parameters:

- show_censors (bool) – place markers at censorship events. Default: False
- censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
- ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
- ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
- ci_show (bool) – show confidence intervals. Default: True
- ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
- at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
- loc (slice) – specify a time-based subsection of the curves to plot, ex: model.plot(loc=slice(0.,10.)) will plot the time values between t=0. and t=10.
- iloc (slice) – specify a location-based subsection of the curves to plot, ex: model.plot(iloc=slice(0,10)) will plot the first 10 time points.
- invert_y_axis (bool) – boolean to invert the y-axis, useful to show cumulative graphs instead of survival graphs. (Deprecated, use plot_cumulative_density())

Returns: ax – a pyplot axis object

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times (a scalar or an array of times to predict the value of {0} at.)

Returns: predictions – a scalar if time is a scalar, a numpy array if time is an array.
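
Linear interpolation between indexed time points can be sketched as below. The helper `interpolate` is written for this example, and clamping to the end values outside the index range is an assumption, not documented behaviour.

```python
def interpolate(index, values, t):
    """Linearly interpolate values (defined on a sorted index) at time t."""
    # Assumption for this sketch: clamp to the end values outside the index.
    if t <= index[0]:
        return values[0]
    if t >= index[-1]:
        return values[-1]
    for i in range(1, len(index)):
        if t <= index[i]:
            t0, t1 = index[i - 1], index[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


index = [0, 2, 4]
values = [1.0, 0.6, 0.2]  # illustrative estimate at the indexed times

at_one = interpolate(index, values, 1)
at_three = interpolate(index, values, 3)
```
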
subtract(other)

Subtract the {0} of two {1} objects.

Parameters: other (an {1} fitted instance.)
survival_function_at_times(times, label=None)

## lifelines.fitters.breslow_fleming_harrington_fitter module

class lifelines.fitters.breslow_fleming_harrington_fitter.BreslowFlemingHarringtonFitter(alpha=0.05)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Breslow-Fleming-Harrington estimate for the survival function. This estimator is a biased estimator of the survival function, but it is more stable when the population is small and there are too few early truncation times, in which case the number of patients at risk and the number of deaths may be the same.

Mathematically, the NAF estimator is the negative logarithm of the BFH estimator.
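
The relationship stated above can be checked numerically. The sketch below assumes that NAF refers to the Nelson-Aalen cumulative hazard (the running sum of deaths divided by the number at risk) and uses made-up counts; exponentiating the negated cumulative hazard then gives the BFH survival estimate.

```python
import math


def nelson_aalen(deaths, at_risk):
    """Cumulative hazard as the running sum of d_i / n_i over event times."""
    total = 0.0
    cumulative = []
    for d, n in zip(deaths, at_risk):
        total += d / n
        cumulative.append(total)
    return cumulative


deaths = [1, 2, 1]    # illustrative number of deaths at each event time
at_risk = [10, 8, 5]  # illustrative number at risk just before each event time

cumulative_hazard = nelson_aalen(deaths, at_risk)
bfh_survival = [math.exp(-h) for h in cumulative_hazard]
```

Unlike a product of terms (1 - d_i / n_i), the exponential form stays strictly positive even when d_i equals n_i.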

BreslowFlemingHarringtonFitter(alpha=0.05)

Parameters: alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
cumulative_density_at_times(times, label=None)
cumulative_hazard_at_times(times, label=None)
divide(other)

Divide the {0} of two {1} objects.

Parameters: other (an {1} fitted instance.)
fit(durations, event_observed=None, timeline=None, entry=None, label='BFH_estimate', alpha=None, ci_labels=None)

Parameters:

- durations (an array, or pd.Series, of length n) – duration the subject was observed for
- timeline – return the best estimate at the values in timelines (positively increasing)
- event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
- entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed).
- label (string) – a string to name the column of the estimate.
- alpha (float, optional (default=0.05)) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
- ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:

hazard_at_times(times, label=None)
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in inside the kwargs, plus

Parameters:

- show_censors (bool) – place markers at censorship events. Default: False
- censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
- ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
- ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
- ci_show (bool) – show confidence intervals. Default: True
- ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
- at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
- loc (slice) – specify a time-based subsection of the curves to plot, ex: model.plot(loc=slice(0.,10.)) will plot the time values between t=0. and t=10.
- iloc (slice) – specify a location-based subsection of the curves to plot, ex: model.plot(iloc=slice(0,10)) will plot the first 10 time points.
- invert_y_axis (bool) – boolean to invert the y-axis, useful to show cumulative graphs instead of survival graphs. (Deprecated, use plot_cumulative_density())

Returns: ax – a pyplot axis object

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times (a scalar or an array of times to predict the value of {0} at.)

Returns: predictions – a scalar if time is a scalar, a numpy array if time is an array.
subtract(other)

Subtract the {0} of two {1} objects.

Parameters: other (an {1} fitted instance.)
survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times

Parameters: times (iterable or float)

Returns: pd.Series

## lifelines.fitters.cox_time_varying_fitter module

class lifelines.fitters.cox_time_varying_fitter.CoxTimeVaryingFitter(alpha=0.05, penalizer=0.0, strata=None)

Bases: lifelines.fitters.BaseFitter

This class implements fitting Cox’s time-varying proportional hazard model:

$h(t|x(t)) = h_0(t)\exp(x(t)'\beta)$

Parameters:

- alpha (float, optional (default=0.05)) – the level in the confidence intervals.
- penalizer (float, optional) – the coefficient of an L2 penalizer in the regression

hazards_

The estimated hazards

Type: Series
confidence_intervals_

The lower and upper confidence intervals for the hazard coefficients

Type: DataFrame
event_observed

The event_observed variable provided

Type: Series
weights

The weights variable provided

Type: Series
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
strata

the strata provided

Type: list
standard_errors_

the standard errors of the estimates

Type: Series
score_

the concordance index of the model.

Type: float
baseline_cumulative_hazard_
Type: DataFrame
baseline_survival_
Type: DataFrame
fit(df, id_col, event_col, start_col='start', stop_col='stop', weights_col=None, show_progress=False, step_size=None, robust=False, strata=None, initial_point=None)

Fit the Cox Proportional Hazard model to a time varying dataset. Tied survival times are handled using Efron’s tie-method.

Parameters:

- df (DataFrame) – a Pandas DataFrame with the necessary columns id_col, event_col, start_col and stop_col, plus other covariates. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
- id_col (string) – A subject could have multiple rows in the DataFrame. This column contains the unique identifier per subject.
- event_col (string) – the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are non-censored.
- start_col (string) – the column that contains the start of a subject’s time period.
- stop_col (string) – the column that contains the end of a subject’s time period.
- weights_col (string, optional) – the column that contains the (possibly time-varying) weight of each subject-period row.
- show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics.
- robust (boolean, optional (default: False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there is a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
- step_size (float, optional) – set an initial step size for the fitting algorithm.
- strata (list or string, optional) – specify a column or list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similarly to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
- initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.

Returns: self (CoxTimeVaryingFitter) – self, with additional properties like hazards_ and print_summary

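
The long format described above has one row per subject-period. The sketch below shows such rows as plain dictionaries and a sanity check that each subject's periods follow one another without gaps. The column name `dose`, the data, and the helper `periods_are_contiguous` are hypothetical; whether gaps are acceptable depends on the study design.

```python
# One row per subject-period. 'start' and 'stop' match fit()'s default
# column names; 'id', 'dose' and 'event' are illustrative names.
rows = [
    {"id": 1, "start": 0, "stop": 3, "dose": 0.0, "event": 0},
    {"id": 1, "start": 3, "stop": 7, "dose": 1.5, "event": 1},
    {"id": 2, "start": 0, "stop": 5, "dose": 0.5, "event": 0},
]


def periods_are_contiguous(rows):
    """Check each subject's periods start where the previous one stopped."""
    last_stop = {}
    for row in sorted(rows, key=lambda r: (r["id"], r["start"])):
        previous = last_stop.get(row["id"])
        if previous is not None and row["start"] != previous:
            return False
        if row["stop"] <= row["start"]:
            return False  # a period must have positive length
        last_stop[row["id"]] = row["stop"]
    return True


ok = periods_are_contiguous(rows)
```

Here subject 1 contributes two rows, with the covariate changing at time 3 and the event recorded only on the final period.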
plot(columns=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:

- columns (list, optional) – specify a subset of the columns to plot
- errorbar_kwargs – pass in additional plotting commands to the matplotlib errorbar command

Returns: ax (matplotlib axis) – the matplotlib axis that was edited.

predict_log_partial_hazard(X)

This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to $$(x - \bar{x})'\beta$$

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: DataFrame

Note

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_partial_hazard(X)

Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to $$\exp{(x - \bar{x})'\beta}$$

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: DataFrame

Note

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:

- decimals (int, optional (default=2)) – specify the number of decimal places to show
- kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary

Summary statistics describing the fit.

Returns: df (DataFrame) – contains columns coef, exp(coef), se(coef), z, p, lower, upper

## lifelines.fitters.coxph_fitter module

class lifelines.fitters.coxph_fitter.CoxPHFitter(alpha=0.05, tie_method='Efron', penalizer=0.0, strata=None)

Bases: lifelines.fitters.BaseFitter

This class implements fitting Cox’s proportional hazard model:

$h(t|x) = h_0(t) \exp((x - \overline{x})' \beta)$

Parameters:

- alpha (float, optional (default=0.05)) – the level in the confidence intervals.
- tie_method (string, optional) – specify how the fitter should deal with ties. Currently only ‘Efron’ is available.
- penalizer (float, optional (default=0.0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of $$\beta_i$$. The penalty is $$\frac{1}{2} \text{penalizer} ||\beta||^2$$.
- strata (list, optional) – specify a list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similarly to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
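
The penalty term given for penalizer can be computed directly. The sketch below only evaluates the stated formula for a made-up coefficient vector; how the term enters the fitting objective is not shown here.

```python
def l2_penalty(beta, penalizer):
    """Evaluate 0.5 * penalizer * ||beta||^2 for a list of coefficients."""
    return 0.5 * penalizer * sum(b * b for b in beta)


beta = [0.5, -1.0, 2.0]  # illustrative coefficients, not fitted values
penalty = l2_penalty(beta, penalizer=0.1)
```

Because the penalty grows with the squared size of each coefficient, larger penalizer values pull the estimates toward zero.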

Examples

>>> from lifelines.datasets import load_rossi
>>> from lifelines import CoxPHFitter
>>> rossi = load_rossi()
>>> cph = CoxPHFitter()
>>> cph.fit(rossi, 'week', 'arrest')
>>> cph.print_summary()

hazards_

The estimated hazards

Type: Series
confidence_intervals_

The lower and upper confidence intervals for the hazard coefficients

Type: DataFrame
durations

The durations provided

Type: Series
event_observed

The event_observed variable provided

Type: Series
weights

The weights variable provided

Type: Series
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
strata

the strata provided

Type: list
standard_errors_

the standard errors of the estimates

Type: Series
score_

the concordance index of the model.

Type: float
baseline_hazard_
Type: DataFrame
baseline_cumulative_hazard_
Type: DataFrame
baseline_survival_
Type: DataFrame
check_assumptions(training_df, advice=True, show_plots=False, p_value_threshold=0.01, plot_n_bootstraps=10)

Use this function to test the proportional hazards assumption. See usage example at https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html

Parameters:

- training_df (DataFrame) – the original DataFrame used in the call to fit(...) or a sub-sampled version.
- advice (boolean, optional) – display advice as output to the user’s screen
- show_plots (boolean, optional) – display plots of the scaled Schoenfeld residuals and loess curves. This is an eyeball test for violations. This will slow down the function significantly.
- p_value_threshold (float, optional) – the threshold to use to alert the user of violations. See note below.
- plot_n_bootstraps – in the plots displayed, also display plot_n_bootstraps bootstrapped loess curves. This will slow down the function significantly.

Examples

>>> from lifelines.datasets import load_rossi
>>> from lifelines import CoxPHFitter
>>>
>>> rossi = load_rossi()
>>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest')
>>>
>>> cph.check_assumptions(rossi)


Notes

The p_value_threshold is arbitrarily set at 0.01. Under the null, some covariates will be below the threshold (i.e. by chance). This is compounded when there are many covariates.

Similarly, when there are lots of observations, even minor deviances from the proportional hazard assumption will be flagged.

With that in mind, it’s best to use a combination of statistical tests and eyeball tests to determine the most serious violations.
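
The compounding effect mentioned in these notes can be quantified with a back-of-the-envelope calculation. The sketch assumes the per-covariate tests are independent and that every covariate satisfies the assumption, which is a simplification; the helper name is hypothetical.

```python
def prob_any_false_flag(n_covariates, p_value_threshold=0.01):
    """Chance that at least one of n independent true-null tests is flagged."""
    return 1.0 - (1.0 - p_value_threshold) ** n_covariates


single = prob_any_false_flag(1)
many = prob_any_false_flag(50)
```

Under these assumptions, a model with 50 covariates has roughly a 40% chance of at least one spurious flag at the 0.01 threshold.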


compute_residuals(training_dataframe, kind)

Parameters:

- training_dataframe (pandas DataFrame) – the same training DataFrame given in fit
- kind (string) – {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}

fit(df, duration_col=None, event_col=None, show_progress=False, initial_point=None, strata=None, step_size=None, weights_col=None, cluster_col=None, robust=False, batch_mode=None)

Fit the Cox proportional hazard model to a dataset.

Parameters:

- df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights, strata). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
- duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
- event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
- weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is removed from the covariates and used as a weight in the final regression. Default weight is 1. This can be used for case-weights. For example, a weight of 2 means there were two subjects with identical observations. This can also be used for sampling weights. In that case, use robust=True to get more accurate standard errors.
- show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
- initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
- strata (list or string, optional) – specify a column or list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similarly to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
- step_size (float, optional) – set an initial step size for the fitting algorithm. Setting to 1.0 may improve performance, but could also hurt convergence.
- robust (boolean, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there is a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
- cluster_col (string, optional) – specifies what column has unique identifiers for clustering covariances. Using this forces the sandwich estimator (robust variance estimator) to be used.
- batch_mode (bool, optional) – enabling batch_mode can be faster for datasets with a large number of ties. If left as None, lifelines will choose the best option.

Returns: self (CoxPHFitter) – self with additional new properties: print_summary, hazards_, confidence_intervals_, baseline_survival_, etc.

Note

Tied survival times are handled using Efron’s tie-method.

Examples

>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E')
>>> cph.print_summary()
>>> cph.predict_median(df)

>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'weights': [1.1, 0.5, 2.0, 1.6, 1.2, 4.3, 1.4, 4.5, 3.0, 3.2, 0.4, 6.2],
>>>     'month': [10, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E', strata=['month', 'age'], robust=True, weights_col='weights')
>>> cph.print_summary()
>>> cph.predict_median(df)

plot(columns=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:

- columns (list, optional) – specify a subset of the columns to plot
- errorbar_kwargs – pass in additional plotting commands to the matplotlib errorbar command

Returns: ax (matplotlib axis) – the matplotlib axis that was edited.

plot_covariate_groups(covariates, values, plot_baseline=True, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:

- covariates (string or list) – a string (or list of strings) of the covariate(s) in the original dataset that we wish to vary.
- values (1d or 2d iterable) – an iterable of the values we wish the covariate(s) to take on.
- plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
- kwargs – pass in additional plotting commands.

Returns: ax (matplotlib axis, or list of axes) – the matplotlib axis that was edited.

Examples

>>> import numpy as np
>>> from lifelines import datasets, CoxPHFitter
>>> rossi = datasets.load_rossi()
>>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest')
>>> cph.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')

>>> # multiple variables at once
>>> cph.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

>>> # if you have categorical variables, you can simplify things:
>>> cph.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(X, times=None)

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.

Returns: cumulative_hazard_ (DataFrame) – the cumulative hazard of individuals over the timeline

predict_expectation(X)

Compute the expected lifetime, $$E[T]$$, using covariates X. The algorithm uses the fact that $$E[T] = \int_0^\infty P(T > t) dt = \int_0^\infty S(t) dt$$. The integral is approximated with the trapezoidal rule.

Caution

However, if the survival function doesn’t converge to 0, then the expectation is really infinity and the returned values are meaningless/too large. In that case, using predict_median or predict_percentile would be better.
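
The integration step can be sketched with a hand-written trapezoidal rule over a discrete survival curve. The curve values and the helper `expected_lifetime` are illustrative only.

```python
def expected_lifetime(times, survival):
    """Approximate the integral of S(t) dt with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        total += 0.5 * (survival[i - 1] + survival[i]) * width
    return total


times = [0, 1, 2, 3, 4]
survival = [1.0, 0.8, 0.5, 0.2, 0.0]  # illustrative curve that reaches 0

full = expected_lifetime(times, survival)
# Dropping the last point mimics a curve that stops at S(t) = 0.2.
truncated = expected_lifetime(times[:4], survival[:4])
```

The truncated curve never reaches 0, so its integral covers only part of the area under the full curve, which is the situation the caution above warns about.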

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: expectations (DataFrame)

Notes

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
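
The trapezoidal approximation that predict_expectation describes can be sketched with plain NumPy. This is a minimal illustration and not the lifelines implementation; the helper name `expected_lifetime` and the two example curves are assumptions made for the demonstration.

```python
import numpy as np

def expected_lifetime(times, survival):
    """Approximate E[T] = integral of S(t) dt with the trapezoidal rule.

    `times` must be increasing and start at 0; `survival` holds S(t) at
    those times. Hypothetical helper, not part of lifelines.
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    widths = np.diff(times)                          # interval lengths
    heights = (survival[1:] + survival[:-1]) / 2.0   # average of the two endpoints
    return float(np.sum(widths * heights))

t = np.linspace(0.0, 100.0, 100001)

# Exponential survival curve with scale 2: the true expectation is 2.
approx = expected_lifetime(t, np.exp(-t / 2.0))

# A curve that plateaus at 0.5 never reaches 0, so the result simply grows
# with the length of the timeline and is not a meaningful expectation.
plateau = expected_lifetime(t, 0.5 + 0.5 * np.exp(-t / 2.0))
```

The second curve shows the situation the Caution describes: the number returned depends on where the timeline stops, not on the model.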

predict_log_partial_hazard(X)

This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to $$(x - mean(x_{train}))'\beta$$

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: log_partial_hazard (DataFrame)

Notes

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
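
As a rough sketch of the formula for the log partial hazard, the centering and dot product can be written out in NumPy. The training matrix, the coefficient vector, and the helper name `log_partial_hazard` below are all hypothetical values chosen for illustration; they do not come from a fitted lifelines model.

```python
import numpy as np

# Hypothetical training covariates (rows = subjects) and coefficients.
X_train = np.array([[1.0, 0.0], [3.0, 1.0], [5.0, 1.0]])
beta = np.array([0.2, -0.5])

def log_partial_hazard(X, X_train, beta):
    """Compute (x - mean(x_train))' beta for each row x of X."""
    centered = np.asarray(X, dtype=float) - X_train.mean(axis=0)
    return centered @ beta

# The first row equals the training mean, so its log partial hazard is 0.
X_new = np.array([[3.0, 2.0 / 3.0], [4.0, 1.0]])
lph = log_partial_hazard(X_new, X_train, beta)

# Exponentiating gives the partial hazard (1 for a subject at the mean).
partial_hazard = np.exp(lph)
```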

predict_median(X)

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles (DataFrame) – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
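
To illustrate the rule for the median, here is a small sketch that reads a median off a discretised survival curve. The helper `median_from_survival` is hypothetical, and the convention used at the crossing point (first time with S(t) <= 0.5) is an assumption rather than a statement about lifelines internals.

```python
import numpy as np

def median_from_survival(times, survival):
    """Return the first time at which S(t) <= 0.5, or np.inf if it never happens."""
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    crossed = np.nonzero(survival <= 0.5)[0]   # indices where the curve is at or below 0.5
    return float(times[crossed[0]]) if crossed.size else np.inf

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
m_cross = median_from_survival(t, [1.0, 0.8, 0.6, 0.4, 0.2])   # crosses 0.5 at t = 3
m_never = median_from_survival(t, [1.0, 0.9, 0.8, 0.7, 0.6])   # never crosses 0.5
```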
predict_partial_hazard(X)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: partial_hazard (DataFrame) – the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to $$\exp{(x - mean(x_{train}))'\beta}$$

Notes

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_percentile(X, p=0.5)

Returns the p-th percentile lifetimes for the individuals; by default (p=0.5) this is the median. If the survival curve of an individual does not cross p, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles (DataFrame)
predict_survival_function(X, times=None)

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns: survival_function (DataFrame) – the survival probabilities of individuals over the timeline
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censorships.

For this purpose, the score_ is a measure of the predictive accuracy of the fitted model on the training dataset.

References

https://stats.stackexchange.com/questions/133817/stratified-concordance-index-survivalsurvconcordance
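
For intuition about the c-index, the following is a simplified pure-Python sketch of a concordance computation with right censoring. It is not the routine lifelines uses: the function name is reused here only for readability, pairs with tied durations are ignored, and the inputs are made-up values. `predicted` is assumed to be a predicted lifetime, so larger means longer survival.

```python
def concordance_index(durations, predicted, events):
    """Fraction of comparable pairs that the predictions order correctly.

    A pair (i, j) is comparable when subject i has the strictly shorter
    duration and i's event was observed. Tied predictions count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if durations[i] < durations[j] and events[i]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    return concordant / comparable

durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]          # the third subject is censored
perfect = concordance_index(durations, [1, 2, 3, 4], events)
reversed_order = concordance_index(durations, [4, 3, 2, 1], events)
```

A value of 1 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and 0 means every pair is ordered the wrong way round.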

summary

Summary statistics describing the fit. Set alpha property in the object before calling.

Returns: df – Contains columns coef, np.exp(coef), se(coef), z, p, lower, upper DataFrame

## lifelines.fitters.exponential_fitter module¶

class lifelines.fitters.exponential_fitter.ExponentialFitter(*args, **kwargs)

Bases: lifelines.fitters.KnownModelParametericUnivariateFitter

This class implements an Exponential model for univariate data. The model has parameterized form:

$S(t) = \exp\left(\frac{-t}{\lambda}\right), \lambda >0$

which implies the cumulative hazard rate is

$H(t) = \frac{t}{\lambda}$

and the hazard rate is:

$h(t) = \frac{1}{\lambda}$

After calling the .fit method, you have access to properties like: survival_function_, lambda_, cumulative_hazard_. A summary of the fit is available with the method print_summary().

Important

The parameterization of this model changed in lifelines 0.19.0. Previously, the cumulative hazard looked like $$\lambda t$$. The parameterization is now the reciprocal of $$\lambda$$.
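
The three formulas above can be checked numerically with a short NumPy sketch. The helper names are hypothetical, and the closed-form estimate shown for right-censored data (total observed time divided by the number of observed events) is the textbook maximum likelihood result under this parameterization; it is included as an illustration and is not claimed to be how lifelines fits the model.

```python
import numpy as np

def exponential_curves(t, lambda_):
    """Survival, cumulative hazard and hazard under S(t) = exp(-t / lambda)."""
    t = np.asarray(t, dtype=float)
    survival = np.exp(-t / lambda_)
    cumulative_hazard = t / lambda_
    hazard = np.full_like(t, 1.0 / lambda_)   # constant hazard
    return survival, cumulative_hazard, hazard

def fit_lambda(durations, event_observed):
    """Closed-form MLE with right censoring: total time / number of events."""
    return float(np.sum(durations)) / float(np.sum(event_observed))

T = np.array([2.0, 3.0, 5.0, 10.0])   # made-up durations
E = np.array([1, 1, 0, 1])            # 0 marks a censored subject
lambda_hat = fit_lambda(T, E)         # 20 / 3

S, H, h = exponential_curves([0.0, 1.0, 2.0], lambda_hat)
median = lambda_hat * np.log(2.0)     # solves S(t) = 0.5
```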

Notes

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_hazard_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
hazard_

The estimated hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_hazard_

The lower and upper confidence intervals for the hazard

Type: DataFrame
survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function

Type: DataFrame
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
median_

The median time to event

Type: float
lambda_

The fitted parameter in the model

Type: float
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
confidence_interval_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_cumulative_hazard_.

confidence_interval_cumulative_density_

The confidence interval of the cumulative density.

confidence_interval_cumulative_hazard_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_.

confidence_interval_hazard_

The confidence interval of the hazard.

confidence_interval_survival_function_

The confidence interval of the survival function.

cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density function (1-survival function) at specific times.

Parameters: times (iterable or float) – values to return the cumulative density at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)

Return a Pandas series of the predicted cumulative hazard value at specific times.

Parameters: times (iterable or float) – values to return the cumulative hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
divide(other)

Divide the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
fit(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, left_censorship=False)
Parameters: durations (an array, or pd.Series) – length n, duration subject was observed for. event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None. timeline (list, optional) – return the estimate at the values in timeline (positively increasing). label (string, optional) – a string to name the column of the estimate. alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only. ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)

Return a Pandas series of the predicted hazard at specific times.

Parameters: times (iterable or float) – values to return the hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
median_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

plot(**kwargs)

Produce a pretty-plot of the estimate.

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the estimate at certain points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times – a scalar or an array of times to predict the value of the estimate at.
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
subtract(other)

Subtract the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
summary

Summary statistics describing the fit.

Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper pd.DataFrame

See also: print_summary

survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times.

Parameters: times (iterable or float) – values to return the survival function at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series

## lifelines.fitters.kaplan_meier_fitter module¶

class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha=0.05)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Kaplan-Meier estimate for the survival function.

Parameters: alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
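
For readers who want to see what the Kaplan-Meier (product-limit) estimate computes, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions (right censoring only, no weights, no late entry), not the code lifelines runs; the function name `kaplan_meier` and the toy data are hypothetical.

```python
import numpy as np

def kaplan_meier(durations, event_observed):
    """Product-limit estimate: S(t) = product over event times t_i <= t of (1 - d_i / n_i)."""
    durations = np.asarray(durations, dtype=float)
    event_observed = np.asarray(event_observed, dtype=bool)
    event_times = np.unique(durations[event_observed])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                      # n_i: still under observation
        deaths = np.sum((durations == t) & event_observed)    # d_i: events at exactly t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

T = [1, 2, 2, 3, 4]
E = [1, 1, 0, 1, 0]    # 0 marks a censored subject
times, S = kaplan_meier(T, E)
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a drop in the curve.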

Examples

>>> from lifelines import KaplanMeierFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> kmf = KaplanMeierFitter()
>>> kmf.fit(waltons['T'], waltons['E'])
>>> kmf.plot()

survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
median_

The estimated median time to event. np.inf if doesn’t exist.

Type: float
confidence_interval_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_survival_function_

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_

Type: DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
event_table

A summary of the life table

Type: DataFrame
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density at specific times

Parameters: times (iterable or float)
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)
divide(other)

Divide the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
fit(durations, event_observed=None, timeline=None, entry=None, label='KM_estimate', alpha=None, left_censorship=False, ci_labels=None, weights=None)
Parameters: durations (an array, list, pd.DataFrame or pd.Series) – length n, duration subject was observed for. event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None. timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing). entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered the study when they were “born”. label (string, optional) – a string to name the column of the estimate. alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only. left_censorship (bool, optional (default=False)) – True if durations and event_observed refer to left censorship events. ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in inside the kwargs, plus

Parameters: show_censors (bool) – place markers at censorship events. Default: False censor_styles (dict) – if show_censors is True, this dictionary will be passed into the plot call. ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3 ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False ci_show (bool) – show confidence intervals. Default: True ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False loc (slice) – specify a time-based subsection of the curves to plot, ex: >>> model.plot(loc=slice(0.,10.))  will plot the time values between t=0. and t=10. iloc (slice) – specify a location-based subsection of the curves to plot, ex: >>> model.plot(iloc=slice(0,10))  will plot the first 10 time points. invert_y_axis (bool) – boolean to invert the y-axis, useful to show cumulative graphs instead of survival graphs. (Deprecated, use plot_cumulative_density())
Returns: ax – a pyplot axis object
plot_cumulative_density(**kwargs)

Plots a pretty figure of the estimated cumulative density.

Matplotlib plot arguments can be passed in inside the kwargs, plus

Parameters: show_censors (bool) – place markers at censorship events. Default: False censor_styles (dict) – if show_censors is True, this dictionary will be passed into the plot call. ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3 ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False ci_show (bool) – show confidence intervals. Default: True ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False loc (slice) – specify a time-based subsection of the curves to plot, ex: >>> model.plot(loc=slice(0.,10.))  will plot the time values between t=0. and t=10. iloc (slice) – specify a location-based subsection of the curves to plot, ex: >>> model.plot(iloc=slice(0,10))  will plot the first 10 time points. invert_y_axis (bool) – boolean to invert the y-axis, useful to show cumulative graphs instead of survival graphs. (Deprecated, use plot_cumulative_density())
Returns: ax – a pyplot axis object
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_loglogs(*args, **kwargs)

Plot $$\log(-\log(S(t)))$$ against $$\log(t)$$

plot_survival_function(**kwargs)

Alias of plot

predict(times)

Predict the estimate at certain points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times – a scalar or an array of times to predict the value of the estimate at.
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
subtract(other)

Subtract the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times

Parameters: times (iterable or float)
Returns: pd.Series

## lifelines.fitters.log_logistic_fitter module¶

class lifelines.fitters.log_logistic_fitter.LogLogisticFitter(*args, **kwargs)

Bases: lifelines.fitters.KnownModelParametericUnivariateFitter

This class implements a Log-Logistic model for univariate data. The model has parameterized form:

$S(t) = \left(1 + \left(\frac{t}{\alpha}\right)^{\beta}\right)^{-1}, \alpha > 0, \beta > 0,$

and the hazard rate is:

$h(t) = \frac{\left(\frac{\beta}{\alpha}\right)\left(\frac{t}{\alpha}\right) ^ {\beta-1}}{\left(1 + \left(\frac{t}{\alpha}\right)^{\beta}\right)}$

and the cumulative hazard is:

$H(t) = \log\left(\left(\frac{t}{\alpha}\right) ^ {\beta} + 1\right)$

After calling the .fit method, you have access to properties like: cumulative_hazard_, plot, survival_function_, alpha_ and beta_. A summary of the fit is available with the method ‘print_summary()’
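
As a sanity check on the three log-logistic formulas, the sketch below evaluates them with NumPy and confirms that they agree with each other. The helper name and the parameter values are hypothetical; nothing here is fitted to data.

```python
import numpy as np

def log_logistic_curves(t, alpha, beta):
    """Survival, cumulative hazard and hazard of the log-logistic model."""
    t = np.asarray(t, dtype=float)
    ratio = (t / alpha) ** beta
    survival = 1.0 / (1.0 + ratio)
    cumulative_hazard = np.log(ratio + 1.0)
    hazard = (beta / alpha) * (t / alpha) ** (beta - 1.0) / (1.0 + ratio)
    return survival, cumulative_hazard, hazard

alpha, beta = 50.0, 2.5                  # hypothetical parameter values
t = np.linspace(1.0, 200.0, 400)
S, H, h = log_logistic_curves(t, alpha, beta)

# At t = alpha the ratio is 1, so S(alpha) = 0.5: alpha is the median.
S_at_alpha, _, _ = log_logistic_curves([alpha], alpha, beta)
```

The relation S(t) = exp(-H(t)) holds for every survival model, so it is a quick way to catch a mistyped formula.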

Examples

>>> from lifelines import LogLogisticFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> llf = LogLogisticFitter()
>>> llf.fit(waltons['T'], waltons['E'])
>>> llf.plot()
>>> print(llf.alpha_)
>>> print(llf.alpha_)

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_hazard_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
hazard_

The estimated hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_hazard_

The lower and upper confidence intervals for the hazard

Type: DataFrame
survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function

Type: DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
median_

The median time to event

Type: float
alpha_

The fitted parameter in the model

Type: float
beta_

The fitted parameter in the model

Type: float
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
confidence_interval_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_cumulative_hazard_.

confidence_interval_cumulative_density_

The confidence interval of the cumulative density.

confidence_interval_cumulative_hazard_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_.

confidence_interval_hazard_

The confidence interval of the hazard.

confidence_interval_survival_function_

The confidence interval of the survival function.

cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density function (1-survival function) at specific times.

Parameters: times (iterable or float) – values to return the cumulative density at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)

Return a Pandas series of the predicted cumulative hazard value at specific times.

Parameters: times (iterable or float) – values to return the cumulative hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
divide(other)

Divide the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
fit(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, left_censorship=False)
Parameters: durations (an array, or pd.Series) – length n, duration subject was observed for. event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None. timeline (list, optional) – return the estimate at the values in timeline (positively increasing). label (string, optional) – a string to name the column of the estimate. alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only. ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)

Return a Pandas series of the predicted hazard at specific times.

Parameters: times (iterable or float) – values to return the hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
median_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

plot(**kwargs)

Produce a pretty-plot of the estimate.

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the estimate at certain points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times – a scalar or an array of times to predict the value of the estimate at.
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
subtract(other)

Subtract the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
summary

Summary statistics describing the fit.

Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper pd.DataFrame

See also: print_summary

survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times.

Parameters: times (iterable or float) – values to return the survival function at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series

## lifelines.fitters.log_normal_fitter module¶

class lifelines.fitters.log_normal_fitter.LogNormalFitter(*args, **kwargs)

Bases: lifelines.fitters.KnownModelParametericUnivariateFitter

This class implements a Log Normal model for univariate data. The model has parameterized form:

$S(t) = 1 - \Phi((\log(t) - \mu)/\sigma), \sigma >0$

where $$\Phi$$ is the CDF of a standard normal random variable. This implies the cumulative hazard rate is

$H(t) = -\log(1 - \Phi((\log(t) - \mu)/\sigma))$

After calling the .fit method, you have access to properties like: survival_function_, mu_, sigma_. A summary of the fit is available with the method print_summary()
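
The log-normal survival and cumulative hazard formulas can be evaluated with the standard library alone, using the identity $$\Phi(z) = \frac{1}{2}(1 + \mathrm{erf}(z/\sqrt{2}))$$. The function names and parameter values below are hypothetical and serve only to illustrate the formulas.

```python
import math

def log_normal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log(t) - mu) / sigma), with Phi written via math.erf."""
    z = (math.log(t) - mu) / sigma
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - phi

def log_normal_cumulative_hazard(t, mu, sigma):
    """H(t) = -log(S(t))."""
    return -math.log(log_normal_survival(t, mu, sigma))

mu, sigma = 3.0, 0.5          # hypothetical parameter values
median = math.exp(mu)         # Phi(0) = 0.5, so S(exp(mu)) = 0.5
s_med = log_normal_survival(median, mu, sigma)
h_med = log_normal_cumulative_hazard(median, mu, sigma)
```

Note that the median depends only on mu; sigma controls how spread out the lifetimes are around it.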

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_hazard_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
hazard_

The estimated hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_hazard_

The lower and upper confidence intervals for the hazard

Type: DataFrame
survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function

Type: DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
median_

The median time to event

Type: float
mu_

The fitted parameter in the model

Type: float
sigma_

The fitted parameter in the model

Type: float
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
confidence_interval_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_cumulative_hazard_.

confidence_interval_cumulative_density_

The confidence interval of the cumulative density.

confidence_interval_cumulative_hazard_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_.

confidence_interval_hazard_

The confidence interval of the hazard.

confidence_interval_survival_function_

The confidence interval of the survival function.

cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density function (1-survival function) at specific times.

Parameters: times (iterable or float) – values to return the cumulative density at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)

Return a Pandas series of the predicted cumulative hazard value at specific times.

Parameters: times (iterable or float) – values to return the cumulative hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
divide(other)

Divide the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
fit(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, left_censorship=False)
Parameters: durations (an array, or pd.Series) – length n, duration subject was observed for. event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None. timeline (list, optional) – return the estimate at the values in timeline (positively increasing). label (string, optional) – a string to name the column of the estimate. alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only. ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)

Return a Pandas series of the predicted hazard at specific times.

Parameters: times (iterable or float) – values to return the hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
median_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

plot(**kwargs)

Produce a pretty-plot of the estimate.

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the estimate at certain points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times – a scalar or an array of times to predict the value of the estimate at.
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
subtract(other)

Subtract the estimates of two fitted instances of the same class.

Parameters: other – another fitted instance of the same class.
summary

Summary statistics describing the fit.

Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper pd.DataFrame

See also: print_summary

survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times.

Parameters: times (iterable or float) – values to return the survival function at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series

## lifelines.fitters.nelson_aalen_fitter module¶

class lifelines.fitters.nelson_aalen_fitter.NelsonAalenFitter(alpha=0.05, nelson_aalen_smoothing=True)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Nelson-Aalen estimate for the cumulative hazard.

NelsonAalenFitter(alpha=0.05, nelson_aalen_smoothing=True)

Parameters: alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals. nelson_aalen_smoothing (bool, optional) – If the event times are naturally discrete (like discrete years, minutes, etc.) then it is advisable to turn this parameter to False. See [1], pg.84.

Notes

[1] Aalen, O., Borgan, O., Gjessing, H., 2008. Survival and Event History Analysis
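
A minimal sketch of the basic Nelson-Aalen estimate may help make the cumulative hazard concrete. It uses the plain increments d_i / n_i, handles right censoring only, and does not attempt to reproduce the alternative increments selected by nelson_aalen_smoothing; the function name `nelson_aalen` and the toy data are hypothetical.

```python
import numpy as np

def nelson_aalen(durations, event_observed):
    """Basic estimate: H(t) = sum over event times t_i <= t of d_i / n_i."""
    durations = np.asarray(durations, dtype=float)
    event_observed = np.asarray(event_observed, dtype=bool)
    event_times = np.unique(durations[event_observed])
    increments = []
    for t in event_times:
        at_risk = np.sum(durations >= t)                      # n_i
        deaths = np.sum((durations == t) & event_observed)    # d_i
        increments.append(deaths / at_risk)
    return event_times, np.cumsum(increments)

T = [1, 2, 2, 3, 4]
E = [1, 1, 0, 1, 0]    # 0 marks a censored subject
times, H = nelson_aalen(T, E)
```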

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
event_table

A summary of the life table

Type: DataFrame
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
cumulative_density_at_times(times, label=None)
cumulative_hazard_at_times(times, label=None)
divide(other)

Divide the cumulative hazard of two NelsonAalenFitter objects.

Parameters: other (a fitted NelsonAalenFitter instance.)
fit(durations, event_observed=None, timeline=None, entry=None, label='NA_estimate', alpha=None, ci_labels=None, weights=None)
Parameters:
durations (an array, or pd.Series, of length n) – duration subject was observed for
timeline (iterable) – return the best estimate at the values in timeline (positively increasing)
event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None
entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed).
label (string) – a string to name the column of the estimate.
alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in inside the kwargs, plus

Parameters:
show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – if show_censors is True, this dictionary is passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
loc (slice) – specify a time-based subsection of the curves to plot, e.g. model.plot(loc=slice(0., 10.)) plots the time values between t=0. and t=10.
iloc (slice) – specify a location-based subsection of the curves to plot, e.g. model.plot(iloc=slice(0, 10)) plots the first 10 time points.
invert_y_axis (bool) – invert the y-axis, useful to show cumulative graphs instead of survival graphs. (Deprecated, use plot_cumulative_density())
Returns: ax – a pyplot axis object
plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the cumulative hazard at given points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times (a scalar or an array of times to predict the cumulative hazard at.)
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
smoothed_hazard_(bandwidth)
Parameters: bandwidth (float) – the bandwidth used in the Epanechnikov kernel.
Returns: DataFrame – the smoothed hazard
smoothed_hazard_confidence_intervals_(bandwidth, hazard_=None)
Parameters: bandwidth (float) – the bandwidth to use in the Epanechnikov kernel. Must be > 0. hazard_ (numpy array) – a computed (n,) numpy array of estimated hazard rates. If None, uses smoothed_hazard_
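The role of the bandwidth can be pictured with a short sketch. This is not the lifelines implementation: it assumes the standard kernel hazard estimator, in which the smoothed hazard at time t is the Epanechnikov-weighted sum of the jumps of the cumulative hazard divided by the bandwidth. The function names and the jump sizes below are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(times, hazard_increments, t, bandwidth):
    """Kernel-smooth the jumps of a cumulative hazard estimate at time t.

    Hypothetical helper: (1 / b) * sum_i K((t - t_i) / b) * dH(t_i).
    """
    total = sum(
        epanechnikov((t - t_i) / bandwidth) * jump
        for t_i, jump in zip(times, hazard_increments)
    )
    return total / bandwidth


# Made-up jump times and jump sizes of a cumulative hazard estimate.
times = [2, 3, 4, 5, 6, 7, 8, 9]
increments = [0.08, 0.09, 0.11, 0.29, 0.20, 0.25, 0.50, 1.00]

# A wider bandwidth averages over more neighbouring jumps.
for bandwidth in (1.5, 3.0):
    print(bandwidth, round(smoothed_hazard(times, increments, 5.0, bandwidth), 4))
```

Only jumps within one bandwidth of t receive non-zero weight, so a small bandwidth tracks local behaviour while a large one produces a flatter curve.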
subtract(other)

Subtract the cumulative hazard of two NelsonAalenFitter objects.

Parameters: other (a fitted NelsonAalenFitter instance.)
survival_function_at_times(times, label=None)

## lifelines.fitters.piecewise_exponential_fitter module¶

class lifelines.fitters.piecewise_exponential_fitter.PiecewiseExponentialFitter(breakpoints, *args, **kwargs)

Bases: lifelines.fitters.KnownModelParametericUnivariateFitter

This class implements a Piecewise Exponential model for univariate data. The model has a parameterized hazard rate:

$\begin{split}h(t) = \begin{cases} 1/\lambda_0, & \text{if } t \le \tau_0 \\ 1/\lambda_1, & \text{if } \tau_0 < t \le \tau_1 \\ 1/\lambda_2, & \text{if } \tau_1 < t \le \tau_2 \\ \dots \end{cases}\end{split}$

You specify the breakpoints, $$\tau_i$$, and lifelines will find the optimal values for the parameters.

After calling the .fit method, you have access to properties like: survival_function_, plot, cumulative_hazard_. A summary of the fit is available with the method print_summary().

Important

The parameterization of this model changed in lifelines 0.19.1. Previously, the cumulative hazard looked like $$\lambda_i t$$. The parameterization is now the reciprocal of $$\lambda_i$$.
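As a worked illustration of the current parameterization, the cumulative hazard is the time spent in each interval divided by that interval's $$\lambda_i$$, summed over the intervals. The sketch below is derived from the hazard definition above and is not lifelines code; the function name and the numeric values are assumptions chosen for the example.

```python
import math


def piecewise_cumulative_hazard(t, breakpoints, lambdas):
    """H(t) = sum over intervals of (time spent in interval i) / lambda_i.

    Hypothetical helper. `lambdas` needs one more entry than `breakpoints`,
    because the last interval runs from the final breakpoint to infinity.
    """
    edges = [0.0] + list(breakpoints) + [math.inf]
    total = 0.0
    for lower, upper, lam in zip(edges[:-1], edges[1:], lambdas):
        # Portion of [0, t] that falls inside (lower, upper]
        time_in_interval = max(0.0, min(t, upper) - lower)
        total += time_in_interval / lam
    return total


breakpoints = [10.0]       # tau_0
lambdas = [5.0, 20.0]      # lambda_0, lambda_1 (illustrative values)

for t in (5.0, 10.0, 15.0):
    H = piecewise_cumulative_hazard(t, breakpoints, lambdas)
    print(t, H, math.exp(-H))  # time, cumulative hazard, survival probability
```

With these values the hazard is 1/5 up to t = 10 and 1/20 afterwards, so the cumulative hazard grows more slowly after the breakpoint.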

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_hazard_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
hazard_

The estimated hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_hazard_

The lower and upper confidence intervals for the hazard

Type: DataFrame
survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function

Type: DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
median_

The median time to event

Type: float
lambda_i_

The fitted parameter in the model, for i = 0, 1 … n-1 breakpoints

Type: float
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
breakpoints

The provided breakpoints

Type: array
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
confidence_interval_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_cumulative_hazard_.

confidence_interval_cumulative_density_

The confidence interval of the cumulative density.

confidence_interval_cumulative_hazard_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_.

confidence_interval_hazard_

The confidence interval of the hazard.

confidence_interval_survival_function_

The confidence interval of the survival function.

cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density function (1-survival function) at specific times.

Parameters: times (iterable or float) – values to return the cumulative density at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)

Return a Pandas series of the predicted cumulative hazard value at specific times.

Parameters: times (iterable or float) – values to return the cumulative hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
divide(other)

Divide the estimates of two PiecewiseExponentialFitter objects.

Parameters: other (a fitted PiecewiseExponentialFitter instance.)
fit(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, left_censorship=False)
Parameters:
durations (an array, or pd.Series) – length n, duration subject was observed for
event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None
timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)

Return a Pandas series of the predicted hazard at specific times.

Parameters: times (iterable or float) – values to return the hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
median_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

plot(**kwargs)

Produce a pretty-plot of the estimate.

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the estimate at given points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times (a scalar or an array of times to predict the estimate at.)
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
subtract(other)

Subtract the estimates of two PiecewiseExponentialFitter objects.

Parameters: other (a fitted PiecewiseExponentialFitter instance.)
summary

Summary statistics describing the fit.

Returns: df (pd.DataFrame) – contains the columns coef, exp(coef), se(coef), z, p, lower, upper.

See also: print_summary

survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times.

Parameters: times (iterable or float) – values to return the survival function at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series

## lifelines.fitters.weibull_fitter module¶

class lifelines.fitters.weibull_fitter.WeibullFitter(*args, **kwargs)

Bases: lifelines.fitters.KnownModelParametericUnivariateFitter

This class implements a Weibull model for univariate data. The model has parameterized form:

$S(t) = \exp\left(-\left(\frac{t}{\lambda}\right)^\rho\right), \lambda > 0, \rho > 0,$

which implies the cumulative hazard rate is

$H(t) = \left(\frac{t}{\lambda}\right)^\rho,$

and the hazard rate is:

$h(t) = \frac{\rho}{\lambda}\left(\frac{t}{\lambda}\right)^{\rho-1}$

After calling the .fit method, you have access to properties like: cumulative_hazard_, survival_function_, lambda_ and rho_. A summary of the fit is available with the method print_summary().

Important

The parameterization of this model changed in lifelines 0.19.0. Previously, the cumulative hazard looked like $$(\lambda t)^\rho$$. The parameterization is now the reciprocal of $$\lambda$$.
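The three formulas above, and the relation between the old and new parameterization, can be checked numerically. The following is a small sketch written directly from those formulas, not lifelines code; the function names and the values of `lambda_` and `rho_` are illustrative assumptions.

```python
import math


def weibull_survival(t, lambda_, rho_):
    """S(t) = exp(-(t / lambda)^rho)."""
    return math.exp(-((t / lambda_) ** rho_))


def weibull_cumulative_hazard(t, lambda_, rho_):
    """H(t) = (t / lambda)^rho."""
    return (t / lambda_) ** rho_


def weibull_hazard(t, lambda_, rho_):
    """h(t) = (rho / lambda) * (t / lambda)^(rho - 1)."""
    return (rho_ / lambda_) * (t / lambda_) ** (rho_ - 1)


lambda_, rho_ = 50.0, 1.5   # illustrative parameter values
t = 20.0

# Old parameterization: H(t) = (lambda_old * t)^rho with lambda_old = 1 / lambda.
lambda_old = 1.0 / lambda_
print(weibull_cumulative_hazard(t, lambda_, rho_), (lambda_old * t) ** rho_)

# The hazard is the derivative of the cumulative hazard (central difference check).
eps = 1e-6
numeric = (
    weibull_cumulative_hazard(t + eps, lambda_, rho_)
    - weibull_cumulative_hazard(t - eps, lambda_, rho_)
) / (2 * eps)
print(weibull_hazard(t, lambda_, rho_), numeric)
```

A value of `rho_` above 1 gives an increasing hazard, below 1 a decreasing hazard, and exactly 1 reduces the model to the exponential distribution.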

Examples

>>> from lifelines import WeibullFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> wbf = WeibullFitter()
>>> wbf.fit(waltons['T'], waltons['E'])
>>> wbf.plot()
>>> print(wbf.lambda_)

cumulative_hazard_

The estimated cumulative hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_hazard_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
hazard_

The estimated hazard (with custom timeline if provided)

Type: DataFrame
confidence_interval_hazard_

The lower and upper confidence intervals for the hazard

Type: DataFrame
survival_function_

The estimated survival function (with custom timeline if provided)

Type: DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function

Type: DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type: DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density

Type: DataFrame
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
median_

The median time to event

Type: float
lambda_

The fitted parameter in the model

Type: float
rho_

The fitted parameter in the model

Type: float
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
timeline

The time line to use for plotting and indexing

Type: array
entry

The entry array provided, or None

Type: array or None
conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_’s index, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

Returns: conditional_time_to_ DataFrame
confidence_interval_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_cumulative_hazard_.

confidence_interval_cumulative_density_

The confidence interval of the cumulative density.

confidence_interval_cumulative_hazard_

The confidence interval of the cumulative hazard. This is an alias for confidence_interval_.

confidence_interval_hazard_

The confidence interval of the hazard.

confidence_interval_survival_function_

The confidence interval of the survival function.

cumulative_density_at_times(times, label=None)

Return a Pandas series of the predicted cumulative density function (1-survival function) at specific times.

Parameters: times (iterable or float) – values to return the cumulative density at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
cumulative_hazard_at_times(times, label=None)

Return a Pandas series of the predicted cumulative hazard value at specific times.

Parameters: times (iterable or float) – values to return the cumulative hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
divide(other)

Divide the estimates of two WeibullFitter objects.

Parameters: other (a fitted WeibullFitter instance.)
fit(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, left_censorship=False)
Parameters:
durations (an array, or pd.Series) – length n, duration subject was observed for
event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None
timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [, ]. Default:
hazard_at_times(times, label=None)

Return a Pandas series of the predicted hazard at specific times.

Parameters: times (iterable or float) – values to return the hazard at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series
median_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.
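For the Weibull model the median has a closed form: setting $$S(t) = 0.5$$ in the survival function above gives $$t = \lambda (\log 2)^{1/\rho}$$. The sketch below checks this algebra numerically; it is an illustration derived from the formula, not the library's code, and the function name and parameter values are hypothetical.

```python
import math


def weibull_median(lambda_, rho_):
    """Solve exp(-(t / lambda)^rho) = 0.5 for t."""
    return lambda_ * math.log(2) ** (1.0 / rho_)


lambda_, rho_ = 50.0, 1.5   # illustrative parameter values
median = weibull_median(lambda_, rho_)

# Plugging the median back into the survival function should return 0.5.
survival_at_median = math.exp(-((median / lambda_) ** rho_))
print(median, survival_at_median)
```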

plot(**kwargs)

Produce a pretty-plot of the estimate.

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times)

Predict the estimate at given points in time. Uses a linear interpolation if points in time are not in the index.

Parameters: times (a scalar or an array of times to predict the estimate at.)
Returns: predictions – a scalar if times is a scalar, a numpy array if times is an array.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
subtract(other)

Subtract the estimates of two WeibullFitter objects.

Parameters: other (a fitted WeibullFitter instance.)
summary

Summary statistics describing the fit.

Returns: df (pd.DataFrame) – contains the columns coef, exp(coef), se(coef), z, p, lower, upper.

See also: print_summary

survival_function_at_times(times, label=None)

Return a Pandas series of the predicted survival value at specific times.

Parameters: times (iterable or float) – values to return the survival function at. label (string, optional) – Rename the series returned. Useful for plotting.
Returns: pd.Series

## lifelines.fitters.weibull_aft_fitter module¶

class lifelines.fitters.weibull_aft_fitter.WeibullAFTFitter(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True)

Bases: lifelines.fitters.ParametericRegressionFitter

This class implements a Weibull AFT model. The model has parameterized form, with $$\lambda(x) = \exp\left(\beta_0 + \beta_1x_1 + ... + \beta_n x_n \right)$$, and optionally, $$\rho(y) = \exp\left(\alpha_0 + \alpha_1 y_1 + ... + \alpha_m y_m \right)$$,

$S(t; x, y) = \exp\left(-\left(\frac{t}{\lambda(x)}\right)^{\rho(y)}\right),$

which implies the cumulative hazard rate is

$H(t; x, y) = \left(\frac{t}{\lambda(x)} \right)^{\rho(y)},$
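The "accelerated failure time" reading of these formulas is that covariates rescale time: raising a covariate $$x_i$$ by one unit multiplies $$\lambda(x)$$, and with it every quantile of the lifetime, by $$\exp(\beta_i)$$. The sketch below illustrates this with a constant $$\rho$$ (no ancillary covariates). It is written from the formulas above rather than from lifelines, and the function names and coefficient values are made up for the example.

```python
import math


def weibull_aft_scale(x, beta):
    """lambda(x) = exp(beta_0 + beta_1 * x_1 + ... + beta_n * x_n)."""
    return math.exp(beta[0] + sum(b * x_i for b, x_i in zip(beta[1:], x)))


def weibull_aft_survival(t, x, beta, rho):
    """S(t; x) = exp(-(t / lambda(x))^rho), with a constant rho."""
    return math.exp(-((t / weibull_aft_scale(x, beta)) ** rho))


def weibull_aft_median(x, beta, rho):
    """Time at which S(t; x) = 0.5."""
    return weibull_aft_scale(x, beta) * math.log(2) ** (1.0 / rho)


beta = [3.0, 0.4, -0.2]   # hypothetical intercept and two coefficients
rho = 1.3                 # hypothetical shape parameter

baseline = weibull_aft_median([1.0, 2.0], beta, rho)
shifted = weibull_aft_median([2.0, 2.0], beta, rho)   # x_1 raised by one unit

# The ratio of medians equals exp(beta_1), whatever the value of rho.
print(shifted / baseline, math.exp(beta[1]))
```

A positive coefficient therefore stretches lifetimes and a negative one shortens them.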

After calling the .fit method, you have access to properties like params_. A summary of the fit is available with the method print_summary().

Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable. penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0. l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||^2_2
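The penalty function quoted for l1_ratio can be evaluated by hand to see how penalizer and l1_ratio interact. The sketch below transcribes that expression directly; the function name and the coefficient vector are illustrative assumptions, and how lifelines applies the penalty internally is not shown here.

```python
def elastic_net_penalty(w, penalizer, l1_ratio):
    """penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||_2^2."""
    l1_norm = sum(abs(w_i) for w_i in w)
    l2_norm_squared = sum(w_i * w_i for w_i in w)
    return (
        penalizer * l1_ratio * l1_norm
        + 0.5 * penalizer * (1.0 - l1_ratio) * l2_norm_squared
    )


w = [1.0, -2.0]   # hypothetical coefficient vector

# l1_ratio=0.0 is a pure L2 penalty, l1_ratio=1.0 a pure L1 penalty.
for l1_ratio in (0.0, 0.25, 1.0):
    print(l1_ratio, elastic_net_penalty(w, penalizer=0.5, l1_ratio=l1_ratio))
```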
params_

The estimated coefficients

Type: DataFrame
confidence_intervals_

The lower and upper confidence intervals for the coefficients

Type: DataFrame
durations

The durations provided

Type: Series
event_observed

The event_observed variable provided

Type: Series
weights

The weights provided

Type: Series
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
standard_errors_

the standard errors of the estimates

Type: Series
score_

the concordance index of the model.

Type: float
fit(df, duration_col=None, event_col=None, ancillary_df=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None)

Fit the accelerated failure time model to a dataset.

Parameters:
df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
duration_col (string) – the name of the column in df that contains the subjects’ lifetimes.
event_col (string, optional) – the name of the column in df that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters. If None or False, explicitly do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If DataFrame, provide covariates to model the ancillary parameters. Must have the same row count as df.
timeline (array, optional) – specify a timeline that will be used for plotting and prediction.
weights_col (string) – the column in df that specifies weights per observation.
robust (boolean, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self – self with additional new properties: print_summary, params_, confidence_intervals_ and more

Examples

>>> import pandas as pd
>>> from lifelines import WeibullAFTFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> # model lambda with the covariates, keep rho constant
>>> aft = WeibullAFTFitter()
>>> aft.fit(df, 'T', 'E')
>>> aft.print_summary()
>>> aft.predict_median(df)
>>>
>>> # additionally model rho with the covariates
>>> aft = WeibullAFTFitter()
>>> aft.fit(df, 'T', 'E', ancillary_df=df)
>>> aft.print_summary()
>>> aft.predict_median(df)

mean_survival_time_
median_survival_time_
plot(columns=None, parameter=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters: columns (list, optional) – specify a subset of the columns to plot. errorbar_kwargs – pass in additional plotting commands to the matplotlib errorbar command.
Returns: ax (matplotlib axis) – the matplotlib axis that was edited.
plot_covariate_groups(covariates, values, plot_baseline=True, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:
covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
kwargs – pass in additional plotting commands
Returns: ax (matplotlib axis, or list of axes) – the matplotlib axis that was edited.

Examples

>>> import numpy as np
>>> from lifelines import WeibullAFTFitter
>>> from lifelines.datasets import load_rossi
>>> rossi = load_rossi()
>>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
>>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')

>>> # multiple variables at once
>>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

>>> # if you have categorical variables, you can simplify things
>>> # (dummy1..dummy3 stand for one-hot columns in your own dataset):
>>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(X, times=None, ancillary_X=None)

Return the cumulative hazard rate of subjects in X at time points.

Parameters:
X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: cumulative_hazard_ (DataFrame) – the cumulative hazard of individuals over the timeline
predict_expectation(X, ancillary_X=None)

Predict the expectation of lifetimes, $$E[T | x]$$.

Parameters:
X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: DataFrame – the expected lifetimes for the individuals.
predict_median(X, ancillary_X=None)

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters:
X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles (DataFrame) – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
predict_percentile(X, ancillary_X=None, p=0.5)

Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:
X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles (DataFrame)
predict_survival_function(X, times=None, ancillary_X=None)

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters:
X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns: survival_function (DataFrame) – the survival probabilities of individuals over the timeline
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters: decimals (int, optional (default=2)) – specify the number of decimal places to show alpha (float or iterable) – specify confidence intervals to show kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censorships.

Here, score_ measures the predictive accuracy of the fitted model on the training dataset.
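To show what the c-index measures, here is a simplified sketch of a pairwise concordance calculation. It is not the library's implementation: it assumes a pair of subjects is comparable only when the subject with the shorter duration had an observed event, it skips pairs with tied durations, and it gives tied predictions half credit. The function name and the toy data are hypothetical.

```python
def concordance_index(event_times, predicted_times, event_observed):
    """Fraction of comparable pairs whose predictions are ordered like the outcomes.

    Simplified O(n^2) sketch; pairs with tied event times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        for j in range(n):
            # Comparable only if i's event was observed strictly before j's time;
            # a censored i tells us nothing about which subject failed first.
            if event_observed[i] and event_times[i] < event_times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    return concordant / comparable


times = [1, 2, 3, 4]
observed = [1, 0, 1, 1]

perfect = concordance_index(times, [1, 2, 3, 4], observed)
reversed_order = concordance_index(times, [4, 3, 2, 1], observed)
uninformative = concordance_index(times, [1, 1, 1, 1], observed)
print(perfect, reversed_order, uninformative)
```

A value near 1 means the predicted ordering agrees with the observed ordering, 0.5 is what constant or random predictions achieve, and values below 0.5 indicate a systematically inverted ranking.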

summary

Summary statistics describing the fit.

Returns: df (DataFrame) – contains the columns coef, np.exp(coef), se(coef), z, p, lower, upper.

## lifelines.fitters.log_normal_aft_fitter module¶

class lifelines.fitters.log_normal_aft_fitter.LogNormalAFTFitter(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True)

Bases: lifelines.fitters.ParametericRegressionFitter

This class implements a Log-Normal AFT model. The model has parameterized form, with $$\mu(x) = \exp\left(a_0 + a_1x_1 + ... + a_n x_n \right)$$, and optionally, $$\sigma(y) = \exp\left(b_0 + b_1 y_1 + ... + b_m y_m \right)$$,

The cumulative hazard rate is

$H(t; x, y) = -\log\left(1 - \Phi\left(\frac{\log(t) - \mu(x)}{\sigma(y)}\right)\right)$
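The cumulative hazard can be evaluated with the standard library alone, since the standard normal CDF $$\Phi$$ is expressible through the error function. The sketch below takes $$\mu$$ and $$\sigma$$ as plain numbers (already computed from the covariates) and is an illustration of the formula, not lifelines code; the function names and parameter values are assumptions.

```python
import math


def standard_normal_cdf(z):
    """Phi(z) written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_normal_cumulative_hazard(t, mu, sigma):
    """H(t) = -log(1 - Phi((log(t) - mu) / sigma))."""
    z = (math.log(t) - mu) / sigma
    return -math.log(1.0 - standard_normal_cdf(z))


mu, sigma = 2.0, 0.5   # illustrative values

# At t = exp(mu) the argument of Phi is zero, so S(t) = 0.5 and H(t) = log(2):
# exp(mu) is the median lifetime in this parameterization.
t_median = math.exp(mu)
print(log_normal_cumulative_hazard(t_median, mu, sigma), math.log(2))
```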

After calling the .fit method, you have access to properties like params_. A summary of the fit is available with the method print_summary().

Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable. penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0. l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||^2_2
params_

The estimated coefficients

Type: DataFrame
confidence_intervals_

The lower and upper confidence intervals for the coefficients

Type: DataFrame
durations

The durations provided

Type: Series
event_observed

The event_observed variable provided

Type: Series
weights

The weights provided

Type: Series
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
standard_errors_

the standard errors of the estimates

Type: Series
score_

the concordance index of the model.

Type: float
fit(df, duration_col=None, event_col=None, ancillary_df=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None)

Fit the accelerated failure time model to a dataset.

Parameters:
df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
duration_col (string) – the name of the column in df that contains the subjects’ lifetimes.
event_col (string, optional) – the name of the column in df that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters. If None or False, explicitly do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If DataFrame, provide covariates to model the ancillary parameters. Must have the same row count as df.
timeline (array, optional) – specify a timeline that will be used for plotting and prediction.
weights_col (string) – the column in df that specifies weights per observation.
robust (boolean, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self – self with additional new properties: print_summary, params_, confidence_intervals_ and more

Examples

>>> import pandas as pd
>>> from lifelines import LogNormalAFTFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> # model mu with the covariates, keep sigma constant
>>> aft = LogNormalAFTFitter()
>>> aft.fit(df, 'T', 'E')
>>> aft.print_summary()
>>> aft.predict_median(df)
>>>
>>> # additionally model sigma with the covariates
>>> aft = LogNormalAFTFitter()
>>> aft.fit(df, 'T', 'E', ancillary_df=df)
>>> aft.print_summary()
>>> aft.predict_median(df)

mean_survival_time_
median_survival_time_
plot(columns=None, parameter=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:

- columns (list, optional) – specify a subset of the columns to plot.
- errorbar_kwargs – additional keyword arguments passed to the matplotlib errorbar command.

Returns: ax – the matplotlib axis that was edited.

Return type: matplotlib axis
plot_covariate_groups(covariates, values, plot_baseline=True, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:

- covariates (string or list) – a string (or list of strings) naming the covariate(s) in the original dataset that we wish to vary.
- values (1d or 2d iterable) – an iterable of the values we wish the covariate(s) to take on.
- plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
- kwargs – additional keyword arguments passed to the plotting commands.

Returns: ax – the matplotlib axis that was edited.

Return type: matplotlib axis, or list of axes

Examples

>>> import numpy as np
>>> from lifelines import WeibullAFTFitter
>>> from lifelines.datasets import load_rossi
>>> rossi = load_rossi()
>>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
>>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')

>>> # multiple variables at once
>>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

>>> # if you have categorical variables, you can simplify things
>>> # (dummy1, dummy2, dummy3 stand for one-hot columns in your own dataset):
>>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(X, times=None, ancillary_X=None)

Return the cumulative hazard rate of subjects in X at time points.

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: cumulative_hazard_ – the cumulative hazard of individuals over the timeline.

Return type: DataFrame
predict_expectation(X, ancillary_X=None)

Predict the expectation of lifetimes, $$E[T | x]$$.

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the expected lifetimes for the individuals.

Return type: DataFrame
predict_median(X, ancillary_X=None)

Returns the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Return type: DataFrame
predict_percentile(X, ancillary_X=None, p=0.5)

Returns the time at which each individual's survival curve crosses p (the median lifetime, by default). If the survival curve of an individual does not cross p, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.

Returns: percentiles

Return type: DataFrame
predict_survival_function(X, times=None, ancillary_X=None)

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.

Returns: survival_function – the survival probabilities of individuals over the timeline.

Return type: DataFrame
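
Because the prediction is unconditional, a subject already known to have survived to time $$s$$ needs the standard adjustment $$P(T > t \mid T > s) = S(t) / S(s)$$ for $$t \ge s$$. The sketch below applies that identity to an arbitrary survival function. The helper names and the stand-in survival curve are illustrative assumptions and are not part of the lifelines API.

```python
def conditional_survival(survival_fn, t, s):
    """P(T > t | T > s) = S(t) / S(s), valid for t >= s."""
    if t < s:
        raise ValueError("t must be at least s")
    return survival_fn(t) / survival_fn(s)


# Stand-in survival curve (assumed for illustration only):
# S(t) = 1 / (1 + (t / 4) ** 2)
def example_survival(t):
    return 1.0 / (1.0 + (t / 4.0) ** 2)


# Probability that a new subject survives 2 more time units.
unconditional = example_survival(2.0)

# Probability that a subject alive at time 4 survives to time 6.
conditional = conditional_survival(example_survival, t=6.0, s=4.0)

print(f"new subject, 2 more units:      {unconditional:.4f}")
print(f"alive at 4, survives to time 6: {conditional:.4f}")
```

In practice, `survival_fn` could be built from one column of the DataFrame returned by predict_survival_function, with the division carried out on the tabulated values.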
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:

- decimals (int, optional (default=2)) – specify the number of decimal places to show.
- alpha (float or iterable) – specify confidence intervals to show.
- kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censorships.

Here, score_ measures the predictive accuracy of the fitted model on the training dataset.
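
To make the idea of concordance concrete, here is a simplified, Harrell-style sketch in plain Python. It is not the lifelines implementation (the library provides its own in lifelines.utils.concordance_index): it skips pairs with tied durations, and the function and variable names are illustrative assumptions. A pair is comparable when the subject with the shorter duration had an observed event; it is concordant when that subject also has the smaller predicted lifetime.

```python
from itertools import combinations


def simple_concordance_index(durations, predicted, events):
    """Fraction of comparable pairs whose predictions are ordered like the durations."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # tied durations are skipped in this simplified sketch
        # let `a` be the subject with the shorter duration
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[a]:
            continue  # shorter duration is censored, so the pair is not comparable
        comparable += 1
        if predicted[a] < predicted[b]:
            concordant += 1.0
        elif predicted[a] == predicted[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # the third subject is censored

c_perfect = simple_concordance_index(durations, [1.0, 2.0, 3.0, 4.0], events)
c_reversed = simple_concordance_index(durations, [4.0, 3.0, 2.0, 1.0], events)
print(c_perfect, c_reversed)
```

A value of 0.5 corresponds to random ordering, while values near 1 indicate that predicted lifetimes rank subjects in the same order as their observed durations.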

summary

Summary statistics describing the fit.

Returns: df – contains the columns coef, np.exp(coef), se(coef), z, p, lower, upper.

Return type: DataFrame

## lifelines.fitters.log_logistic_aft_fitter module¶

class lifelines.fitters.log_logistic_aft_fitter.LogLogisticAFTFitter(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True)

Bases: lifelines.fitters.ParametericRegressionFitter

This class implements a Log-Logistic AFT model. The model has a parameterized form with $$\alpha(x) = \exp\left(a_0 + a_1x_1 + ... + a_n x_n \right)$$ and, optionally, $$\beta(y) = \exp\left(b_0 + b_1 y_1 + ... + b_m y_m \right)$$.

The cumulative hazard rate is

$H(t; x, y) = \log\left(1 + \left(\frac{t}{\alpha(x)}\right)^{\beta(y)}\right)$

After calling the .fit method, you have access to properties like params_ and to methods like print_summary(), which prints a summary of the fit.
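
As a hedged illustration of the cumulative hazard above, the sketch below evaluates $$H(t)$$ for fixed scalar values of $$\alpha$$ and $$\beta$$ and derives the survival function through the standard identity $$S(t) = \exp(-H(t))$$. The function names and the numeric values are illustrative assumptions, not part of the lifelines API.

```python
import math


def cumulative_hazard(t, alpha, beta):
    """H(t) = log(1 + (t / alpha) ** beta) for scalar alpha, beta > 0."""
    return math.log1p((t / alpha) ** beta)


def survival(t, alpha, beta):
    """S(t) = exp(-H(t)), which simplifies to 1 / (1 + (t / alpha) ** beta)."""
    return math.exp(-cumulative_hazard(t, alpha, beta))


# Illustrative parameter values (assumed, not fitted from data).
alpha, beta = 4.0, 2.5

for t in (1.0, 4.0, 10.0):
    h = cumulative_hazard(t, alpha, beta)
    s = survival(t, alpha, beta)
    print(f"t={t:5.1f}  H(t)={h:.4f}  S(t)={s:.4f}")
```

At $$t = \alpha$$ the ratio $$t / \alpha$$ equals 1, so $$S(\alpha) = 0.5$$: under this parameterization, $$\alpha(x)$$ is the median lifetime for covariates $$x$$.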

Parameters:

- alpha (float, optional (default=0.05)) – the level in the confidence intervals.
- fit_intercept (boolean, optional (default=True)) – allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable.
- penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0.
- l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||^2_2
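
The penalty expression given for l1_ratio can be evaluated directly. The following is a minimal sketch of that formula only; the function name and the example weight vector are assumptions made for illustration, and the sketch does not claim to reproduce how lifelines applies the penalty internally.

```python
def elastic_net_penalty(w, penalizer, l1_ratio):
    """penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||_2^2"""
    l1_norm = sum(abs(wi) for wi in w)
    l2_norm_squared = sum(wi * wi for wi in w)
    return (
        penalizer * l1_ratio * l1_norm
        + 0.5 * penalizer * (1.0 - l1_ratio) * l2_norm_squared
    )


# Hypothetical coefficient vector: ||w||_1 = 7 and ||w||_2^2 = 25.
w = [3.0, -4.0]

pure_l2 = elastic_net_penalty(w, penalizer=0.1, l1_ratio=0.0)
pure_l1 = elastic_net_penalty(w, penalizer=0.1, l1_ratio=1.0)
mixed = elastic_net_penalty(w, penalizer=0.1, l1_ratio=0.5)
print(pure_l2, pure_l1, mixed)
```

With l1_ratio=0.0 (the default) only the squared l2 term contributes, and with l1_ratio=1.0 only the l1 term does.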
params_

The estimated coefficients

Type: DataFrame
confidence_intervals_

The lower and upper confidence intervals for the coefficients

Type: DataFrame
durations

The durations provided

Type: Series
event_observed

The event_observed variable provided

Type: Series
weights

The weights provided

Type: Series
variance_matrix_

The variance matrix of the coefficients

Type: numpy array
standard_errors_

the standard errors of the estimates

Type: Series
score_

the concordance index of the model.

Type: float
fit(df, duration_col=None, event_col=None, ancillary_df=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None)

Fit the accelerated failure time model to a dataset.

Parameters:

- df (DataFrame) – a pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
- duration_col (string) – the name of the column in df that contains the subjects’ lifetimes.
- event_col (string, optional) – the name of the column in df that contains the subjects’ death observation. If left as None, all individuals are assumed to be uncensored.
- show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
- ancillary_df (None, boolean, or DataFrame, optional (default=None)) – choose how to model the ancillary parameters. If None or False, do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If a DataFrame, use its columns as covariates for the ancillary parameters; it must have the same row count as df.
- timeline (array, optional) – a timeline that will be used for plotting and prediction.
- weights_col (string, optional) – the column in df that specifies weights per observation.
- robust (boolean, optional (default=False)) – compute robust errors using the Huber sandwich estimator.
- initial_point ((d,) numpy array, optional) – the starting point of the iterative algorithm. Default is the zero vector.

Returns: self, with additional new properties: print_summary, params_, confidence_intervals_ and more.

Return type: self

Examples

>>> import pandas as pd
>>> from lifelines import WeibullAFTFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> # model the primary parameter only
>>> aft = WeibullAFTFitter()
>>> aft.fit(df, 'T', 'E')
>>> aft.print_summary()
>>> aft.predict_median(df)
>>>
>>> # also model the ancillary parameter with covariates
>>> aft = WeibullAFTFitter()
>>> aft.fit(df, 'T', 'E', ancillary_df=df)
>>> aft.print_summary()
>>> aft.predict_median(df)

mean_survival_time_
median_survival_time_
plot(columns=None, parameter=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:

- columns (list, optional) – specify a subset of the columns to plot.
- errorbar_kwargs – additional keyword arguments passed to the matplotlib errorbar command.

Returns: ax – the matplotlib axis that was edited.

Return type: matplotlib axis
plot_covariate_groups(covariates, values, plot_baseline=True, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:

- covariates (string or list) – a string (or list of strings) naming the covariate(s) in the original dataset that we wish to vary.
- values (1d or 2d iterable) – an iterable of the values we wish the covariate(s) to take on.
- plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
- kwargs – additional keyword arguments passed to the plotting commands.

Returns: ax – the matplotlib axis that was edited.

Return type: matplotlib axis, or list of axes

Examples

>>> import numpy as np
>>> from lifelines import WeibullAFTFitter
>>> from lifelines.datasets import load_rossi
>>> rossi = load_rossi()
>>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
>>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')

>>> # multiple variables at once
>>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

>>> # if you have categorical variables, you can simplify things
>>> # (dummy1, dummy2, dummy3 stand for one-hot columns in your own dataset):
>>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(X, times=None, ancillary_X=None)

Return the cumulative hazard rate of subjects in X at time points.

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: cumulative_hazard_ – the cumulative hazard of individuals over the timeline.

Return type: DataFrame
predict_expectation(X, ancillary_X=None)

Predict the expectation of lifetimes, $$E[T | x]$$.

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the expected lifetimes for the individuals.

Return type: DataFrame
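
For a log-logistic lifetime with scale $$\alpha$$ and shape $$\beta$$, the mean is a standard result: it is finite only when $$\beta > 1$$, in which case $$E[T] = \alpha \, (\pi / \beta) / \sin(\pi / \beta)$$. The sketch below checks that closed form against a numerical integral of the survival function, using $$E[T] = \int_0^\infty S(t)\,dt$$. The function names, parameter values, and integration settings are illustrative assumptions; this is not a description of how lifelines computes the expectation.

```python
import math


def survival(t, alpha, beta):
    """Log-logistic survival function S(t) = 1 / (1 + (t / alpha) ** beta)."""
    return 1.0 / (1.0 + (t / alpha) ** beta)


def expected_lifetime(alpha, beta):
    """Closed-form mean; infinite when beta <= 1."""
    if beta <= 1.0:
        return math.inf
    return alpha * (math.pi / beta) / math.sin(math.pi / beta)


def integrate_survival(alpha, beta, upper=2000.0, steps=200_000):
    """Midpoint-rule approximation of the integral of S(t) over [0, upper]."""
    h = upper / steps
    return h * sum(survival((k + 0.5) * h, alpha, beta) for k in range(steps))


# Illustrative parameter values (assumed, not fitted from data).
alpha, beta = 2.0, 3.0

closed_form = expected_lifetime(alpha, beta)
numeric = integrate_survival(alpha, beta)
print(f"closed form: {closed_form:.5f}, numerical integral: {numeric:.5f}")
```

The truncation at `upper` leaves out a small tail, so the two values agree only approximately; a heavier tail (shape closer to 1) would need a larger upper limit.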
predict_median(X, ancillary_X=None)

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Return type: DataFrame
predict_percentile(X, ancillary_X=None, p=0.5)

Returns the time at which each individual's survival curve crosses p (the median lifetime, by default). If the survival curve of an individual does not cross p, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.

Returns: percentiles

Return type: DataFrame
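
For the log-logistic model, the time at which the survival curve crosses p can be derived from the cumulative hazard stated for this class: setting $$S(t) = 1 / (1 + (t/\alpha)^{\beta}) = p$$ and solving gives $$t = \alpha \left((1 - p)/p\right)^{1/\beta}$$. The sketch below works through that derivation for scalar parameters. The function names and values are illustrative assumptions rather than the library's API.

```python
def survival(t, alpha, beta):
    """Log-logistic survival function S(t) = 1 / (1 + (t / alpha) ** beta)."""
    return 1.0 / (1.0 + (t / alpha) ** beta)


def percentile(p, alpha, beta):
    """Solve S(t) = p for t, giving t = alpha * ((1 - p) / p) ** (1 / beta)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return alpha * ((1.0 - p) / p) ** (1.0 / beta)


# Illustrative parameter values (assumed, not fitted from data).
alpha, beta = 4.0, 2.5

median = percentile(0.5, alpha, beta)  # time by which half the subjects have died
t_25 = percentile(0.25, alpha, beta)   # time at which only 25% are still alive

print(f"median lifetime: {median}")
print(f"S crosses 0.25 at t={t_25:.4f}, check S(t)={survival(t_25, alpha, beta):.4f}")
```

Since this survival function decreases from 1 toward 0, it crosses every p strictly between 0 and 1, so the infinite result mentioned above does not arise in this sketch.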
predict_survival_function(X, times=None, ancillary_X=None)

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters:

- X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.

Returns: survival_function – the survival probabilities of individuals over the timeline.

Return type: DataFrame
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:

- decimals (int, optional (default=2)) – specify the number of decimal places to show.
- alpha (float or iterable) – specify confidence intervals to show.
- kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censorships.

Here, score_ measures the predictive accuracy of the fitted model on the training dataset.

summary

Summary statistics describing the fit.

Returns: df – contains the columns coef, np.exp(coef), se(coef), z, p, lower, upper.

Return type: DataFrame