lifelines.fitters package

Submodules

lifelines.fitters.aalen_additive_fitter module

class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.95, coef_penalizer=0.0, smoothing_penalizer=0.0)

Bases: lifelines.fitters.BaseFitter

This class fits the regression model:

\[h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N\]

that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see TODO: name
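
As a minimal sketch of the formula above (not the fitter's internals), the hazard at a single time point is the intercept plus a coefficient-weighted sum of the covariates. The function name and the coefficient values below are hypothetical and chosen only for illustration.

```python
def additive_hazard(coefs_at_t, x):
    """Evaluate h(t|x) = b_0(t) + sum_i b_i(t) * x_i at one time point.

    coefs_at_t: [b_0(t), b_1(t), ..., b_N(t)]; x: [x_1, ..., x_N].
    """
    b0, slopes = coefs_at_t[0], coefs_at_t[1:]
    if len(slopes) != len(x):
        raise ValueError("need one coefficient per covariate")
    return b0 + sum(b * xi for b, xi in zip(slopes, x))


# Hypothetical coefficient values at two time points (illustrative only).
coefs = {1.0: [0.10, 0.02, -0.01], 2.0: [0.15, 0.01, 0.03]}
subject = [2.0, 1.0]
hazards = {t: additive_hazard(b, subject) for t, b in coefs.items()}
```

Because the coefficients vary with time, the same subject can have a different hazard at each time point even though the covariates are static.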

Note

This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.

Parameters:
  • fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\), acts as a baseline hazard.
  • alpha (float) – the level in the confidence intervals.
  • coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of \(c_{i,t}\).
  • smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between adjacent (over time) coefficients. For example, this shrinks the absolute value of \(c_{i,t} - c_{i,t+1}\).
fit(df, duration_col, event_col=None, weights_col=None, show_progress=False)

Fit the Aalen Additive model to a dataset.

Parameters:
  • df (DataFrame) – a Pandas dataframe with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
  • duration_col (string) – the name of the column in dataframe that contains the subjects’ lifetimes.
  • event_col (string, optional) – the name of the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
  • weights_col (string, optional) – an optional column in the dataframe, df, that denotes the weight per subject. This column is excluded from the covariates and used only as a weight in the final regression. Default weight is 1. This can be used for case weights: for example, a weight of 2 means there were two subjects with identical observations. It can also be used for sampling weights.
  • show_progress (boolean, optional (default=False)) – Since the fitter is iterative, show iteration number.
Returns:

self – self with additional new properties: cumulative_hazards_, etc.

Return type:

AalenAdditiveFitter

Examples

>>> import pandas as pd
>>> from lifelines import AalenAdditiveFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> aaf = AalenAdditiveFitter()
>>> aaf.fit(df, 'T', 'E')
>>> aaf.predict_median(df)
>>> aaf.print_summary()
plot(columns=None, loc=None, iloc=None, **kwargs)

A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:

Parameters:
  • columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.

  • loc (slice, optional) –

    specify a time-based subsection of the curves to plot, ex:

    .plot(loc=slice(0.,10.)) will plot the time values between t=0. and t=10.

  • iloc (slice, optional) –

    specify a location-based subsection of the curves to plot, ex:

    .plot(iloc=slice(0,10)) will plot the first 10 time points.

predict_cumulative_hazard(X)

Returns the cumulative hazards for the individuals.

Parameters:X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
predict_expectation(X)

Compute the expected lifetime, E[T], using covariates X.

Parameters:X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:the expected lifetimes for the individuals
predict_median(X)
Parameters:X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:the median lifetimes for the individuals
predict_percentile(X, p=0.5)

Returns the median lifetimes for the individuals, by default (p=0.5). http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
predict_survival_function(X)

Returns the survival functions for the individuals

Parameters:X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the AUC to survival data, including censorships.

For this purpose, the score_ is a measure of the predictive accuracy of the fitted model on the training dataset. It’s analogous to the R^2 in linear models.

smoothed_hazards_(bandwidth=1)

Smooths the hazard function using the Epanechnikov kernel with the given bandwidth.

summary

Summary statistics describing the fit. Set alpha property in the object before calling.

Returns:df
Return type:DataFrame

lifelines.fitters.aalen_johansen_fitter module

class lifelines.fitters.aalen_johansen_fitter.AalenJohansenFitter(jitter_level=0.0001, seed=None, alpha=0.95)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Aalen-Johansen estimate for the cumulative incidence function in a competing risks framework. Treating competing risks as censoring can result in over-estimated cumulative density functions. Using the Kaplan-Meier estimator with competing risks as censored is akin to estimating the cumulative density if all competing risks had been prevented. If you are interested in learning more, I (Paul Zivich) recommend the following open-access paper: Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285-296.

AalenJohansenFitter(alpha=0.95, jitter_level=0.00001, seed=None)

Aalen-Johansen cannot deal with tied times. We can get around this by randomly jittering the event times slightly. This will be done automatically and generates a warning.
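
To make the idea of jittering concrete, here is a small sketch that perturbs only the tied durations by a tiny uniform amount. The helper name, the uniform noise, and the choice to leave unique values untouched are assumptions for illustration; the text above does not say how the fitter jitters internally.

```python
import random


def jitter_ties(durations, jitter_level=1e-5, seed=None):
    """Add small uniform noise to tied durations only; unique values are kept as-is."""
    rng = random.Random(seed)
    counts = {}
    for d in durations:
        counts[d] = counts.get(d, 0) + 1
    return [
        d + rng.uniform(-jitter_level, jitter_level) if counts[d] > 1 else d
        for d in durations
    ]


durations = [5, 3, 3, 8, 5, 2]
jittered = jitter_ties(durations, seed=0)
```

Passing a seed makes the perturbation reproducible, which is presumably the purpose of the seed argument in the class signature.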

fit(durations, event_observed, event_of_interest, timeline=None, entry=None, label='AJ_estimate', alpha=None, ci_labels=None, weights=None)
Parameters:
  • durations (an array or pd.Series of length n) – duration the subject was observed for.
  • event_observed (an array or pd.Series of length n) – integer indicator of distinct events. Must be only non-negative integers, where 0 indicates censoring.
  • event_of_interest (integer) – indicator for the event of interest. All other integers are considered competing events. Ex) event_observed contains 0, 1, 2 where 0:censored, 1:lung cancer, and 2:death. If event_of_interest=1, then death (2) is considered a competing event. The returned cumulative incidence function corresponds to the risk of lung cancer.
  • timeline – return the best estimate at the values in timelines (positively increasing).
  • entry (an array or pd.Series of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population were born at time 0.
  • label (string) – a string to name the column of the estimate.
  • alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
  • weights (an array or pd.Series of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subjects differently.
Returns:

self – self, with new properties like ‘cumulative_incidence_’.

Return type:

AalenJohansenFitter
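
The following is a rough sketch of the Aalen-Johansen idea for one event type, assuming no tied times and no weights or late entry: at each event of interest, the cumulative incidence grows by the all-cause survival just before that time multiplied by 1 / (number at risk). The function name is hypothetical and this is not the fitter's implementation.

```python
def cumulative_incidence(durations, events, event_of_interest):
    """Aalen-Johansen cumulative incidence for one event type (no tied times).

    events: 0 = censored, positive integers = distinct event types.
    Returns a list of (time, cumulative incidence) pairs at each observed time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    overall_survival = 1.0  # all-cause Kaplan-Meier survival just before t
    cif = 0.0
    curve = []
    for t, e in data:
        if e == event_of_interest:
            cif += overall_survival * (1.0 / at_risk)
        if e != 0:
            # any event type (of interest or competing) lowers all-cause survival
            overall_survival *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        curve.append((t, cif))
    return curve


durations = [1, 2, 3, 4, 5]
events = [1, 2, 0, 1, 2]
curve_1 = cumulative_incidence(durations, events, event_of_interest=1)
curve_2 = cumulative_incidence(durations, events, event_of_interest=2)
```

In this toy dataset the last subject has an event, so the two cumulative incidence curves end at values that sum to 1: every subject who is not censored eventually experiences one of the two event types.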

lifelines.fitters.breslow_fleming_harrington_fitter module

class lifelines.fitters.breslow_fleming_harrington_fitter.BreslowFlemingHarringtonFitter(alpha=0.95)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Breslow-Fleming-Harrington estimate for the survival function. This estimator is a biased estimator of the survival function but is more stable when the population is small and there are too few early truncation times, in which case the number of patients at risk and the number of deaths may be the same.

Mathematically, the NAF estimator is the negative logarithm of the BFH estimator.
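
The relationship stated above can be sketched in a few lines: compute a basic Nelson-Aalen cumulative hazard and exponentiate its negative to get a BFH-style survival estimate. The helper below is a hypothetical, unsmoothed, unweighted version written for illustration only.

```python
import math


def nelson_aalen(durations, event_observed):
    """Nelson-Aalen cumulative hazard: running sum of d_i / n_i over distinct times."""
    cumulative, curve = 0.0, []
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, event_observed) if d == t and e)
        cumulative += deaths / at_risk
        curve.append((t, cumulative))
    return curve


durations = [2, 3, 3, 5, 8]
observed = [1, 1, 0, 1, 1]
na = nelson_aalen(durations, observed)
# BFH survival estimate: S(t) = exp(-H(t)), so H(t) = -log(S(t)).
bfh = [(t, math.exp(-h)) for t, h in na]
```

Note that at the last time point the number at risk equals the number of deaths, yet the BFH estimate stays strictly positive, which is the stability property described above.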

BreslowFlemingHarringtonFitter(alpha=0.95)

Parameters:alpha (float) – The alpha value associated with the confidence intervals.
fit(durations, event_observed=None, timeline=None, entry=None, label='BFH_estimate', alpha=None, ci_labels=None)
Parameters:
  • durations (an array, or pd.Series, of length n) – duration subject was observed for

  • timeline – return the best estimate at the values in timelines (positively increasing)

  • event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.

  • entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed).

  • label (string) – a string to name the column of the estimate.

  • alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.

  • ci_labels (iterable) –

    add custom column names to the generated confidence intervals

    as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>

Returns:

self – self, with new properties like ‘survival_function_’.

Return type:

BreslowFlemingHarringtonFitter

lifelines.fitters.cox_time_varying_fitter module

class lifelines.fitters.cox_time_varying_fitter.CoxTimeVaryingFitter(alpha=0.95, penalizer=0.0, strata=None)

Bases: lifelines.fitters.BaseFitter

This class implements fitting Cox’s time-varying proportional hazard model:

\[h(t|x(t)) = h_0(t) \exp(x(t)' \beta)\]
Parameters:
  • alpha (float, optional) – the level in the confidence intervals.
  • penalizer (float, optional) – the coefficient of an L2 penalizer in the regression
fit(df, id_col, event_col, start_col='start', stop_col='stop', weights_col=None, show_progress=False, step_size=None, robust=False, strata=None)

Fit the Cox Proportional Hazard model to a time-varying dataset. Tied survival times are handled using Efron’s tie-method.

Parameters:
  • df (DataFrame) – a Pandas dataframe with necessary columns id_col, event_col, start_col and stop_col (see below), plus other covariates. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
  • id_col (string) – A subject could have multiple rows in the dataframe. This column contains the unique identifier per subject.
  • event_col (string) – the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are non-censored.
  • start_col (string) – the column that contains the start of a subject’s time period.
  • stop_col (string) – the column that contains the end of a subject’s time period.
  • weights_col (string, optional) – the column that contains (possibly time-varying) weight of each subject-period row.
  • show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics.
  • robust (boolean, optional (default: False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there are a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
  • step_size (float, optional) – set an initial step size for the fitting algorithm.
  • strata (TODO)
Returns:

self – self, with additional properties like hazards_ and print_summary

Return type:

CoxTimeVaryingFitter

plot(columns=None, display_significance_code=True, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:
  • columns (list, optional) – specify a subset of the columns to plot
  • display_significance_code (bool, optional (default: True)) – display asterisks beside statistically significant variables
  • errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns:

ax – the matplotlib axis that can be edited.

Return type:

matplotlib axis

predict_log_partial_hazard(X)

This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\beta (X - \bar{X})\)

Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:
Return type:DataFrame

Note

If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_partial_hazard(X)

Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{\beta (X - \bar{X})}\)

Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:
Return type:DataFrame

Note

If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
summary

Summary statistics describing the fit. Set alpha property in the object before calling.

Returns:df – contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type:DataFrame

lifelines.fitters.coxph_fitter module

class lifelines.fitters.coxph_fitter.BatchVsSingle

Bases: object

static decide(batch_mode, T)
class lifelines.fitters.coxph_fitter.CoxPHFitter(alpha=0.95, tie_method='Efron', penalizer=0.0, strata=None)

Bases: lifelines.fitters.BaseFitter

This class implements fitting Cox’s proportional hazard model:

\[h(t|x) = h_0(t) \exp(x \beta)\]
Parameters:
  • alpha (float, optional (default=0.95)) – the level in the confidence intervals.

  • tie_method (string, optional) – specify how the fitter should deal with ties. Currently only ‘Efron’ is available.

  • penalizer (float, optional (default=0.0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of \(\beta_i\). Recommended, even if a small value. The penalty is \(\frac{1}{2} \text{penalizer} \, \|\beta\|^2\).

  • strata (list, optional) – specify a list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.

Examples

>>> from lifelines.datasets import load_rossi
>>> from lifelines import CoxPHFitter
>>> rossi = load_rossi()
>>> cph = CoxPHFitter()
>>> cph.fit(rossi, 'week', 'arrest')
>>> cph.print_summary()
check_assumptions(training_df, advice=True, show_plots=True, p_value_threshold=0.05, plot_n_bootstraps=10)

References: section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Cox-Regression.pdf; http://www.mwsug.org/proceedings/2006/stats/MWSUG-2006-SD08.pdf; http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015-ReassessingSchoenfeldTests_Final.pdf

compute_residuals(training_dataframe, kind)
Parameters:
  • training_dataframe (pandas DataFrame) – the same training dataframe given in fit
  • kind (string) – {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’}
fit(df, duration_col=None, event_col=None, show_progress=False, initial_beta=None, strata=None, step_size=None, weights_col=None, cluster_col=None, robust=False, batch_mode=None)

Fit the Cox Proportional Hazard model to a dataset.

Parameters:
  • df (DataFrame) – a Pandas dataframe with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights, strata). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
  • duration_col (string) – the name of the column in dataframe that contains the subjects’ lifetimes.
  • event_col (string, optional) – the name of the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
  • weights_col (string, optional) – an optional column in the dataframe, df, that denotes the weight per subject. This column is excluded from the covariates and used only as a weight in the final regression. Default weight is 1. This can be used for case weights: for example, a weight of 2 means there were two subjects with identical observations. It can also be used for sampling weights. In that case, use robust=True to get more accurate standard errors.
  • show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
  • initial_beta (numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
  • strata (list or string, optional) – specify a column or list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
  • step_size (float, optional) – set an initial step size for the fitting algorithm.
  • robust (boolean, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there are a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
  • cluster_col (string, optional) – specifies what column has unique identifiers for clustering covariances. Using this forces the sandwich estimator (robust variance estimator) to be used.
  • batch_mode (bool, optional) – enabling batch_mode can be faster for datasets with a large number of ties. If left as None, lifelines will choose the best option.
Returns:

self – self with additional new properties: print_summary, hazards_, confidence_intervals_, baseline_survival_, etc.

Return type:

CoxPHFitter

Note

Tied survival times are handled using Efron’s tie-method.

Examples

>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E')
>>> cph.print_summary()
>>> cph.predict_median(df)

>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
...     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
...     'weights': [1.1, 0.5, 2.0, 1.6, 1.2, 4.3, 1.4, 4.5, 3.0, 3.2, 0.4, 6.2],
...     'month': [10, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
... })
>>>
>>> # Stratified fit with per-subject weights and robust standard errors.
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E', strata=['month', 'age'], robust=True, weights_col='weights')
>>> cph.print_summary()
>>> cph.predict_median(df)
plot(columns=None, display_significance_code=True, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:
  • columns (list, optional) – specify a subset of the columns to plot
  • display_significance_code (bool, optional (default: True)) – display asterisks beside statistically significant variables
  • errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns:

ax – the matplotlib axis that can be edited.

Return type:

matplotlib axis

plot_covariate_groups(covariate, groups, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate is varied over values in a group. This is useful to compare subjects’ survival as we vary a single covariate, all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:
  • covariate (string) – a string of the covariate in the original dataset that we wish to vary.
  • groups (iterable) – an iterable of the values we wish the covariate to take on.
  • kwargs – pass in additional plotting commands
Returns:

ax – the matplotlib axis that can be edited.

Return type:

matplotlib axis

predict_cumulative_hazard(X, times=None)
Parameters:
  • X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns:

cumulative_hazard_ – the cumulative hazard of individuals over the timeline

Return type:

DataFrame

predict_expectation(X)

Compute the expected lifetime, \(E[T]\), using covariates X. The algorithm uses the fact that \(E[T] = \int_0^\infty P(T > t) dt = \int_0^\infty S(t) dt\), and approximates the integral with the trapezoidal rule.

However, if the survival function doesn’t converge to 0, then the expectation is really infinity and the returned values are meaningless/too large. In that case, using predict_median or predict_percentile would be better.

Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:expectations
Return type:DataFrame

Notes

If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
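
The integral used by predict_expectation can be illustrated with a short trapezoidal-rule sketch over a survival curve sampled at a few time points. The function name and the sample curve are hypothetical; this shows the numerical idea only, not the library's code.

```python
def expected_lifetime(times, survival):
    """Approximate E[T] = integral of S(t) dt with the trapezoidal rule.

    times must be increasing and start at 0; survival[i] = S(times[i]).
    """
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        total += 0.5 * (survival[i] + survival[i - 1]) * width
    return total


times = [0.0, 1.0, 2.0, 4.0]
survival = [1.0, 0.5, 0.25, 0.0]
estimate = expected_lifetime(times, survival)  # 0.75 + 0.375 + 0.25 = 1.375
```

If the last survival value were well above 0, the sum would ignore all the area beyond the final time point, which is why the estimate is unreliable when the curve does not converge to 0.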

predict_log_partial_hazard(X)

This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\beta (X - mean(X_{train}))\)

Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:log_partial_hazard
Return type:DataFrame

Notes

If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_median(X)

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type:DataFrame
predict_partial_hazard(X)
Parameters:X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:partial_hazard – Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{\beta (X - mean(X_{train}))}\)
Return type:DataFrame

Notes

If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
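
As a small worked illustration of the partial hazard formula \(\exp{\beta (X - mean(X_{train}))}\) for a single subject, the sketch below centres the covariates on the training means before applying the coefficients. The coefficient and mean values are hypothetical.

```python
import math


def partial_hazard(beta, x, train_means):
    """exp(beta . (x - mean(X_train))) for a single subject."""
    log_ph = sum(b * (xi - m) for b, xi, m in zip(beta, x, train_means))
    return math.exp(log_ph)


beta = [0.5, -0.2]          # hypothetical fitted coefficients
train_means = [1.0, 30.0]   # hypothetical training-column means
average_subject = partial_hazard(beta, [1.0, 30.0], train_means)  # exp(0) = 1.0
other_subject = partial_hazard(beta, [2.0, 30.0], train_means)    # exp(0.5)
```

A subject with exactly average covariates has a partial hazard of 1, so other values can be read as hazard ratios relative to that average subject.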

predict_percentile(X, p=0.5)

Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:
  • X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns:

percentiles

Return type:

DataFrame

See also

predict_median()
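
One way to read a percentile off a survival curve, consistent with the "infinity if the curve never crosses" behaviour described above, is to return the first time at which the curve is at or below p. Treating "crosses" as "at or below" is an assumption of this sketch, and the function name is hypothetical.

```python
def percentile_time(times, survival, p=0.5):
    """First time at which the survival curve is at or below p, else infinity."""
    for t, s in zip(times, survival):
        if s <= p:
            return t
    return float("inf")


times = [1, 2, 3, 4]
survival = [0.9, 0.6, 0.4, 0.2]
median = percentile_time(times, survival)          # curve first drops to <= 0.5 at t = 3
never = percentile_time(times, survival, p=0.1)    # curve never reaches 0.1
```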

predict_survival_function(X, times=None)

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters:
  • X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns:

survival_function – the survival probabilities of individuals over the timeline

Return type:

DataFrame

print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the AUC to survival data, including censorships.

For this purpose, the score_ is a measure of the predictive accuracy of the fitted model on the training dataset. It’s analogous to the R^2 in linear models.
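
To give a feel for what the c-index measures, here is a simplified sketch of a Harrell-style concordance computation. It assumes that a higher predicted value means longer survival and counts prediction ties as half-concordant; both conventions, and the function itself, are assumptions for illustration rather than the library's implementation.

```python
def concordance_index(durations, predicted, event_observed):
    """Fraction of comparable pairs whose predicted order matches the observed order.

    A pair is comparable when the subject with the shorter duration had an
    observed event. Higher `predicted` is assumed to mean longer survival.
    """
    concordant, comparable = 0.0, 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if durations[i] < durations[j] and event_observed[i]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [2, 4, 6, 8]
observed = [1, 1, 0, 1]
predicted = [1.0, 3.0, 2.0, 4.0]
c_index = concordance_index(durations, predicted, observed)  # 4 of 5 pairs agree
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which parallels how the AUC is read for binary classifiers.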

summary

Summary statistics describing the fit. Set alpha property in the object before calling.

Returns:df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type:DataFrame

lifelines.fitters.exponential_fitter module

class lifelines.fitters.exponential_fitter.ExponentialFitter(alpha=0.95)

Bases: lifelines.fitters.UnivariateFitter

This class implements an Exponential model for univariate data. The model has parameterized form:

\[S(t) = \exp(-\lambda t), \quad \lambda > 0\]

which implies the cumulative hazard rate is

\[H(t) = \lambda t\]

and the hazard rate is:

\[h(t) = \lambda\]
After calling the .fit method, you have access to properties like: ‘survival_function_’, ‘lambda_’

A summary of the fit is available with the method ‘print_summary()’

Notes

Reference: https://www4.stat.ncsu.edu/~dzhang2/st745/chap3.pdf
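
Under the parameterization \(S(t) = \exp(-\lambda t)\) given above, the standard maximum-likelihood estimate of \(\lambda\) with right-censored data is the number of observed events divided by the total observed time. The sketch below shows that textbook result; the function name is hypothetical and the text above does not state how the fitter computes lambda_.

```python
import math


def fit_exponential(durations, event_observed):
    """MLE of lambda for S(t) = exp(-lambda * t) with right-censoring:
    number of observed events divided by total observed time."""
    return sum(event_observed) / sum(durations)


durations = [5, 3, 9, 8, 7]
observed = [1, 1, 0, 1, 1]
lam = fit_exponential(durations, observed)   # 4 events / 32 time units = 0.125
survival_at_4 = math.exp(-lam * 4)           # S(4) = exp(-0.5)
cumulative_hazard_at_4 = lam * 4             # H(4) = 0.5
```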

fit(durations, event_observed=None, timeline=None, label='Exponential_estimate', alpha=None, ci_labels=None)
Parameters:
  • durations (iterable) – an array, or pd.Series, of length n – duration subject was observed for

  • event_observed (iterable, optional) – an array, list, or pd.Series, of length n – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.

  • timeline (iterable, optional) – return the best estimate at the values in timelines (positively increasing)

  • label (string, optional) – a string to name the column of the estimate.

  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.

  • ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>

Returns:

self – self, with new properties like ‘survival_function_’ and ‘lambda_’.

Return type:

ExponentialFitter

print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
summary

Summary statistics describing the fit. Set alpha property in the object before calling.

Returns:df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type:DataFrame

lifelines.fitters.kaplan_meier_fitter module

class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha=0.95)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Kaplan-Meier estimate for the survival function.

Parameters:alpha (float, optional (default=0.95)) – The alpha value associated with the confidence intervals.

Examples

>>> from lifelines import KaplanMeierFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> kmf = KaplanMeierFitter()
>>> kmf.fit(waltons['T'], waltons['E'])
>>> kmf.plot()
fit(durations, event_observed=None, timeline=None, entry=None, label='KM_estimate', alpha=None, left_censorship=False, ci_labels=None, weights=None)
Parameters:
  • durations (an array or pd.Series of length n) – duration the subject was observed for.
  • timeline – return the best estimate at the values in timelines (positively increasing).
  • event_observed (an array or pd.Series of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
  • entry (an array or pd.Series of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered the study when they were “born”.
  • label (string) – a string to name the column of the estimate.
  • alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • left_censorship (bool) – True if durations and event_observed refer to left censorship events. Default False.
  • ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
  • weights (an array or pd.Series of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subjects differently.
Returns:

self – self with new properties like ‘survival_function_’.

Return type:

KaplanMeierFitter
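
As an illustration of what fit estimates, the following is a minimal sketch of the textbook Kaplan-Meier product-limit estimator in plain Python. It is not the library's implementation: it ignores entry, weights, timeline and confidence intervals, and the function name kaplan_meier is hypothetical.

```python
def kaplan_meier(durations, event_observed=None):
    """Hypothetical helper: return [(t, S(t))] at each distinct time."""
    if event_observed is None:
        # Mirror the documented default: every event is observed.
        event_observed = [True] * len(durations)
    survival = 1.0
    at_risk = len(durations)
    estimate = []
    for t in sorted(set(durations)):
        deaths = sum(
            1 for d, e in zip(durations, event_observed) if d == t and e
        )
        removed = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving t.
            survival *= 1.0 - deaths / at_risk
        estimate.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= removed
    return estimate


km = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Censored subjects lower the number at risk without lowering the survival estimate, which is what distinguishes this from a simple empirical survival curve.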

plot_loglogs(*args, **kwargs)

lifelines.fitters.nelson_aalen_fitter module

class lifelines.fitters.nelson_aalen_fitter.NelsonAalenFitter(alpha=0.95, nelson_aalen_smoothing=True)

Bases: lifelines.fitters.UnivariateFitter

Class for fitting the Nelson-Aalen estimate for the cumulative hazard.

Parameters:
  • alpha (float) – The alpha value associated with the confidence intervals.
  • nelson_aalen_smoothing (bool) – If the event times are naturally discrete (like discrete years, minutes, etc.), then it is advisable to set this parameter to False. See [1], pg. 84.

Notes

[1] Aalen, O., Borgan, O., Gjessing, H., 2008. Survival and Event History Analysis

conditional_time_to_event_
fit(durations, event_observed=None, timeline=None, entry=None, label='NA_estimate', alpha=None, ci_labels=None, weights=None)
Parameters:
  • durations (an array, or pd.Series, of length n) – duration subject was observed for.
  • timeline (iterable) – return the best estimate at the values in timeline (positively increasing).
  • event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
  • entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed).
  • label (string) – a string to name the column of the estimate.
  • alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
  • weights (an array, or pd.Series, of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subjects differently.

Returns:

self – self, with new properties like ‘cumulative_hazard_’.

Return type:

NelsonAalenFitter
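
To make the cumulative hazard estimate and the role of nelson_aalen_smoothing concrete, here is a minimal plain-Python sketch of the Nelson-Aalen estimator. With d deaths among n subjects at risk, the discrete form adds d/n, while the tie-corrected form adds 1/n + 1/(n-1) + ... as if the tied deaths happened one after another. Mapping the flag onto these two forms is an assumption about the library, and the function name nelson_aalen is hypothetical; entry, weights and confidence intervals are ignored.

```python
def nelson_aalen(durations, event_observed=None, smoothing=True):
    """Hypothetical helper: return [(t, H(t))] at each distinct time."""
    if event_observed is None:
        event_observed = [True] * len(durations)
    cumulative = 0.0
    at_risk = len(durations)
    estimate = []
    for t in sorted(set(durations)):
        deaths = sum(
            1 for d, e in zip(durations, event_observed) if d == t and e
        )
        removed = sum(1 for d in durations if d == t)
        if smoothing:
            # Tie correction: treat tied deaths as sequential events.
            cumulative += sum(1.0 / (at_risk - i) for i in range(deaths))
        else:
            # Discrete form: all tied deaths share one risk set.
            cumulative += deaths / at_risk
        estimate.append((t, cumulative))
        at_risk -= removed
    return estimate


durations = [1, 1, 2, 4]
smooth = nelson_aalen(durations, smoothing=True)
discrete = nelson_aalen(durations, smoothing=False)
```

The two forms agree whenever there are no tied event times, so the choice only matters for data recorded on a coarse grid.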

plot_hazard(*args, **kwargs)
smoothed_hazard_(bandwidth)
Parameters:bandwidth (float) – the bandwidth used in the Epanechnikov kernel.
Returns:a DataFrame of the smoothed hazard
Return type:DataFrame
smoothed_hazard_confidence_intervals_(bandwidth, hazard_=None)
Parameters:
  • bandwidth (float) – the bandwidth to use in the Epanechnikov kernel. Must be > 0.
  • hazard_ (numpy array) – a computed (n,) numpy array of estimated hazard rates. If None, uses naf.smoothed_hazard_
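
The effect of bandwidth can be illustrated with a minimal sketch of kernel smoothing: the jumps of a cumulative hazard step function are averaged with Epanechnikov weights K(u) = 0.75 (1 - u^2) for |u| <= 1. This is the standard kernel-smoothed hazard estimator written in plain Python, not the library's code; the function names and the example numbers are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard(times, cumulative_hazard, bandwidth, at):
    """Hypothetical helper: smoothed hazard at a single time point."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be > 0")
    total = 0.0
    previous = 0.0
    for t, value in zip(times, cumulative_hazard):
        jump = value - previous  # increment of the cumulative hazard at t
        previous = value
        total += epanechnikov((at - t) / bandwidth) * jump
    return total / bandwidth


rate = smoothed_hazard(
    times=[1, 2, 3],
    cumulative_hazard=[0.1, 0.3, 0.6],
    bandwidth=2.0,
    at=2.0,
)
```

A larger bandwidth pulls in more distant jumps and gives a flatter curve; a smaller one follows the raw increments more closely.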

lifelines.fitters.weibull_fitter module

class lifelines.fitters.weibull_fitter.WeibullFitter(alpha=0.95)

Bases: lifelines.fitters.UnivariateFitter

This class implements a Weibull model for univariate data. The model has parameterized form:

\[S(t) = \exp(-(\lambda t)^\rho), \quad \lambda > 0, \rho > 0,\]

which implies the cumulative hazard rate is

\[H(t) = (\lambda t)^\rho,\]

and the hazard rate is:

\[h(t) = \rho \lambda(\lambda t)^{\rho-1}\]

After calling the .fit method, you have access to properties like: cumulative_hazard_, survival_function_, lambda_ and rho_.

A summary of the fit is available with the method ‘print_summary()’.
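
The three formulas above can be checked against each other numerically. The sketch below evaluates them directly in plain Python; the function names and the parameter values are hypothetical and only illustrate the parameterization, not the fitting procedure.

```python
import math


def weibull_survival(t, lambda_, rho_):
    """S(t) = exp(-(lambda t)^rho)."""
    return math.exp(-((lambda_ * t) ** rho_))


def weibull_cumulative_hazard(t, lambda_, rho_):
    """H(t) = (lambda t)^rho."""
    return (lambda_ * t) ** rho_


def weibull_hazard(t, lambda_, rho_):
    """h(t) = rho * lambda * (lambda t)^(rho - 1), the derivative of H(t)."""
    return rho_ * lambda_ * (lambda_ * t) ** (rho_ - 1.0)


# Hypothetical parameter values, chosen only for illustration.
lambda_, rho_, t = 0.05, 1.5, 10.0
S = weibull_survival(t, lambda_, rho_)
H = weibull_cumulative_hazard(t, lambda_, rho_)
h = weibull_hazard(t, lambda_, rho_)
```

With rho_ = 1 the hazard is constant and equal to lambda_, which recovers the exponential model; rho_ > 1 gives a hazard that increases with time.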

Examples

>>> from lifelines import WeibullFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> wbf = WeibullFitter()
>>> wbf.fit(waltons['T'], waltons['E'])
>>> wbf.plot()
>>> print(wbf.lambda_)
cumulative_hazard_at_times(times)
fit(durations, event_observed=None, timeline=None, label='Weibull_estimate', alpha=None, ci_labels=None, show_progress=False)
Parameters:
  • durations (an array, or pd.Series) – length n, duration subject was observed for.
  • event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
  • timeline (list, optional) – return the estimate at the values in timeline (positively increasing).
  • label (string, optional) – a string to name the column of the estimate.
  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
  • show_progress (boolean, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.

Returns:

self – self with new properties like cumulative_hazard_, survival_function_, lambda_, and rho_.

Return type:

WeibullFitter

hazard_at_times(times)
print_summary(decimals=2, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
summary

Summary statistics describing the fit. Set the alpha property on the object before calling.

Returns:df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type:pd.DataFrame
survival_function_at_times(times)

Module contents

class lifelines.fitters.BaseFitter(alpha=0.95)

Bases: object

class lifelines.fitters.UnivariateFitter(alpha=0.95)

Bases: lifelines.fitters.BaseFitter

conditional_time_to_event_
divide(other)

Divide the fitted estimates of two fitter objects.

Parameters:other (a fitted instance of the same fitter class.)
plot(*args, **kwargs)
predict(times)

Predict the fitted estimate at certain points in time. Uses linear interpolation if the points in time are not in the index.

Parameters:times (a scalar or an array of times at which to predict the estimate.)
Returns:predictions
Return type:a scalar if times is a scalar, a numpy array if times is an array.
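
The interpolation step can be sketched in plain Python as follows. The function name interpolate_estimate and the example curve are hypothetical, and clamping to the first or last fitted value outside the index range is an assumption of this sketch rather than documented behaviour.

```python
from bisect import bisect_left


def interpolate_estimate(index, values, time):
    """Hypothetical helper: linearly interpolate a fitted curve at time."""
    # Assumption: clamp to the end values outside the fitted index.
    if time <= index[0]:
        return values[0]
    if time >= index[-1]:
        return values[-1]
    j = bisect_left(index, time)
    if index[j] == time:
        # The requested time is in the index; no interpolation needed.
        return values[j]
    t0, t1 = index[j - 1], index[j]
    weight = (time - t0) / (t1 - t0)
    return values[j - 1] + weight * (values[j] - values[j - 1])


index = [0, 2, 4]
values = [1.0, 0.8, 0.4]
midpoint = interpolate_estimate(index, values, 3)
```

Applying the same function to each element of a list of times gives the array form described above.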
subtract(other)

Subtract the fitted estimates of two fitter objects.

Parameters:other (a fitted instance of the same fitter class.)