KaplanMeierFitter¶

class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha: float = 0.05, label: str = None)¶

Class for fitting the Kaplan-Meier estimate for the survival function.

Parameters:

alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
label (string, optional) – Provide a new label for the estimate - useful if looking at many groups.

Examples

from lifelines import KaplanMeierFitter
from lifelines.datasets import load_waltons
waltons = load_waltons()

kmf = KaplanMeierFitter(label="waltons_data")
kmf.fit(waltons['T'], waltons['E'])
kmf.plot()
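
For intuition about what the fitted estimate contains, the product-limit formula S(t) = ∏ (1 - d_i / n_i), taken over event times t_i ≤ t, can be sketched in plain Python. This is a minimal illustration under simple assumptions (right-censoring only, no weights, no late entry), not the code lifelines runs internally; the helper name `product_limit` and the toy data are made up for this example.

```python
from collections import Counter


def product_limit(durations, event_observed):
    """Return [(t, S(t))] at each distinct event time (hypothetical helper)."""
    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(durations, event_observed) if e)
    # Everyone leaves the risk set at their duration, censored or not.
    exits = Counter(durations)
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply in the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


toy_durations = [5, 6, 6, 2, 4, 4]
toy_observed = [1, 0, 1, 1, 1, 0]
curve = product_limit(toy_durations, toy_observed)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

Subjects censored at a tied time are counted as still at risk at that time, which is the usual convention.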

survival_function_¶

The estimated survival function (with custom timeline if provided)

Type:: DataFrame

median_survival_time_¶

The estimated median time to event, or np.inf if the median does not exist (the survival function never drops to 0.5).

Type:: float

confidence_interval_¶

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_survival_function_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:: DataFrame

confidence_interval_survival_function_¶

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:: DataFrame

cumulative_density_¶

The estimated cumulative density function (with custom timeline if provided)

Type:: DataFrame

confidence_interval_cumulative_density_¶

The lower and upper confidence intervals for the cumulative density.

Type:: DataFrame

durations¶

The durations provided

Type:: array

event_observed¶

The event_observed variable provided

Type:: array

timeline¶

The timeline used for plotting and indexing

Type:: array

entry¶

The entry array provided, or None

Type:: array or None

event_table¶

A summary of the life table

Type:: DataFrame

cumulative_density_at_times(times, label=None) → Series¶

Return a Pandas series of the predicted cumulative density at specific times

Parameters:: times (iterable or float)
Return type:: pd.Series

fit(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)¶

Fit the model to a right-censored dataset

Parameters:

durations (an array, list, pd.DataFrame or pd.Series) – length n – duration (relative to subject’s birth) the subject was alive for.
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 sequence: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha> and <label>_upper_<1-alpha>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
fit_options – Not used in KaplanMeierFitter

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

fit_interval_censoring(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, entry=None, weights=None, tol: float = 1e-05, show_progress: bool = False, fit_options=None, **kwargs) → KaplanMeierFitter¶

Fit the model to an interval-censored dataset using non-parametric MLE. This estimator is also called the Turnbull Estimator.

Currently, only closed intervals are supported. However, open intervals can be approximated by adding a very small value to the lower bound and subtracting a very small value from the upper bound. For example, the following turns closed intervals into (approximately) open intervals.

>>> left, right = df['left'], df['right']
>>> KaplanMeierFitter().fit_interval_censoring(left + 0.00001, right - 0.00001)

Note

This is new and experimental, and many features are missing.

Parameters:

lower_bound (an array, list, pd.DataFrame or pd.Series) – length n – lower bound of observations
upper_bound (an array, list, pd.DataFrame or pd.Series) – length n – upper bound of observations
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). This can be computed from the lower_bound and upper_bound, and can be left blank.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of lower_bound and upper_bound, one could weight subjects differently.
tol (float, optional) – minimum difference in log likelihood changes for iterative algorithm.
show_progress (bool, optional) – display information during fitting.

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

fit_left_censoring(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)¶

Fit the model to a left-censored dataset

Parameters:

durations (an array, list, pd.DataFrame or pd.Series) – length n – duration subject was observed for
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the observation was left-censored. Defaults to all True if event_observed is None.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 sequence: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha> and <label>_upper_<1-alpha>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

property median_survival_time_: float¶: Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

plot(**kwargs)¶

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in via kwargs, in addition to the parameters below.

Parameters:

show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
loc (slice) – specify a time-based subsection of the curves to plot, ex:
```
>>> model.plot(loc=slice(0.,10.))
```
will plot the time values between t=0. and t=10.
iloc (slice) – specify a location-based subsection of the curves to plot, ex:
```
>>> model.plot(iloc=slice(0,10))
```
will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_density(**kwargs)¶

Plots a pretty figure of the cumulative density function.

Matplotlib plot arguments can be passed in inside the kwargs.

Parameters:

show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
loc (slice) – specify a time-based subsection of the curves to plot, ex:
```
>>> model.plot_cumulative_density(loc=slice(0.,10.))
```
will plot the time values between t=0. and t=10.
iloc (slice) – specify a location-based subsection of the curves to plot, ex:
```
>>> model.plot_cumulative_density(iloc=slice(0,10))
```
will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_hazard(**kwargs)¶

plot_hazard(**kwargs)¶

plot_loglogs(*args, **kwargs)¶: Plot \(\log(-\log(S(t)))\) against \(\log(t)\). Same arguments as .plot.

plot_survival_function(**kwargs)¶: Alias of plot

survival_function_at_times(times, label=None) → Series¶

Return a Pandas series of the predicted survival value at specific times

Parameters:: times (iterable or float)
Return type:: pd.Series