KaplanMeierFitter

class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha: float = 0.05, label: str = None)

Class for fitting the Kaplan-Meier estimate for the survival function.

Parameters:
  • alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals; intervals are reported at the (1 - alpha) level, i.e. 95% by default.

  • label (string, optional) – Provide a new label for the estimate; useful when looking at many groups.

Examples

from lifelines import KaplanMeierFitter
from lifelines.datasets import load_waltons
waltons = load_waltons()

kmf = KaplanMeierFitter(label="waltons_data")
kmf.fit(waltons['T'], waltons['E'])
kmf.plot()

survival_function_

The estimated survival function (with custom timeline if provided)

Type:

DataFrame
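The Kaplan-Meier (product-limit) estimate behind survival_function_ multiplies, over the distinct observed times t_i ≤ t, the conditional survival factors 1 - d_i/n_i, where d_i is the number of events at t_i and n_i the number of subjects still at risk just before t_i. Below is a minimal NumPy sketch of that formula, assuming unweighted, right-censored data with every subject entering at time 0; kaplan_meier is a hypothetical helper, not part of lifelines.

```python
import numpy as np


def kaplan_meier(durations, event_observed):
    """Hypothetical helper: product-limit estimate S(t) = prod_{t_i <= t} (1 - d_i / n_i)."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(event_observed, dtype=bool)
    timeline = np.unique(durations)  # sorted distinct observed times
    survival = np.empty(timeline.size)
    s = 1.0
    for k, t in enumerate(timeline):
        at_risk = np.sum(durations >= t)             # n_i: not yet removed just before t
        deaths = np.sum((durations == t) & events)   # d_i: events observed exactly at t
        s *= 1.0 - deaths / at_risk
        survival[k] = s
    return timeline, survival


timeline, survival = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# survival is approximately [0.8, 0.6, 0.3, 0.3]
```

Censored subjects (the 0 entries) never lower S(t) directly; they only shrink the later risk sets.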

median_survival_time_

The estimated median time to event; np.inf if it doesn’t exist.

Type:

float

confidence_interval_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_survival_function_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:

DataFrame

confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:

DataFrame
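The exponential Greenwood (“log-log”) interval works on c(t) = log(-log S(t)), whose variance is approximated by the Greenwood sum Σ_{t_i ≤ t} d_i / (n_i (n_i - d_i)) divided by (log S(t))²; a normal interval for c(t) is then mapped back to the survival scale, which keeps the bounds inside [0, 1]. A minimal NumPy sketch of that calculation, assuming unweighted right-censored data (loglog_interval is a hypothetical name, and lifelines' handling of edge cases may differ):

```python
import numpy as np
from statistics import NormalDist


def loglog_interval(durations, event_observed, alpha=0.05):
    """Hypothetical helper: KM estimate with exponential-Greenwood (log-log) bounds."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(event_observed, dtype=bool)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    timeline = np.unique(durations)
    n = np.array([np.sum(durations >= t) for t in timeline], dtype=float)
    d = np.array([np.sum((durations == t) & events) for t in timeline], dtype=float)
    survival = np.cumprod(1.0 - d / n)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Greenwood sum; terms with n == d would divide by zero, so they contribute 0 here.
        greenwood = np.cumsum(np.where(n > d, d / (n * (n - d)), 0.0))
        se = np.sqrt(greenwood) / np.abs(np.log(survival))  # std. error of log(-log S)
        # Larger log(-log S) means smaller S, so +z*se gives the lower survival bound.
        lower = np.where(survival == 1.0, 1.0, survival ** np.exp(z * se))
        upper = np.where(survival == 1.0, 1.0, survival ** np.exp(-z * se))
    return timeline, survival, lower, upper


timeline, survival, lower, upper = loglog_interval([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```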

cumulative_density_

The estimated cumulative density function, 1 - S(t) (with custom timeline if provided)

Type:

DataFrame

confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density.

Type:

DataFrame
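Because the cumulative density is F(t) = 1 - S(t), its interval can be read off the survival interval, with the bounds swapping roles since 1 - S decreases as S increases. A minimal sketch of that relationship, assuming the survival bounds are already available (for example from confidence_interval_); cumulative_density_bounds is a hypothetical helper, and we have not checked that lifelines computes the attribute exactly this way:

```python
import numpy as np


def cumulative_density_bounds(survival, survival_lower, survival_upper):
    """Hypothetical helper: map S(t) and its bounds to F(t) = 1 - S(t) and its bounds."""
    survival = np.asarray(survival, dtype=float)
    cdf = 1.0 - survival
    cdf_lower = 1.0 - np.asarray(survival_upper, dtype=float)  # upper S bound -> lower F bound
    cdf_upper = 1.0 - np.asarray(survival_lower, dtype=float)  # lower S bound -> upper F bound
    return cdf, cdf_lower, cdf_upper


cdf, cdf_lower, cdf_upper = cumulative_density_bounds([0.8, 0.6], [0.2, 0.1], [0.97, 0.9])
```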

durations

The durations provided

Type:

array

event_observed

The event_observed variable provided

Type:

array

timeline

The timeline to use for plotting and indexing

Type:

array

entry

The entry array provided, or None

Type:

array or None

event_table

A summary of the life table

Type:

DataFrame
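The life table records, for each distinct time, how many subjects left the study (removed), how many of those had the event (observed) or were censored, how many entered, and how many were at risk. A plain-Python sketch, assuming every subject enters at time 0 unless entry is given; the keys mirror the column names we believe event_table uses, but event_table_sketch is a hypothetical helper that returns a list of dicts rather than a DataFrame:

```python
import numpy as np


def event_table_sketch(durations, event_observed, entry=None):
    """Hypothetical helper: one life-table row per distinct entry or exit time."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(event_observed, dtype=bool)
    entry = np.zeros_like(durations) if entry is None else np.asarray(entry, dtype=float)
    rows, entered, removed_before = [], 0, 0
    for t in np.unique(np.concatenate([entry, durations])):
        entrance = int(np.sum(entry == t))
        removed = int(np.sum(durations == t))
        observed = int(np.sum((durations == t) & events))
        entered += entrance
        rows.append({
            "event_at": float(t),
            "removed": removed,
            "observed": observed,
            "censored": removed - observed,
            "entrance": entrance,
            "at_risk": entered - removed_before,  # entered by t and not removed before t
        })
        removed_before += removed
    return rows


table = event_table_sketch([1, 2, 2, 3], [1, 1, 0, 0])
```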

cumulative_density_at_times(times, label=None) → Series

Return a pandas Series of the predicted cumulative density at the given times.

Parameters:

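  • times (iterable or float) – the time(s) at which to evaluate the estimate.

  • label (string, optional) – name of the returned Series; defaults to the fitter’s label.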

Return type:

pd.Series
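survival_function_ and cumulative_density_ are step functions stored only at the timeline points, so evaluating them at arbitrary times means finding the last timeline point at or before each requested time. A minimal NumPy sketch covering both cumulative_density_at_times and survival_function_at_times, assuming the estimate is right-continuous, that S equals 1 before the first timeline point, and that it holds its last value past the end of the timeline; step_at_times is a hypothetical helper:

```python
import numpy as np


def step_at_times(timeline, values, times, before_start):
    """Hypothetical helper: evaluate a right-continuous step function at arbitrary times."""
    timeline = np.asarray(timeline, dtype=float)
    values = np.asarray(values, dtype=float)
    times = np.atleast_1d(np.asarray(times, dtype=float))
    idx = np.searchsorted(timeline, times, side="right") - 1  # last timeline point <= time
    return np.where(idx >= 0, values[np.clip(idx, 0, None)], before_start)


timeline = [1.0, 2.0, 3.0, 4.0]
survival = [0.8, 0.6, 0.3, 0.3]
times = [0.5, 1.0, 2.5, 10.0]
survival_at = step_at_times(timeline, survival, times, before_start=1.0)
cumulative_density_at = 1.0 - survival_at  # F(t) = 1 - S(t)
```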

fit(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)

Fit the model to a right-censored dataset.

Parameters:
  • durations (an array, list, pd.DataFrame or pd.Series) – length n – the duration (relative to the subject’s birth) for which the subject was alive.

  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.

  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (which must be increasing)

  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered the study when they were “born”.

  • label (string, optional) – a string to name the column of the estimate.

  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.

  • ci_labels (tuple, optional) – custom column names for the generated confidence intervals, as a length-2 tuple or list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha> and <label>_upper_<1-alpha>

  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – case weights for a weighted dataset. For example, instead of providing every subject as a separate element of durations and event_observed, one can provide each distinct observation once and weight it by its count.

  • fit_options – Not used in KaplanMeierFitter

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter
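The weights and entry arguments both act on the risk sets of the product-limit formula: each subject counts with its weight instead of 1, and a left-truncated subject joins the risk set only after its entry time. A minimal NumPy sketch under those assumptions (weighted_km is a hypothetical helper; it uses the convention entry < t ≤ duration for being at risk at t, and the library may treat ties between entry and event times differently):

```python
import numpy as np


def weighted_km(durations, event_observed, weights=None, entry=None):
    """Hypothetical helper: product-limit estimate with case weights and left truncation."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(event_observed, dtype=bool)
    weights = np.ones_like(durations) if weights is None else np.asarray(weights, dtype=float)
    entry = np.zeros_like(durations) if entry is None else np.asarray(entry, dtype=float)
    timeline = np.unique(durations)
    survival = np.empty(timeline.size)
    s = 1.0
    for k, t in enumerate(timeline):
        at_risk = weights[(entry < t) & (durations >= t)].sum()  # weighted n_i
        deaths = weights[(durations == t) & events].sum()         # weighted d_i
        if at_risk > 0:
            s *= 1.0 - deaths / at_risk
        survival[k] = s
    return timeline, survival


# Weights used as counts reproduce the expanded dataset.
_, compact = weighted_km([1, 2, 3], [1, 0, 1], weights=[2, 1, 3])
_, expanded = weighted_km([1, 1, 2, 3, 3, 3], [1, 1, 0, 1, 1, 1])

# A subject entering at t=2.5 is not counted at risk at t=2.
_, truncated = weighted_km([2, 3, 5], [1, 1, 1], entry=[0, 0, 2.5])
```

Without the entry times, the last example would give S(2) = 2/3 instead of 1/2, because the late entrant would wrongly inflate the risk set at t=2.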

fit_interval_censoring(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, entry=None, weights=None, tol: float = 1e-05, show_progress: bool = False, fit_options=None, **kwargs) → KaplanMeierFitter

Fit the model to an interval-censored dataset using non-parametric maximum likelihood (NPMLE). This estimator is also called the Turnbull estimator.

Currently, only closed intervals are supported. However, it’s easy to create open intervals by adding a very small value to the lower bound (or subtracting one from the upper bound). For example, the following turns closed intervals into open intervals.

>>> left, right = df['left'], df['right']
>>> KaplanMeierFitter().fit_interval_censoring(left + 0.00001, right - 0.00001)

Note

This is new and experimental, and many features are missing.

Parameters:
  • lower_bound (an array, list, pd.DataFrame or pd.Series) – length n – lower bound of observations

  • upper_bound (an array, list, pd.DataFrame or pd.Series) – length n – upper bound of observations

  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). This can be computed from lower_bound and upper_bound, so it can be left blank.

  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (which must be increasing)

  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered the study when they were “born”.

  • label (string, optional) – a string to name the column of the estimate.

  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.

  • ci_labels (tuple, optional) – custom column names for the generated confidence intervals, as a length-2 tuple or list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha> and <label>_upper_<1-alpha>

  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – case weights for a weighted dataset. For example, instead of providing every subject as a separate element of lower_bound and upper_bound, one can provide each distinct interval once and weight it by its count.

  • tol (float, optional) – convergence tolerance: the iterative algorithm stops once the change in log-likelihood between iterations falls below this value.

  • show_progress (bool, optional) – display information during fitting.

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter
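The Turnbull estimator is commonly computed with a self-consistency (EM) iteration: each subject's unit of probability mass is spread over the candidate points inside its interval in proportion to the current estimate, and the estimate is updated until the log-likelihood stops improving by more than tol. The sketch below is a simplified NumPy version, assuming closed intervals and placing candidate mass on every distinct endpoint rather than computing Turnbull's innermost intervals; interval_npmle is a hypothetical helper, not lifelines' implementation, and it computes no confidence intervals.

```python
import numpy as np


def interval_npmle(lower_bound, upper_bound, tol=1e-5, max_iter=10_000):
    """Hypothetical helper: self-consistency (EM) estimate for closed intervals [L_i, R_i]."""
    lower = np.asarray(lower_bound, dtype=float)
    upper = np.asarray(upper_bound, dtype=float)
    # Simplification: candidate mass points are all distinct endpoints (np.inf allowed for
    # right-censored rows) instead of Turnbull's innermost intervals.
    support = np.unique(np.concatenate([lower, upper]))
    inside = ((support >= lower[:, None]) & (support <= upper[:, None])).astype(float)
    p = np.full(support.size, 1.0 / support.size)  # start from uniform mass
    prev_loglik = -np.inf
    for _ in range(max_iter):
        interval_prob = inside @ p                 # P(T in [L_i, R_i]) under the current p
        loglik = np.log(interval_prob).sum()
        if loglik - prev_loglik < tol:             # stop once the log-likelihood stalls
            break
        prev_loglik = loglik
        # Each subject spreads one unit of mass over the points inside its interval,
        # in proportion to the current p; averaging over subjects gives the updated p.
        p = (inside * p / interval_prob[:, None]).mean(axis=0)
    survival = np.clip(1.0 - np.cumsum(p), 0.0, 1.0)
    return support, survival


support, survival = interval_npmle([1, 2, 2, 3], [1, 2, 2, 3])  # exact observations
```

With exact observations (lower equal to upper) every interval holds a single point, so the iteration reduces to the empirical distribution.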

fit_left_censoring(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)

Fit the model to a left-censored dataset.

Parameters:
  • durations (an array, list, pd.DataFrame or pd.Series) – length n – the duration for which the subject was observed

  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the observation was left-censored (the event occurred at some unknown time at or before the recorded duration). Defaults to all True if event_observed is None.

  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (which must be increasing)

  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered the study when they were “born”.

  • label (string, optional) – a string to name the column of the estimate.

  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.

  • ci_labels (tuple, optional) – custom column names for the generated confidence intervals, as a length-2 tuple or list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha> and <label>_upper_<1-alpha>

  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – case weights for a weighted dataset. For example, instead of providing every subject as a separate element of durations and event_observed, one can provide each distinct observation once and weight it by its count.

Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter
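With left censoring, a non-event at time c only says the event happened at or before c. One standard way to handle this is to run the product-limit estimator in reversed time, which estimates the cumulative density directly: F(t) = ∏_{t_i > t} (1 - d_i / r_i), where r_i counts the subjects whose recorded time is ≤ t_i. A minimal NumPy sketch of that reverse product-limit idea (left_censored_cdf is a hypothetical helper; we have not verified that lifelines' internals match it in every detail, such as tie handling):

```python
import numpy as np


def left_censored_cdf(durations, event_observed, timeline):
    """Hypothetical helper: reverse product-limit estimate of F(t) = P(T <= t)."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(event_observed, dtype=bool)
    event_times = np.unique(durations[events])
    # Reversed-time risk set at t_i: every subject whose recorded time is <= t_i.
    at_risk = np.array([np.sum(durations <= t) for t in event_times], dtype=float)
    deaths = np.array([np.sum((durations == t) & events) for t in event_times], dtype=float)
    factors = 1.0 - deaths / at_risk
    # F(t) multiplies the factors for event times strictly after t (empty product = 1).
    return np.array([np.prod(factors[event_times > t]) for t in np.asarray(timeline, dtype=float)])


cdf = left_censored_cdf([1, 2, 3], [True, False, True], timeline=[0.5, 1, 2, 3])
```

Without censoring this reduces to the empirical distribution function; the left-censored subject at t=2 pushes its mass toward earlier event times.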

property median_survival_time_: float

Return the median survival time: the first time point t at which S(t) drops to 0.5 or below. This is the “half-life” of the population and, when it exists, a robust summary statistic for the population; if S(t) never reaches 0.5, the result is np.inf.
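For a step-shaped estimate, an exact solution of S(t) = 0.5 usually does not exist, so the median is read off as the first timeline point where the curve is at or below 0.5. A minimal sketch of that rule, assuming the estimate is given as aligned timeline and survival arrays (median_from_curve is a hypothetical helper; other software may handle a curve that sits exactly at 0.5 over an interval differently):

```python
import numpy as np


def median_from_curve(timeline, survival):
    """Hypothetical helper: first time with S(t) <= 0.5, or np.inf if the curve never gets there."""
    timeline = np.asarray(timeline, dtype=float)
    survival = np.asarray(survival, dtype=float)
    reached = np.flatnonzero(survival <= 0.5)
    return float(timeline[reached[0]]) if reached.size else np.inf


median_from_curve([1, 2, 3, 4], [0.8, 0.6, 0.3, 0.3])  # 3.0
median_from_curve([1, 2], [0.9, 0.7])                  # inf
```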

plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in via kwargs, in addition to the following:

Parameters:
  • show_censors (bool) – place markers at censorship events. Default: False

  • censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.

  • ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3

  • ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False

  • ci_show (bool) – show confidence intervals. Default: True

  • ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False

  • at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False

  • loc (slice) – specify a time-based subsection of the curves to plot, ex:

    >>> model.plot(loc=slice(0.,10.))
    

    will plot the time values between t=0. and t=10.

  • iloc (slice) – specify a location-based subsection of the curves to plot, ex:

    >>> model.plot(iloc=slice(0,10))
    

    will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax
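The loc and iloc options differ the way pandas' label-based and position-based slicing differ. A small pandas sketch on a toy DataFrame standing in for survival_function_ (the values are made up, and we assume plot applies these slices to the estimate's time index in the same way):

```python
import numpy as np
import pandas as pd

# Toy stand-in for survival_function_: made-up values indexed by time.
timeline = np.arange(0.0, 20.0, 0.5)
estimate = pd.DataFrame({"KM_estimate": np.linspace(1.0, 0.2, timeline.size)}, index=timeline)

by_time = estimate.loc[slice(0.0, 10.0)]    # label-based: 0 <= t <= 10, both ends included
by_position = estimate.iloc[slice(0, 10)]   # position-based: the first 10 rows, whatever their times
```

Here loc keeps 21 rows (t = 0.0, 0.5, ..., 10.0), while iloc keeps 10 rows ending at t = 4.5.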

plot_cumulative_density(**kwargs)

Plots a pretty figure of the cumulative density function.

Matplotlib plot arguments can be passed in via kwargs, in addition to the following:

Parameters:
  • show_censors (bool) – place markers at censorship events. Default: False

  • censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.

  • ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3

  • ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False

  • ci_show (bool) – show confidence intervals. Default: True

  • ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False

  • at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False

  • loc (slice) – specify a time-based subsection of the curves to plot, ex:

    >>> model.plot_cumulative_density(loc=slice(0.,10.))

    will plot the time values between t=0. and t=10.

  • iloc (slice) – specify a location-based subsection of the curves to plot, ex:

    >>> model.plot_cumulative_density(iloc=slice(0,10))

    will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_hazard(**kwargs)

plot_hazard(**kwargs)

plot_loglogs(*args, **kwargs)

Plot \(\log(-\log(S(t)))\) against \(\log(t)\). Same arguments as .plot.
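The log-log plot is a useful diagnostic because of how a Weibull survival curve behaves on these axes: if S(t) = exp(-(t/λ)^ρ), then log(-log S(t)) = ρ log t - ρ log λ, a straight line in log t with slope ρ. A short NumPy check of that identity, using made-up Weibull parameters (this verifies the math, not lifelines' plotting code):

```python
import numpy as np

shape, scale = 1.7, 10.0  # made-up Weibull parameters (rho, lambda) for illustration
t = np.linspace(0.5, 30.0, 50)
survival = np.exp(-((t / scale) ** shape))

x = np.log(t)
y = np.log(-np.log(survival))  # the vertical axis of a log-log plot
slope, intercept = np.polyfit(x, y, 1)
# slope recovers shape; intercept equals -shape * log(scale)
```

Similarly, under proportional hazards the log-log curves of two groups differ by a constant vertical shift, so roughly parallel curves are consistent with that assumption.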

plot_survival_function(**kwargs)

Alias of plot

survival_function_at_times(times, label=None) → Series

Return a pandas Series of the predicted survival values at the given times.

Parameters:

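  • times (iterable or float) – the time(s) at which to evaluate the estimate.

  • label (string, optional) – name of the returned Series; defaults to the fitter’s label.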

Return type:

pd.Series