KaplanMeierFitter

class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha: float = 0.05, label: Optional[str] = None)

Bases: lifelines.fitters.NonParametricUnivariateFitter

Class for fitting the Kaplan-Meier estimate for the survival function.

Parameters:
  • alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
  • label (string, optional) – Provide a new label for the estimate - useful if looking at many groups.

Examples

from lifelines import KaplanMeierFitter
from lifelines.datasets import load_waltons
waltons = load_waltons()

kmf = KaplanMeierFitter(label="waltons_data")
kmf.fit(waltons['T'], waltons['E'])
kmf.plot()
survival_function_

The estimated survival function (with custom timeline if provided)

Type:DataFrame
median_survival_time_

The estimated median time to event. np.inf if it doesn’t exist.

Type:float
confidence_interval_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_survival_function_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:DataFrame
confidence_interval_survival_function_

The lower and upper confidence intervals for the survival function. An alias of confidence_interval_. Uses Greenwood’s Exponential formula (“log-log” in R).

Type:DataFrame
cumulative_density_

The estimated cumulative density function (with custom timeline if provided)

Type:DataFrame
confidence_interval_cumulative_density_

The lower and upper confidence intervals for the cumulative density.

Type:DataFrame
durations

The durations provided

Type:array
event_observed

The event_observed variable provided

Type:array
timeline

The timeline to use for plotting and indexing

Type:array
entry

The entry array provided, or None

Type:array or None
event_table

A summary of the life table

Type:DataFrame
conditional_time_to_event_

Return a DataFrame, with index equal to that of survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual survives until age 1, their median remaining lifetime, given that they lived to time 1, might be 9 years.

cumulative_density_at_times(times, label=None) → pandas.core.series.Series

Return a Pandas series of the predicted cumulative density at specific times

Parameters:times (iterable or float)
Returns:
Return type:pd.Series
cumulative_hazard_at_times(times, label=None)
divide(other) → pandas.core.frame.DataFrame

Divide self’s survival function by another model’s survival function.

Parameters:other (same object as self)
fit(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)

Fit the model to a right-censored dataset

Parameters:
  • durations (an array, list, pd.DataFrame or pd.Series) – length n – duration (relative to subject’s birth) the subject was alive for.
  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
  • label (string, optional) – a string to name the column of the estimate.
  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
  • fit_options – Not used in KaplanMeierFitter
Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

fit_interval_censoring(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, entry=None, weights=None, tol: float = 1e-05, show_progress: bool = False, fit_options=None, **kwargs) → lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter

Fit the model to an interval-censored dataset using non-parametric MLE. This estimator is also called the Turnbull Estimator.

Currently, only closed intervals are supported. However, it’s easy to create open intervals by adding (or subtracting) a very small value from the lower-bound (or upper bound). For example, the following turns closed intervals into open intervals.

>>> left, right = df['left'], df['right']
>>> KaplanMeierFitter().fit_interval_censoring(left + 0.00001, right - 0.00001)

Note

This is new and experimental, and many features are missing.

Parameters:
  • lower_bound (an array, list, pd.DataFrame or pd.Series) – length n – lower bound of observations
  • upper_bound (an array, list, pd.DataFrame or pd.Series) – length n – upper bound of observations
  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). This can be computed from the lower_bound and upper_bound, and can be left blank.
  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
  • label (string, optional) – a string to name the column of the estimate.
  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
  • tol (float, optional) – minimum difference in log likelihood changes for iterative algorithm.
  • show_progress (bool, optional) – display information during fitting.
Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

fit_left_censoring(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)

Fit the model to a left-censored dataset

Parameters:
  • durations (an array, list, pd.DataFrame or pd.Series) – length n – duration subject was observed for
  • event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the observation was left-censored. Defaults to all True if event_observed is None.
  • timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
  • entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
  • label (string, optional) – a string to name the column of the estimate.
  • alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
  • weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
Returns:

self – self with new properties like survival_function_, plot(), median_survival_time_

Return type:

KaplanMeierFitter

fit_right_censoring(*args, **kwargs)

Alias for fit

See also

fit()

hazard_at_times(times, label=None)
median_survival_time_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

percentile(p: float) → float

Return the unique time point, t, such that S(t) = p.

Parameters:p (float)
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in via kwargs, in addition to the parameters below.

Parameters:
  • show_censors (bool) – place markers at censorship events. Default: False

  • censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.

  • ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3

  • ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False

  • ci_show (bool) – show confidence intervals. Default: True

  • ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False

  • at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False

  • loc (slice) – specify a time-based subsection of the curves to plot, ex:

    >>> model.plot(loc=slice(0.,10.))
    

    will plot the time values between t=0. and t=10.

  • iloc (slice) – specify a location-based subsection of the curves to plot, ex:

    >>> model.plot(iloc=slice(0,10))
    

    will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_density(**kwargs)

Plots a pretty figure of the cumulative density function.

Matplotlib plot arguments can be passed in inside the kwargs.

Parameters:
  • show_censors (bool) – place markers at censorship events. Default: False

  • censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.

  • ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3

  • ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False

  • ci_show (bool) – show confidence intervals. Default: True

  • ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False

  • at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False

  • loc (slice) – specify a time-based subsection of the curves to plot, ex:

    >>> model.plot(loc=slice(0.,10.))
    

    will plot the time values between t=0. and t=10.

  • iloc (slice) – specify a location-based subsection of the curves to plot, ex:

    >>> model.plot(iloc=slice(0,10))
    

    will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_hazard(**kwargs)
plot_density(**kwargs)
plot_hazard(**kwargs)
plot_loglogs(*args, **kwargs)

Plot \(\log(-\log(S(t)))\) against \(\log(t)\). Same arguments as .plot.

plot_survival_function(**kwargs)

Alias of plot

predict(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series

Predict the estimate at the given points in time. With interpolate=True, uses linear interpolation for points in time that are not in the index.

Parameters:
  • times (scalar, or array) – a scalar or an array of times to predict the value of the estimate at.
  • interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (Kaplan-Meier, Nelson-Aalen, etc), turning this to True will use a linear interpolation method to provide a more “smooth” answer.
subtract(other) → pandas.core.frame.DataFrame

Subtract another model’s survival function from self’s survival function (self minus other).

Parameters:other (same object as self)
survival_function_at_times(times, label=None) → pandas.core.series.Series

Return a Pandas series of the predicted survival value at specific times

Parameters:times (iterable or float)
Returns:
Return type:pd.Series