KaplanMeierFitter¶
- class lifelines.fitters.kaplan_meier_fitter.KaplanMeierFitter(alpha: float = 0.05, label: str | None = None)¶
Class for fitting the Kaplan-Meier estimate for the survival function.
- Parameters:
alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
label (string, optional) – Provide a new label for the estimate - useful if looking at many groups.
Examples
from lifelines import KaplanMeierFitter
from lifelines.datasets import load_waltons
waltons = load_waltons()
kmf = KaplanMeierFitter(label="waltons_data")
kmf.fit(waltons['T'], waltons['E'])
kmf.plot()
- survival_function_¶
The estimated survival function (with custom timeline if provided)
- Type:
DataFrame
- median_survival_time_¶
The estimated median time to event, or np.inf if it doesn’t exist.
- Type:
float
- confidence_interval_¶
The lower and upper confidence intervals for the survival function. An alias of confidence_interval_survival_function_. Uses Greenwood’s Exponential formula (“log-log” in R).
- Type:
DataFrame
- confidence_interval_survival_function_¶
The lower and upper confidence intervals for the survival function. An alias of confidence_interval_. Uses Greenwood’s Exponential formula (“log-log” in R).
- Type:
DataFrame
- cumulative_density_¶
The estimated cumulative density function (with custom timeline if provided)
- Type:
DataFrame
- confidence_interval_cumulative_density_¶
The lower and upper confidence intervals for the cumulative density.
- Type:
DataFrame
- durations¶
The durations provided
- Type:
array
- event_observed¶
The event_observed variable provided
- Type:
array
- timeline¶
The time line to use for plotting and indexing
- Type:
array
- entry¶
The entry array provided, or None
- Type:
array or None
- event_table¶
A summary of the life table
- Type:
DataFrame
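To make the attributes above concrete, here is a minimal standard-library sketch of the quantities that survival_function_ and confidence_interval_ describe: the product-limit estimate and a log-log ("exponential Greenwood") interval around it. The function name `kaplan_meier_with_ci` and its handling of degenerate intervals are assumptions made for illustration. This is not lifelines' implementation, and its output may differ from the library's in edge cases.

```python
import math
from collections import Counter
from statistics import NormalDist


def kaplan_meier_with_ci(durations, event_observed, alpha=0.05):
    """Return rows (t, S(t), lower, upper) at each distinct duration."""
    deaths = Counter(t for t, e in zip(durations, event_observed) if e)
    removed = Counter(durations)  # deaths plus censored subjects leaving at t
    z = NormalDist().inv_cdf(1 - alpha / 2)

    at_risk, survival, greenwood = len(durations), 1.0, 0.0
    rows = []
    for t in sorted(removed):
        d = deaths[t]
        survival *= 1.0 - d / at_risk  # product-limit update
        if 0.0 < survival < 1.0:
            # Greenwood's running sum, then a symmetric interval on the
            # log(-log S) scale, mapped back to the survival scale.
            greenwood += d / (at_risk * (at_risk - d))
            half_width = z * math.sqrt(greenwood) / abs(math.log(survival))
            lower = survival ** math.exp(half_width)
            upper = survival ** math.exp(-half_width)
        else:
            # Assumption of this sketch: collapse the interval when S(t) is 0 or 1.
            lower = upper = survival
        rows.append((t, survival, lower, upper))
        at_risk -= removed[t]
    return rows


rows = kaplan_meier_with_ci(
    [5, 6, 6, 2, 4, 4], [True, False, True, True, True, False]
)
for t, s, lo, hi in rows:
    print(f"t={t}: S={s:.3f} [{lo:.3f}, {hi:.3f}]")
```

Working on the log(-log S) scale keeps both bounds inside (0, 1), which a plain symmetric interval around S(t) does not guarantee.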
- cumulative_density_at_times(times, label=None) Series ¶
Return a Pandas series of the predicted cumulative density at specific times
- Parameters:
times (iterable or float)
- Return type:
pd.Series
- fit(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)¶
Fit the model to a right-censored dataset
- Parameters:
durations (an array, list, pd.DataFrame or pd.Series) – length n – duration (relative to subject’s birth) the subject was alive for.
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed is None.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – use when providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
fit_options – Not used in KaplanMeierFitter
- Returns:
self – self with new properties like survival_function_, plot(), median_survival_time_
- Return type:
KaplanMeierFitter
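The entry and weights parameters described above can be pictured with a small standard-library sketch. It assumes a subject counts as at risk at time t when entry <= t <= duration, and that a weight of w behaves like w identical subjects. The function name `weighted_kaplan_meier` and the risk-set convention are assumptions of this sketch, not a statement of how lifelines computes its estimate.

```python
def weighted_kaplan_meier(durations, event_observed, entry=None, weights=None):
    """Return [(t, S(t))] at each observed event time."""
    n = len(durations)
    entry = [0.0] * n if entry is None else entry        # default: everyone enters at "birth"
    weights = [1.0] * n if weights is None else weights  # default: one subject per row

    survival, curve = 1.0, []
    event_times = sorted({t for t, e in zip(durations, event_observed) if e})
    for t in event_times:
        # Weighted risk set: subjects who have entered and not yet left.
        at_risk = sum(
            w for d, s, w in zip(durations, entry, weights) if s <= t <= d
        )
        deaths = sum(
            w for d, e, w in zip(durations, event_observed, weights) if e and d == t
        )
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# One row with weight 3 gives the same curve as three identical rows.
print(weighted_kaplan_meier([2, 4], [True, True], weights=[1, 3]))
print(weighted_kaplan_meier([2, 4, 4, 4], [True, True, True, True]))

# The second subject enters late (at t=4), so it is not at risk at t=3.
print(weighted_kaplan_meier([3, 5, 7], [True, True, True], entry=[0, 4, 0]))
```

Ignoring late entry would count the second subject in the risk set at t=3 and bias the early part of the curve upward, which is the situation the entry parameter is meant to handle.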
- fit_interval_censoring(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, entry=None, weights=None, tol: float = 1e-05, show_progress: bool = False, fit_options=None, **kwargs) KaplanMeierFitter ¶
Fit the model to an interval-censored dataset using non-parametric MLE. This estimator is also called the Turnbull Estimator.
Currently, only closed intervals are supported. However, it’s easy to create open intervals by adding a very small value to the lower bound (or subtracting it from the upper bound). For example, the following turns closed intervals into open intervals.
>>> left, right = df['left'], df['right']
>>> KaplanMeierFitter().fit_interval_censoring(left + 0.00001, right - 0.00001)
Note
This is new and experimental, and many features are missing.
- Parameters:
lower_bound (an array, list, pd.DataFrame or pd.Series) – length n – lower bound of observations
upper_bound (an array, list, pd.DataFrame or pd.Series) – length n – upper bound of observations
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the event was lost (right-censored). This can be computed from lower_bound and upper_bound, and can be left blank.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – use when providing a weighted dataset. For example, instead of providing every subject as a single element of lower_bound and upper_bound, one could weight subjects differently.
tol (float, optional) – minimum difference in log likelihood changes for iterative algorithm.
show_progress (bool, optional) – display information during fitting.
- Returns:
self – self with new properties like survival_function_, plot(), median_survival_time_
- Return type:
KaplanMeierFitter
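The note that event_observed can be computed from the bounds can be sketched as follows. The sketch assumes that an exactly observed event is encoded with equal lower and upper bounds, and that a right-censored subject is encoded with an infinite upper bound. Both encodings, and the helper name `infer_event_observed`, are assumptions for illustration only.

```python
import math


def infer_event_observed(lower_bound, upper_bound):
    """Flag rows whose interval collapses to a single point as observed events."""
    for lo, hi in zip(lower_bound, upper_bound):
        if lo > hi:
            raise ValueError(f"lower bound {lo} exceeds upper bound {hi}")
    return [lo == hi for lo, hi in zip(lower_bound, upper_bound)]


lower_bound = [0.0, 2.0, 5.0, 10.0]
upper_bound = [3.0, 2.0, math.inf, 10.0]  # third subject: event after t=5 (assumed encoding)

print(infer_event_observed(lower_bound, upper_bound))
```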
- fit_left_censoring(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None, fit_options=None)¶
Fit the model to a left-censored dataset
- Parameters:
durations (an array, list, pd.DataFrame or pd.Series) – length n – duration subject was observed for
event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the death was observed, False if the observation was left-censored. Defaults to all True if event_observed is None.
timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timeline (positively increasing)
entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
label (string, optional) – a string to name the column of the estimate.
alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
weights (an array, list, pd.DataFrame, or pd.Series, optional) – use when providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weight subjects differently.
- Returns:
self – self with new properties like survival_function_, plot(), median_survival_time_
- Return type:
KaplanMeierFitter
- property median_survival_time_: float¶
Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.
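Because a Kaplan-Meier curve is a step function, it rarely equals 0.5 exactly. One common convention, assumed in the sketch below, is to report the first time at which S(t) drops to or below 0.5, and infinity if the curve never gets that low. The helper name `median_survival_time` and the (time, survival) pair representation are illustrative assumptions, not lifelines' internals.

```python
import math


def median_survival_time(curve, q=0.5):
    """curve: [(t, S(t))] sorted by t. Return the first t with S(t) <= q."""
    for t, s in curve:
        if s <= q:
            return t
    return math.inf  # the curve never reaches q, so the median does not exist


curve = [(2, 5 / 6), (4, 2 / 3), (5, 4 / 9), (6, 2 / 9)]
print(median_survival_time(curve))                  # first step at or below 0.5
print(median_survival_time([(1, 0.9), (2, 0.8)]))   # heavily censored: no median
```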
- plot(**kwargs)¶
Plots a pretty figure of the model
Matplotlib plot arguments can be passed in via kwargs, in addition to the parameters below.
- Parameters:
show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
loc (slice) – specify a time-based subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a location-based subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
- Returns:
a pyplot axis object
- Return type:
ax
- plot_cumulative_density(**kwargs)¶
Plots a pretty figure of the cumulative density function.
Matplotlib plot arguments can be passed in inside the kwargs.
- Parameters:
show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
loc (slice) – specify a time-based subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a location-based subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
- Returns:
a pyplot axis object
- Return type:
ax
- plot_cumulative_hazard(**kwargs)¶
- plot_hazard(**kwargs)¶
- plot_loglogs(*args, **kwargs)¶
Plot \(\log(-\log(S(t)))\) against \(\log(t)\). Same arguments as .plot.
- plot_survival_function(**kwargs)¶
Alias of plot.
- survival_function_: DataFrame¶
- survival_function_at_times(times, label=None) Series ¶
Return a Pandas series of the predicted survival value at specific times
- Parameters:
times (iterable or float)
- Return type:
pd.Series
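Evaluating a fitted curve at arbitrary times amounts to a step-function lookup: the value at time t is the estimate at the last step at or before t. The sketch below illustrates this with the standard library, assuming S(t) = 1 before the first step and using a plain dict in place of a Pandas series. The helper name `survival_at_times` is hypothetical. The cumulative density at the same times is one minus these values.

```python
from bisect import bisect_right


def survival_at_times(curve, times):
    """curve: [(t, S(t))] sorted by t. Return {time: S(time)} via right-continuous lookup."""
    steps = [t for t, _ in curve]
    values = [s for _, s in curve]
    result = {}
    for t in times:
        i = bisect_right(steps, t)  # number of steps at or before t
        result[t] = 1.0 if i == 0 else values[i - 1]
    return result


curve = [(2, 5 / 6), (4, 2 / 3), (5, 4 / 9), (6, 2 / 9)]
survival = survival_at_times(curve, [0, 3, 4, 100])
cumulative_density = {t: 1.0 - s for t, s in survival.items()}
print(survival)
print(cumulative_density)
```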