AalenJohansenFitter

class lifelines.fitters.aalen_johansen_fitter.AalenJohansenFitter(jitter_level=0.0001, seed=None, alpha=0.05, calculate_variance=True, **kwargs)

Bases: lifelines.fitters.NonParametricUnivariateFitter

Class for fitting the Aalen-Johansen estimate of the cumulative incidence function in a competing risks framework. Treating competing risks as censoring can result in over-estimated cumulative density functions. Using the Kaplan-Meier estimator with competing risks treated as censored is akin to estimating the cumulative density if all competing risks had been prevented.
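The over-estimation can be seen in a small pure-Python sketch. This is not lifelines' implementation: the function names aalen_johansen_cif and one_minus_km and the toy data are made up for illustration, and the sketch assumes there are no tied times.

```python
def aalen_johansen_cif(durations, events, event_of_interest):
    """Cumulative incidence of one event type (assumes no tied times)."""
    at_risk = len(durations)
    overall_survival = 1.0  # probability of being free of *any* event so far
    cif = 0.0
    curve = []
    for t, e in sorted(zip(durations, events)):
        if e == event_of_interest:
            # Increment: P(event-free just before t) * hazard of the event at t
            cif += overall_survival / at_risk
        if e != 0:
            # Any event (of interest or competing) lowers overall survival
            overall_survival *= 1 - 1 / at_risk
        at_risk -= 1
        curve.append((t, cif))
    return curve


def one_minus_km(durations, events, event_of_interest):
    """1 - Kaplan-Meier, with competing events treated as censored."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t, e in sorted(zip(durations, events)):
        if e == event_of_interest:
            survival *= 1 - 1 / at_risk
        at_risk -= 1
        curve.append((t, 1 - survival))
    return curve


# Toy data (made up): 0 = censored, 1 = event of interest, 2 = competing event
durations = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 2, 0, 1, 2, 1, 0, 1]

aj = aalen_johansen_cif(durations, events, event_of_interest=1)
km = one_minus_km(durations, events, event_of_interest=1)
for (t, cif), (_, naive) in zip(aj, km):
    print(f"t={t}: Aalen-Johansen={cif:.3f}  1-KM={naive:.3f}")
```

In this toy data the naive 1-KM curve is never below the Aalen-Johansen curve, because subjects removed by the competing event are treated as if they could still experience the event of interest later.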

Aalen-Johansen cannot deal with tied times. We can get around this by randomly jittering the event times slightly. This is done automatically and generates a warning.
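The idea behind jittering can be sketched as follows. This is an illustration only, not the library's internal routine: the helper jitter_event_times is hypothetical, and it assumes that only event times (not censoring times) are shifted, each by a uniform random amount of at most jitter_level.

```python
import random


def jitter_event_times(durations, events, jitter_level=0.0001, seed=None):
    """Shift each event time by a small uniform random amount (illustration only)."""
    rng = random.Random(seed)  # a fixed seed makes the jitter reproducible
    jittered = []
    for t, e in zip(durations, events):
        if e != 0:
            # Only event times are shifted; censoring times stay as they are
            t = t + rng.uniform(-jitter_level, jitter_level)
        jittered.append(t)
    return jittered


# Toy data (made up): two different event types tied at t=5
durations = [3.0, 5.0, 5.0, 8.0]
events = [0, 1, 2, 1]

print(jitter_event_times(durations, events, seed=42))
```

Calling the helper twice with the same seed gives the same jittered times, which is the reason for exposing a seed at all.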

Parameters:
  • alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
  • jitter_level (float, optional (default=0.0001)) – If tied event times are detected, event times are randomly changed by this factor.
  • seed (int, optional (default=None)) – To produce reproducible results when event times are tied, a seed for numpy.random.seed can be specified here.
  • calculate_variance (bool, optional (default=True)) – By default, AalenJohansenFitter calculates the variance and corresponding confidence intervals. Due to how the variance is calculated, the variance must be calculated for each event time individually. This is computationally intensive. For some procedures, like bootstrapping, the variance is not necessary. To reduce computation time during these procedures, calculate_variance can be set to False to skip the variance calculation.

Example

from lifelines import AalenJohansenFitter
from lifelines.datasets import load_waltons

# Load the dataset once and pull out durations (T) and event indicators (E)
waltons = load_waltons()
T, E = waltons['T'], waltons['E']

ajf = AalenJohansenFitter(calculate_variance=True)
ajf.fit(T, E, event_of_interest=1)
ajf.cumulative_density_  # fitted cumulative incidence estimate
ajf.plot()

References

If you are interested in learning more, we recommend the following open-access paper: Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285-296.

conditional_time_to_event_

Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.
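The quantity described here can be sketched in plain Python. This is a rough illustration under stated assumptions, not the library's code: the helper conditional_median_remaining and the survival values are made up, and the median is taken to be the first time at which the conditional survival S(u) / S(t) drops to 0.5 or below.

```python
import math


def conditional_median_remaining(survival):
    """survival: list of (time, S(time)) pairs, sorted by time, S non-increasing."""
    result = []
    for i, (t, s_t) in enumerate(survival):
        remaining = math.inf  # stays infinite if the conditional curve never reaches 0.5
        if s_t > 0:
            for u, s_u in survival[i:]:
                # Conditional survival given survival to t is S(u) / S(t)
                if s_u / s_t <= 0.5:
                    remaining = u - t
                    break
        result.append((t, remaining))
    return result


# Made-up survival curve for illustration
survival = [(0, 1.0), (1, 0.9), (2, 0.7), (3, 0.4), (4, 0.3), (5, 0.1)]
for t, remaining in conditional_median_remaining(survival):
    print(f"given survival to t={t}: median remaining = {remaining}")
```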

cumulative_density_at_times(times, label=None)
cumulative_hazard_at_times(times, label=None)
divide(other) → pandas.core.frame.DataFrame

Divide self’s survival function by another model’s survival function.

Parameters:
  • other (same object as self)

fit(durations, event_observed, event_of_interest, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)
Parameters:
  • durations (array or pd.Series of length n) – duration each subject was observed for.
  • event_observed (array or pd.Series of length n) – integer indicator of distinct events. Must contain only non-negative integers, where 0 indicates censoring.
  • event_of_interest (int) – indicator for the event of interest. All other non-zero integers are considered competing events. For example, suppose event_observed contains 0, 1, 2, where 0: censored, 1: lung cancer, and 2: death. If event_of_interest=1, then death (2) is considered a competing event, and the returned cumulative incidence function corresponds to the risk of lung cancer.
  • timeline – return the best estimate at the values in timeline (positively increasing).
  • entry (array or pd.Series of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population were born at time 0.
  • label (str) – a string to name the column of the estimate.
  • alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
  • ci_labels (list) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<1-alpha/2>
  • weights (array or pd.Series of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subjects differently.
Returns:

self – self, with new properties like cumulative_density_.

Return type:

AalenJohansenFitter

fit_right_censoring(*args, **kwargs)

Alias for fit

See also

fit()

hazard_at_times(times, label=None)
median_survival_time_

Return the unique time point, t, such that S(t) = 0.5. This is the “half-life” of the population, and a robust summary statistic for the population, if it exists.

percentile(p: float) → float

Return the unique time point, t, such that S(t) = p.

Parameters:
  • p (float)

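For a stepwise estimate, S(t) rarely equals p exactly, so some convention is needed. The sketch below assumes the first time at which S(t) drops to p or below; the helper percentile_time and the survival values are hypothetical and only illustrate the lookup.

```python
import math


def percentile_time(survival, p):
    """Return the first time at which the step function S drops to p or below."""
    for t, s in survival:
        if s <= p:
            return t
    return math.inf  # S never reaches p within the observed timeline


# Made-up survival curve for illustration: (time, S(time)) pairs sorted by time
survival = [(0, 1.0), (1, 0.9), (2, 0.7), (3, 0.4), (4, 0.3), (5, 0.1)]

print(percentile_time(survival, 0.5))   # the median survival time under this convention
print(percentile_time(survival, 0.05))  # level never reached on this timeline
```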
plot(**kwargs)

Plots a pretty figure of the model

Matplotlib plot arguments can be passed in via kwargs, in addition to the following:

Parameters:
  • show_censors (bool) – place markers at censorship events. Default: False

  • censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.

  • ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3

  • ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False

  • ci_show (bool) – show confidence intervals. Default: True

  • ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False

  • at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False

  • loc (slice) – specify a time-based subsection of the curves to plot, ex:

    >>> model.plot(loc=slice(0.,10.))
    

    will plot the time values between t=0. and t=10.

  • iloc (slice) – specify a location-based subsection of the curves to plot, ex:

    >>> model.plot(iloc=slice(0,10))
    

    will plot the first 10 time points.

Returns:

a pyplot axis object

Return type:

ax

plot_cumulative_density(**kwargs)
plot_cumulative_hazard(**kwargs)
plot_density(**kwargs)
plot_hazard(**kwargs)
plot_survival_function(**kwargs)
predict(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series

Predict the fitter at certain points in time. Uses linear interpolation if the points in time are not in the index.

Parameters:
  • times (scalar, or array) – a scalar or an array of times at which to predict the value of the estimate.
  • interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (Kaplan-Meier, Nelson-Aalen, etc.), setting this to True will use a linear interpolation method to provide a “smoother” answer.
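
The difference between a stepwise lookup and linear interpolation can be sketched as follows. The helper predict_step is hypothetical and the curve values are made up. It assumes the stepwise lookup carries the last fitted value forward, which may differ in detail from the library's behaviour.

```python
from bisect import bisect_right


def predict_step(index, values, t, interpolate=False):
    """Look up a fitted curve at time t (illustration only).

    index: sorted times; values: estimate at each of those times.
    """
    i = bisect_right(index, t) - 1  # position of the last index entry <= t
    if i < 0:
        return values[0]  # before the first time point: use the first value
    if not interpolate or i == len(index) - 1 or index[i] == t:
        return values[i]  # stepwise: carry the last value forward
    # Linear interpolation between the two neighbouring time points
    t0, t1 = index[i], index[i + 1]
    v0, v1 = values[i], values[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Made-up fitted curve for illustration
index = [0, 2, 4]
values = [0.0, 0.2, 0.5]

print(predict_step(index, values, 3))                    # stepwise lookup
print(predict_step(index, values, 3, interpolate=True))  # interpolated lookup
```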
subtract(other) → pandas.core.frame.DataFrame

Subtract self’s survival function from another model’s survival function.

Parameters:
  • other (same object as self)

survival_function_at_times(times, label=None)