lifelines package¶
Subpackages¶
 lifelines.datasets package
 lifelines.fitters package
 Submodules
 lifelines.fitters.aalen_additive_fitter module
 lifelines.fitters.aalen_johansen_fitter module
 lifelines.fitters.breslow_fleming_harrington_fitter module
 lifelines.fitters.cox_time_varying_fitter module
 lifelines.fitters.coxph_fitter module
 lifelines.fitters.exponential_fitter module
 lifelines.fitters.kaplan_meier_fitter module
 lifelines.fitters.nelson_aalen_fitter module
 lifelines.fitters.weibull_fitter module
 Module contents
 lifelines.utils package
Submodules¶
lifelines.compat module¶
lifelines.generate_datasets module¶

class
lifelines.generate_datasets.
coeff_func
(f)¶ Bases:
object
This is a decorator class used later to construct nice names.

lifelines.generate_datasets.
constant_coefficients
(d, timelines, constant=True, independent=0)¶ Proportional hazards model.
Parameters:  d – the dimension of the dataset.
 timelines – the observational times.
 constant – True for constant coefficients.
 independent – the number of coefficients to set to 0 (the covariate is independent of survival), or a list of covariates to make independent.
Returns: a matrix (t, d+1) of coefficients.

lifelines.generate_datasets.
construct_survival_curves
(hazard_rates, timelines)¶ Given hazard rates, reconstruct the survival curves
Parameters:  hazard_rates ((n,t) array)
 timelines ((t,) the observational times)
Returns: survival curves
Return type: (n,t) array

lifelines.generate_datasets.
cumulative_integral
(fx, x)¶ Return the cumulative integral of arrays, initial value is 0.
Parameters:  fx ((n,d) numpy array) – the values to integrate.
 x ((n,) numpy array) – the locations to integrate over.
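The two helpers above can be illustrated without the library. The following is a minimal sketch, assuming trapezoidal integration for the cumulative integral and the standard relation S(t) = exp(-H(t)) between the survival function and the cumulative hazard H(t). The names `cumulative_trapezoid_integral` and `survival_from_hazard` are hypothetical and are not part of lifelines, and the library's own integration rule may differ.

```python
import math


def cumulative_trapezoid_integral(fx, x):
    """Cumulative integral of fx over x (trapezoid rule); the first value is 0."""
    total = 0.0
    out = [0.0]
    for i in range(1, len(x)):
        total += 0.5 * (fx[i] + fx[i - 1]) * (x[i] - x[i - 1])
        out.append(total)
    return out


def survival_from_hazard(hazard, timeline):
    """Survival curve S(t) = exp(-H(t)) for a single subject's hazard rates."""
    cumulative_hazard = cumulative_trapezoid_integral(hazard, timeline)
    return [math.exp(-h) for h in cumulative_hazard]


# Hypothetical input: a constant hazard of 0.5 observed at t = 0, 1, 2.
timeline = [0.0, 1.0, 2.0]
constant_hazard = [0.5, 0.5, 0.5]
curve = survival_from_hazard(constant_hazard, timeline)
```

With a constant hazard the sketch reduces to the exponential survival curve, which is a convenient sanity check.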

lifelines.generate_datasets.
exponential_survival_data
(n, cr=0.05, scale=1.0)¶

lifelines.generate_datasets.
generate_covariates
(n, d, n_binary=0, p=0.5)¶
Parameters:  n (integer) – the number of instances.
 d (integer) – the dimension of the covariates.
 n_binary – a float between 0 and d that represents the number of binary covariates.
 p – for binary covariates, the probability of 1.
Returns: (n, d+1)

lifelines.generate_datasets.
generate_hazard_rates
(n, d, timelines, constant=False, independent=0, n_binary=0, model='aalen')¶
Parameters:  n – the number of instances.
 d – the number of covariates.
 timelines – the observational times.
 constant – make the coefficients constant (not time dependent).
 n_binary – the number of binary covariates.
 model – one of ["aalen", "cox"].
Returns: hazard rates, a (t,n) DataFrame; coefficients, a (t,d+1) DataFrame of coefficients; covariates, a (n,d) DataFrame.

lifelines.generate_datasets.
generate_observational_matrix
(n, d, timelines, constant=False, independent=0, n_binary=0, model='aalen')¶

lifelines.generate_datasets.
generate_random_lifetimes
(hazard_rates, timelines, size=1, censor=None)¶ Based on the hazard rates, compute random variables from the survival function.
Parameters:  hazard_rates – (n,t) array of hazard rates.
 timelines – (t,) the observation times.
 size – the number to return, per hazard rate.
 censor – if True, adds uniform censoring between timelines.max() and 0. If a positive number, censors all events above that value. If an (n,) np.array >= 0, censors elementwise.
Returns:  survival_times ((size,n) array of random variables.)
 (optional) censorship (if censor is true, returns (size,n) array with bool True) – if the death was observed (not right-censored)

lifelines.generate_datasets.
right_censor_lifetimes
(lifetimes, max_, min_=0)¶ Right censor the deaths, uniformly.
Parameters:  lifetimes – (n,) array of positive random variables.
 max_ – the max time a censorship can occur.
 min_ – the min time a censorship can occur.
Returns: the actual observations including uniform right censoring, and D_i (whether the death was observed or not).
Note: this function is thought to be deprecated.
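Uniform right censoring as described above can be sketched in a few lines. This is an illustration only: the function name `uniform_right_censor` and its `seed` argument are hypothetical, and the sketch assumes that each subject gets an independent censoring time drawn uniformly from [min_, max_].

```python
import random


def uniform_right_censor(lifetimes, max_, min_=0.0, seed=None):
    """Return (observations, observed) after uniform right censoring."""
    rng = random.Random(seed)
    observations, observed = [], []
    for lifetime in lifetimes:
        censor_time = rng.uniform(min_, max_)
        # We see whichever comes first: the death or the censoring time.
        observations.append(min(lifetime, censor_time))
        # D_i is True when the death happened before censoring.
        observed.append(lifetime <= censor_time)
    return observations, observed


# Hypothetical lifetimes; the last one exceeds max_ and is always censored.
lifetimes = [0.5, 2.0, 7.5, 12.0]
observations, observed = uniform_right_censor(lifetimes, max_=10.0, seed=0)
```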

lifelines.generate_datasets.
time_varying_coefficients
(d, timelines, constant=False, independent=0, randgen=<builtin method exponential of mtrand.RandomState object>)¶ Time-varying coefficients.
Parameters:  d – the dimension of the dataset.
 timelines – the observational times.
 constant – True for constant coefficients.
 independent – the number of coefficients to set to 0 (the covariate is independent of survival), or a list of covariates to make independent.
 randgen – how scalar coefficients (betas) are sampled.
Returns: a matrix (t, d+1) of coefficients.
lifelines.plotting module¶

class
lifelines.plotting.
PlotEstimateConfig
(cls, estimate, loc, iloc, show_censors, censor_styles, bandwidth, **kwargs)¶ Bases:
object

lifelines.plotting.
add_at_risk_counts
(*fitters, **kwargs)¶ Add counts showing how many individuals were at risk at each time point in survival/hazard plots.
Parameters:  One or several fitters, for example KaplanMeierFitter,
 NelsonAalenFitter, etc…
 Keyword arguments (all optional):
 ax – the axes to add the labels to. Default is the current axes.
 fig – the figure of the axes. Default is the current figure.
 labels – the labels to use for the fitters. Default is whatever was specified in the fitters’ fit function. Giving ‘None’ will hide fitter labels.
Returns: The axes which was used.
Return type: ax
Examples
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.plotting import add_at_risk_counts
# First train some fitters and plot them (data holds the observed durations)
fig = plt.figure()
ax = plt.subplot(111)
f1 = KaplanMeierFitter()
f1.fit(data)
f1.plot(ax=ax)
f2 = KaplanMeierFitter()
f2.fit(data)
f2.plot(ax=ax)
# These are equivalent
add_at_risk_counts(f1, f2)
add_at_risk_counts(f1, f2, ax=ax, fig=fig)
# This overrides the labels
add_at_risk_counts(f1, f2, labels=['fitter one', 'fitter two'])
# This hides the labels
add_at_risk_counts(f1, f2, labels=None)

lifelines.plotting.
create_dataframe_slicer
(iloc, loc)¶

lifelines.plotting.
fill_between_steps
(x, y1, y2=0, h_align='left', ax=None, **kwargs)¶ Fills a hole in matplotlib: fill_between for step plots. https://gist.github.com/thriveth/8352565
 x : array-like
 Array/vector of index values. These are assumed to be equally spaced. If not, the result will probably look weird…
 y1 : array-like
 Array/vector of values to be filled under.
 y2 : array-like
 Array/vector of bottom values for filled area. Default is 0.
**kwargs will be passed to the matplotlib fill_between() function.

lifelines.plotting.
is_latex_enabled
()¶ Returns True if LaTeX is enabled in matplotlib’s rcParams, False otherwise

lifelines.plotting.
move_spines
(ax, sides, dists)¶ Move the entire spine relative to the figure.
Parameters:  ax – axes to operate on
 sides – list of sides to move. Sides: top, left, bottom, right
 dists – list of float distances to move. Should match sides in length.
Example: move_spines(ax, sides=['left', 'bottom'], dists=[0.02, 0.1])

lifelines.plotting.
plot_estimate
(cls, estimate=None, loc=None, iloc=None, show_censors=False, censor_styles=None, ci_legend=False, ci_force_lines=False, ci_alpha=0.25, ci_show=True, at_risk_counts=False, invert_y_axis=False, bandwidth=None, **kwargs)¶ Plots a pretty figure of {0}.{1}
Matplotlib plot arguments can be passed in inside the kwargs, plus
Parameters:  show_censors (bool) – place markers at censorship events. Default: False
 censor_styles (dict) – if show_censors, this dictionary will be passed into the plot call.
 ci_alpha (float) – the transparency level of the confidence interval. Default: 0.25
 ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
 ci_show (bool) – show confidence intervals. Default: True
 ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
 at_risk_counts (bool) – show group sizes at time points. See function add_at_risk_counts for details. Default: False
 loc (slice) – specify a time-based subsection of the curves to plot, ex: .plot(loc=slice(0.,10.)) will plot the time values between t=0. and t=10.
 iloc (slice) – specify a location-based subsection of the curves to plot, ex: .plot(iloc=slice(0,10)) will plot the first 10 time points.
 invert_y_axis (bool) – boolean to invert the y-axis, useful to show cumulative graphs instead of survival graphs.
 bandwidth (float) – specify the bandwidth of the kernel smoother for the smoothed-hazard rate. Only used when called ‘plot_hazard’.
Returns: a pyplot axis object
Return type: ax

lifelines.plotting.
plot_lifetimes
(duration, event_observed=None, entry=None, left_truncated=False, sort_by_duration=False, event_observed_color='#A60628', event_censored_color='#348ABD', **kwargs)¶ Returns a lifetime plot, see examples: https://lifelines.readthedocs.io/en/latest/Survival%20Analysis%20intro.html#censorship
Parameters:  duration ((n,) numpy array or pd.Series) – duration subject was observed for.
 event_observed ((n,) numpy array or pd.Series) – array of booleans: True if event observed, else False.
 entry ((n,) numpy array or pd.Series) – offsetting the births away from t=0. This could be from left-truncation, or delayed entry into study.
 left_truncated (boolean) – if entry is provided, and the data is left-truncated, this will display additional information in the plot to reflect this.
 sort_by_duration (boolean) – sort by the duration vector
Return type: ax

lifelines.plotting.
plot_loglogs
(cls, loc=None, iloc=None, show_censors=False, censor_styles=None, **kwargs)¶ Specifies a plot of the log(-log(SV)) versus log(time) where SV is the estimated survival function.

lifelines.plotting.
remove_spines
(ax, sides)¶ Remove spines of axis.
Parameters:  ax – axes to operate on
 sides – list of sides: top, left, bottom, right
Examples: remove_spines(ax, ['top']) remove_spines(ax, ['top', 'bottom', 'right', 'left'])

lifelines.plotting.
remove_ticks
(ax, x=False, y=False)¶ Remove ticks from axis.
Parameters:  ax – axes to work on
 x – if True, remove xticks. Default False.
 y – if True, remove yticks. Default False.
Examples: remove_ticks(ax, x=True) remove_ticks(ax, x=True, y=True)

lifelines.plotting.
set_kwargs_ax
(kwargs)¶

lifelines.plotting.
set_kwargs_color
(kwargs)¶

lifelines.plotting.
set_kwargs_drawstyle
(kwargs)¶
lifelines.statistics module¶

class
lifelines.statistics.
StatisticalResult
(p_value, test_statistic, name=None, **kwargs)¶ Bases:
object
This class holds the result of statistical tests, like logrank and proportional hazard tests, with a nice printer wrapper to display the results.
Note
This class’ API changed in version 0.16.0.
Parameters:  p_value (iterable or float) – the p-values of a statistical test(s)
 test_statistic (iterable or float) – the test statistics of a statistical test(s). Must be the same size as p-values if iterable.
 name (iterable or string) – if this class holds multiple results (ex: from a pairwise comparison), this can hold the names. Must be the same size as p-values if iterable.
 kwargs – additional information to display in print_summary().

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Returns: summary – a DataFrame containing the test statistics and the p-value
Return type: DataFrame

class
lifelines.statistics.
TimeTransformers
¶ Bases:
object

TIME_TRANSFOMERS
= {'identity': <function TimeTransformers.<lambda> at 0x7f6a3a9e2950>, 'km': <function TimeTransformers.<lambda> at 0x7f6a3a9e2a60>, 'log': <function TimeTransformers.<lambda> at 0x7f6a3a9e29d8>, 'rank': <function TimeTransformers.<lambda> at 0x7f6a3a9e28c8>}¶

get
(key_or_callable)¶


lifelines.statistics.
chisq_test
(U, degrees_freedom, alpha)¶

lifelines.statistics.
logrank_test
(durations_A, durations_B, event_observed_A=None, event_observed_B=None, alpha=0.95, t_0=-1, **kwargs)¶ Measures and reports on whether two intensity processes are different. That is, given two event series, determines whether the data generating processes are statistically different. The test statistic is chi-squared under the null hypothesis.
 H_0: both event series are from the same generating processes
 H_A: the event series are from different generating processes.
This implicitly uses the logrank weights.
Parameters:  durations_A (iterable) – a (n,) list-like of event durations (birth to death,…) for the first population.
 durations_B (iterable) – a (n,) list-like of event durations (birth to death,…) for the second population.
 event_observed_A (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
 event_observed_B (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
 t_0 (float, optional (default=-1)) – the final time period under observation, -1 for all time.
 alpha (float, optional (default=0.95)) – the confidence level
 kwargs – add keywords and metadata to the experiment summary
Returns: results – a StatisticalResult object with properties ‘p_value’, ‘summary’, ‘test_statistic’, ‘print_summary’
Return type: StatisticalResult
Examples
>>> T1 = [1, 4, 10, 12, 12, 3, 5.4]
>>> E1 = [1, 0, 1, 0, 1, 1, 1]
>>>
>>> T2 = [4, 5, 7, 11, 14, 20, 8, 8]
>>> E2 = [1, 1, 1, 1, 1, 1, 1, 1]
>>>
>>> from lifelines.statistics import logrank_test
>>> results = logrank_test(T1, T2, event_observed_A=E1, event_observed_B=E2)
>>>
>>> results.print_summary()
>>> print(results.p_value)        # 0.7676
>>> print(results.test_statistic) # 0.0872
Notes
This is a special case of the function multivariate_logrank_test, which is used internally. See Survival and Event Analysis, page 108.
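To make the test statistic concrete, here is a minimal sketch of the textbook two-group logrank computation in pure Python. It is not the library's implementation: the function name `logrank_sketch` is hypothetical, the hypergeometric variance formula is the standard one, and the p-value uses the fact that a chi-squared variable with one degree of freedom has survival function erfc(sqrt(x/2)). On the data from the example above it should give values close to those quoted there.

```python
import math


def logrank_sketch(durations_a, durations_b, events_a=None, events_b=None):
    """Two-group logrank test: returns (test_statistic, p_value)."""
    if events_a is None:
        events_a = [1] * len(durations_a)
    if events_b is None:
        events_b = [1] * len(durations_b)

    # Pool the subjects as (duration, event flag, group index).
    pooled = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(durations_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for d, _, _ in pooled if d >= t)               # at risk, both groups
        n_a = sum(1 for d, _, g in pooled if d >= t and g == 0)  # at risk, group A
        deaths = sum(1 for d, e, _ in pooled if d == t and e)
        deaths_a = sum(1 for d, e, g in pooled if d == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * n_a / n
        if n > 1:  # the variance term is undefined with a single subject at risk
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)

    statistic = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-squared, 1 degree of freedom
    return statistic, p_value


T1 = [1, 4, 10, 12, 12, 3, 5.4]
E1 = [1, 0, 1, 0, 1, 1, 1]
T2 = [4, 5, 7, 11, 14, 20, 8, 8]
E2 = [1, 1, 1, 1, 1, 1, 1, 1]
statistic, p_value = logrank_sketch(T1, T2, E1, E2)
```

The sketch ignores the t_0 cutoff and any weighting, so it only covers the default case.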

lifelines.statistics.
multivariate_logrank_test
(event_durations, groups, event_observed=None, alpha=0.95, t_0=-1, **kwargs)¶ This test is a generalization of the logrank_test: it can deal with n>2 populations (and should be equal when n=2):
 H_0: all event series are from the same generating processes
 H_A: there exists at least one group that differs from the others.
Parameters:  event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
 groups (iterable) – a (n,) list-like of unique group labels for each individual.
 event_observed (iterable, optional) – a (n,) list-like of event_observed events: 1 if observed death, 0 if censored. Defaults to all observed.
 t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
 alpha (float, optional (default=0.95)) – the confidence level
 kwargs – add keywords and metadata to the experiment summary.
Returns: results – a StatisticalResult object with properties ‘p_value’, ‘summary’, ‘test_statistic’, ‘print_summary’
Return type: StatisticalResult
Examples
>>> import pandas as pd
>>> from lifelines.statistics import multivariate_logrank_test
>>>
>>> df = pd.DataFrame({
>>>    'durations': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>    'events': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>    'groups': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> })
>>> result = multivariate_logrank_test(df['durations'], df['groups'], df['events'])
>>> result.test_statistic
>>> result.p_value
>>> result.print_summary()
>>>
>>> # numpy example
>>> G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
>>> E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
>>> result = multivariate_logrank_test(T, G, E)
>>> result.test_statistic

lifelines.statistics.
pairwise_logrank_test
(event_durations, groups, event_observed=None, alpha=0.95, t_0=-1, bonferroni=True, **kwargs)¶ Perform the logrank test pairwise for all n>2 unique groups (use the more appropriate logrank_test for n=2). We have to be careful here: if there are n groups, then there are n*(n-1)/2 pairs, so many pairs increase the chance that there will exist a significantly different pair purely by chance. For this reason, we use the Bonferroni correction (reweight the alpha value higher to accommodate the multiple tests).
Parameters:  event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
 groups (iterable) – a (n,) list-like of unique group labels for each individual.
 event_observed (iterable, optional) – a (n,) list-like of event_observed events: 1 if observed death, 0 if censored. Defaults to all observed.
 t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
 alpha (float, optional (default=0.95)) – the confidence level
 bonferroni (boolean, optional (default=True)) – If True, uses the Bonferroni correction to compare the M=n(n-1)/2 pairs, i.e. alpha = alpha/M.
 kwargs – add keywords and metadata to the experiment summary.
Returns: results – a StatisticalResult object that contains all the pairwise comparisons (try StatisticalResult.summary or StatisticalResult.print_summary)
Return type: StatisticalResult
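The pair counting behind the Bonferroni correction can be sketched as follows. This is an illustration under stated assumptions: `bonferroni_pairs` is a hypothetical helper, and its `significance` argument is a type I error rate (for example 0.05), not the 0.95 confidence level that the alpha parameter above uses. How the library applies the correction internally is not shown here.

```python
from itertools import combinations


def bonferroni_pairs(groups, significance=0.05):
    """List all pairs of unique group labels and the per-test significance level."""
    labels = sorted(set(groups))
    if len(labels) < 2:
        raise ValueError("need at least two distinct groups")
    pairs = list(combinations(labels, 2))  # M = n * (n - 1) / 2 pairs
    return pairs, significance / len(pairs)


groups = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
pairs, per_test_significance = bonferroni_pairs(groups)
```

With three groups there are three pairs, so each pairwise test is held to a third of the overall significance level.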

lifelines.statistics.
power_under_cph
(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)¶ This computes the power of the hypothesis test that the two groups, experiment and control, have different hazards (that is, the relative hazard ratio is different from 1.)
Parameters:  n_exp (integer) – size of the experiment group.
 n_con (integer) – size of the control group.
 p_exp (float) – probability of failure in experimental group over period of study.
 p_con (float) – probability of failure in control group over period of study
 postulated_hazard_ratio (float) – the postulated hazard ratio
 alpha (float, optional (default=0.05)) – type I error rate
Returns: power – power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
Return type: float
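One way to approximate this power is a normal approximation based on the expected number of events. The sketch below is an assumption about the form of the calculation, not a transcription of the library's code, and `approximate_power` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05):
    """Normal-approximation power for detecting the postulated hazard ratio."""
    normal = NormalDist()
    expected_events = n_exp * p_exp + n_con * p_con
    k = n_exp / n_con  # ratio of experimental to control participants
    effect = (
        sqrt(k * expected_events)
        * abs(postulated_hazard_ratio - 1.0)
        / (k * postulated_hazard_ratio + 1.0)
    )
    return normal.cdf(effect - normal.inv_cdf(1.0 - alpha / 2.0))


# Hypothetical study: 421 subjects per arm, hazard ratio of 0.7.
power = approximate_power(421, 421, p_exp=0.25, p_con=0.35, postulated_hazard_ratio=0.7)
```

When the postulated hazard ratio is exactly 1 the effect term vanishes and the expression reduces to alpha/2, which is a quick check that the formula behaves sensibly.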

lifelines.statistics.
proportional_hazard_test
(fitted_cox_model, training_df, time_transform='rank', precomputed_residuals=None, **kwargs)¶ Test whether any variable in a Cox model breaks the proportional hazard assumption.
Parameters:  fitted_cox_model (CoxPHFitter) – the fitted Cox model, fitted with training_df, you wish to test. Currently only the CoxPHFitter is supported, but later CoxTimeVaryingFitter, too.
 training_df (DataFrame) – the DataFrame used in the call to the Cox model’s fit.
 time_transform (vectorized function or string, optional (default='rank')) – {'all', 'km', 'rank', 'identity', 'log'} One of the strings above, or a function to transform the time (it must accept (time, durations, weights), however). 'all' will present all the transforms.
 precomputed_residuals (DataFrame, optional) – specify the residuals, if already computed.
 kwargs – additional parameters to add to the StatisticalResult
Notes
R uses the default km, we use rank, as this performs well versus other transforms. See http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf

lifelines.statistics.
sample_size_necessary_under_cph
(power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)¶ This computes the sample size needed to achieve a given power when comparing two groups under a Cox Proportional Hazard model.
Parameters:  power (float) – power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
 ratio_of_participants (float) – ratio of participants in experimental group over control group.
 p_exp (float) – probability of failure in experimental group over period of study.
 p_con (float) – probability of failure in control group over period of study
 postulated_hazard_ratio (float) – the postulated hazard ratio
 alpha (float, optional (default=0.05)) – type I error rate
Returns:  n_exp (integer) – the sample size needed for the experiment group to achieve desired power
 n_con (integer) – the sample size needed for the control group to achieve desired power
Examples
>>> from lifelines.statistics import sample_size_necessary_under_cph
>>>
>>> desired_power = 0.8
>>> ratio_of_participants = 1.
>>> p_exp = 0.25
>>> p_con = 0.35
>>> postulated_hazard_ratio = 0.7
>>> n_exp, n_con = sample_size_necessary_under_cph(desired_power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio)
>>> # (421, 421)
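The result in the example can be reproduced with a normal-approximation formula for the required number of events. The sketch below is written under that assumption; `approximate_sample_sizes` is a hypothetical name and the library's own implementation may differ in detail.

```python
from math import ceil
from statistics import NormalDist


def approximate_sample_sizes(power, ratio_of_participants, p_exp, p_con,
                             postulated_hazard_ratio, alpha=0.05):
    """Approximate (n_exp, n_con) needed to reach the desired power."""
    z = NormalDist().inv_cdf
    r = ratio_of_participants
    hr = postulated_hazard_ratio
    # Scaled number of events required for the desired power.
    events_factor = (
        (1.0 / r)
        * ((r * hr + 1.0) / (hr - 1.0)) ** 2
        * (z(1.0 - alpha / 2.0) + z(power)) ** 2
    )
    # Convert events to subjects using the per-group failure probabilities.
    n_exp = events_factor * r / (r * p_exp + p_con)
    n_con = events_factor / (r * p_exp + p_con)
    return ceil(n_exp), ceil(n_con)


# Same inputs as the example above.
n_exp, n_con = approximate_sample_sizes(0.8, 1.0, 0.25, 0.35, 0.7)
```

Rounding up with ceil keeps the achieved power at or above the requested level.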

lifelines.statistics.
two_sided_z_test
(Z, alpha)¶
lifelines.version module¶
Module contents¶

class
lifelines.
KaplanMeierFitter
(alpha=0.95)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the Kaplan-Meier estimate for the survival function.
Parameters: alpha (float, optional (default=0.95)) – The alpha value associated with the confidence intervals.
Examples
>>> from lifelines import KaplanMeierFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> kmf = KaplanMeierFitter()
>>> kmf.fit(waltons['T'], waltons['E'])
>>> kmf.plot()
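For readers who want to see what the estimate computes, the following is a minimal pure-Python sketch of the product-limit (Kaplan-Meier) formula. It is not the fitter's implementation: `kaplan_meier_sketch` is a hypothetical name, and the sketch ignores entry times, weights, left censorship and confidence intervals.

```python
def kaplan_meier_sketch(durations, event_observed=None):
    """Return [(time, survival probability)] at each observed event time."""
    if event_observed is None:
        event_observed = [True] * len(durations)
    subjects = list(zip(durations, event_observed))
    event_times = sorted({t for t, e in subjects if e})

    survival = 1.0
    curve = [(0.0, 1.0)]
    for t in event_times:
        at_risk = sum(1 for d, _ in subjects if d >= t)
        deaths = sum(1 for d, e in subjects if d == t and e)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: the subject observed until t=2 is right-censored.
curve = kaplan_meier_sketch([1, 2, 3], [True, False, True])
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a drop in the curve.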

fit
(durations, event_observed=None, timeline=None, entry=None, label='KM_estimate', alpha=None, left_censorship=False, ci_labels=None, weights=None)¶
Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for.
 timeline – return the best estimate at the values in timelines (positively increasing).
 event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed==None.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population entered study when they were “born”.
 label (string) – a string to name the column of the estimate.
 alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 left_censorship (boolean) – True if durations and event_observed refer to left censorship events. Default False.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
 weights (an array, or pd.Series, of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: self – self with new properties like ‘survival_function_’.

plot_loglogs
(*args, **kwargs)¶


class
lifelines.
NelsonAalenFitter
(alpha=0.95, nelson_aalen_smoothing=True)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the Nelson-Aalen estimate for the cumulative hazard.
NelsonAalenFitter(alpha=0.95, nelson_aalen_smoothing=True)
Parameters:  alpha – the alpha value associated with the confidence intervals.
 nelson_aalen_smoothing – if the event times are naturally discrete (like discrete years, minutes, etc.) then it is advisable to turn this parameter to False. See [1], pg. 84.
Notes
[1] Aalen, O., Borgan, O., Gjessing, H., 2008. Survival and Event History Analysis
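The basic Nelson-Aalen estimate sums, over event times, the number of deaths divided by the number at risk. Here is a minimal sketch of that plain form, assuming no smoothing or tie adjustment (so it corresponds most closely to the discrete-time case) and ignoring entry times and weights. `nelson_aalen_sketch` is a hypothetical name.

```python
def nelson_aalen_sketch(durations, event_observed=None):
    """Return [(time, cumulative hazard)] at each observed event time."""
    if event_observed is None:
        event_observed = [True] * len(durations)
    subjects = list(zip(durations, event_observed))
    event_times = sorted({t for t, e in subjects if e})

    cumulative_hazard = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d, _ in subjects if d >= t)
        deaths = sum(1 for d, e in subjects if d == t and e)
        cumulative_hazard += deaths / at_risk  # increment d_i / n_i
        curve.append((t, cumulative_hazard))
    return curve


# Hypothetical data: the subject observed until t=2 is right-censored.
curve = nelson_aalen_sketch([1, 2, 3], [True, False, True])
```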

conditional_time_to_event_
¶

fit
(durations, event_observed=None, timeline=None, entry=None, label='NA_estimate', alpha=None, ci_labels=None, weights=None)¶
Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for
 timeline (iterable) – return the best estimate at the values in timelines (positively increasing)
 event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed==None
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed.)
 label (string) – a string to name the column of the estimate.
 alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
 weights (an array, or pd.Series, of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: self, with new properties like ‘cumulative_hazard_’.

plot_hazard
(*args, **kwargs)¶

smoothed_hazard_
(bandwidth)¶ Parameters: bandwidth (float) – the bandwidth used in the Epanechnikov kernel.
Returns: a DataFrame of the smoothed hazard
Return type: DataFrame
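Kernel smoothing of a hazard can be sketched as a weighted sum of the cumulative-hazard increments, with weights given by the Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1. The function names below are hypothetical, and the sketch leaves out boundary corrections and confidence intervals.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smoothed_hazard_sketch(event_times, hazard_increments, bandwidth, grid):
    """Kernel-smoothed hazard evaluated at each time in grid."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return [
        sum(
            epanechnikov((t - s) / bandwidth) * dh
            for s, dh in zip(event_times, hazard_increments)
        ) / bandwidth
        for t in grid
    ]


# Hypothetical input: a single cumulative-hazard jump of 0.3 at t=1.
hazard = smoothed_hazard_sketch([1.0], [0.3], bandwidth=2.0, grid=[1.0, 2.0, 4.0])
```

A larger bandwidth spreads each jump over a wider window, giving a smoother but flatter hazard curve.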

smoothed_hazard_confidence_intervals_
(bandwidth, hazard_=None)¶
Parameters:  bandwidth (float) – the bandwidth to use in the Epanechnikov kernel. Must be > 0.
 hazard_ (numpy array) – a computed (n,) numpy array of estimated hazard rates. If None, uses naf.smoothed_hazard_


class
lifelines.
AalenAdditiveFitter
(fit_intercept=True, alpha=0.95, coef_penalizer=0.0, smoothing_penalizer=0.0)¶ Bases:
lifelines.fitters.BaseFitter
This class fits the regression model:
\[h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N\]
that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see
TODO: name
Note
This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.
Parameters:  fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\) acts as a baseline hazard.
 alpha (float) – the level in the confidence intervals.
 coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of \(c_{i,t}\).
 smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between adjacent (over time) coefficients. For example, this shrinks the absolute value of \(c_{i,t} - c_{i,t+1}\).

fit
(df, duration_col, event_col=None, weights_col=None, show_progress=False)¶ Fit the Aalen Additive model to a dataset.
Parameters:  df (DataFrame) – a Pandas dataframe with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in dataframe that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 weights_col (string, optional) – an optional column in the dataframe, df, that denotes the weight per subject. This column is expelled and not used as a covariate, but as a weight in the final regression. Default weight is 1. This can be used for case-weights. For example, a weight of 2 means there were two subjects with identical observations. This can also be used for sampling weights.
 show_progress (boolean, optional (default=False)) – Since the fitter is iterative, show iteration number.
Returns: self – self with additional new properties: cumulative_hazards_, etc.
Return type: AalenAdditiveFitter
Examples
>>> import pandas as pd
>>> from lifelines import AalenAdditiveFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> aaf = AalenAdditiveFitter()
>>> aaf.fit(df, 'T', 'E')
>>> aaf.predict_median(df)
>>> aaf.print_summary()

plot
(columns=None, loc=None, iloc=None, **kwargs)¶ A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:
Parameters:  columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.
 loc (slice, optional) – specify a time-based subsection of the curves to plot, ex: .plot(loc=slice(0.,10.)) will plot the time values between t=0. and t=10.
 iloc (slice, optional) – specify a location-based subsection of the curves to plot, ex: .plot(iloc=slice(0,10)) will plot the first 10 time points.

predict_cumulative_hazard
(X)¶ Returns the hazard rates for the individuals
Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

predict_expectation
(X)¶ Compute the expected lifetime, E[T], using covariates X.
Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 Returns the expected lifetimes for the individuals

predict_median
(X)¶ Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 Returns the median lifetimes for the individuals

predict_percentile
(X, p=0.5)¶ Returns the p-th percentile lifetimes for the individuals; with the default p=0.5, these are the median lifetimes. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

predict_survival_function
(X)¶ Returns the survival functions for the individuals
Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
¶ The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the AUC to survival data, including censorships.
For this purpose, the score_ is a measure of the predictive accuracy of the fitted model onto the training dataset. It’s analogous to the R^2 in linear models.
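The idea behind the c-index can be sketched by counting concordant pairs. The version below is a simplified illustration, assuming predictions are predicted lifetimes (larger means longer survival), that a pair is comparable only when the subject with the shorter duration had an observed event, and that pairs with tied durations are skipped. `concordance_sketch` is a hypothetical name and the library's handling of ties may differ.

```python
def concordance_sketch(durations, predicted, event_observed=None):
    """Fraction of comparable pairs whose predictions are ordered correctly."""
    n = len(durations)
    if event_observed is None:
        event_observed = [True] * n

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not event_observed[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data with perfectly ordered predictions.
score = concordance_sketch([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
```

A score of 0.5 corresponds to random ordering, which is why the index is read much like an AUC.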

smoothed_hazards_
(bandwidth=1)¶ Uses the Epanechnikov kernel to smooth the hazard function, with sigma/bandwidth

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df Return type: DataFrame

class
lifelines.
BreslowFlemingHarringtonFitter
(alpha=0.95)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the Breslow-Fleming-Harrington estimate for the survival function. This estimator is a biased estimator of the survival function but is more stable when the population is small and there are too few early truncation times: in that setting, it may happen that the number of patients at risk and the number of deaths are the same.
Mathematically, the NAF estimator is the negative logarithm of the BFH estimator.
BreslowFlemingHarringtonFitter(alpha=0.95)
Parameters: alpha (float) – The alpha value associated with the confidence intervals. 
fit
(durations, event_observed=None, timeline=None, entry=None, label='BFH_estimate', alpha=None, ci_labels=None)¶ Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for
 timeline – return the best estimate at the values in timeline (positively increasing)
 event_observed (an array, or pd.Series, of length n) – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed==None
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated observations, i.e. the birth event was not observed. If None, defaults to all 0 (all birth events observed.)
 label (string) – a string to name the column of the estimate.
 alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
Returns: self, with new properties like ‘survival_function_’.
Return type: BreslowFlemingHarringtonFitter
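The relationship between the two estimators can be sketched in plain Python: the BFH survival estimate is the exponential of the negative Nelson-Aalen cumulative hazard. The helper names below are hypothetical, and the sketch ignores left truncation (`entry`) and confidence intervals. It also shows the stability point made above: when the number at risk equals the number of deaths, the Kaplan-Meier estimate drops to exactly 0 while the BFH estimate stays positive.

```python
import math


def risk_table(durations, observed):
    """Yield (time, deaths, number at risk) at each distinct observed time."""
    at_risk = len(durations)
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, observed) if d == t and e)
        yield t, deaths, at_risk
        at_risk -= sum(1 for d in durations if d == t)


def kaplan_meier(durations, observed):
    survival, curve = 1.0, {}
    for t, deaths, at_risk in risk_table(durations, observed):
        survival *= 1.0 - deaths / at_risk
        curve[t] = survival
    return curve


def breslow_fleming_harrington(durations, observed):
    cumulative_hazard, curve = 0.0, {}
    for t, deaths, at_risk in risk_table(durations, observed):
        cumulative_hazard += deaths / at_risk  # Nelson-Aalen increment
        curve[t] = math.exp(-cumulative_hazard)
    return curve


durations = [1, 2, 3]
observed = [True, True, True]
km = kaplan_meier(durations, observed)
bfh = breslow_fleming_harrington(durations, observed)
print(km[3], bfh[3])
```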


class
lifelines.
CoxPHFitter
(alpha=0.95, tie_method='Efron', penalizer=0.0, strata=None)¶ Bases:
lifelines.fitters.BaseFitter
This class implements fitting Cox’s proportional hazard model:
\[h(t|x) = h_0(t) \exp(x \beta)\]
Parameters:  alpha (float, optional (default=0.95)) – the level in the confidence intervals.
 tie_method (string, optional) – specify how the fitter should deal with ties. Currently only ‘Efron’ is available.
 penalizer (float, optional (default=0.0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the absolute value of \(\beta_i\). Recommended, even if a small value. The penalty is \(\frac{1}{2} \text{penalizer} \cdot \beta^2\).
 strata (list, optional) – specify a list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
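To make the penalizer term concrete, here is a minimal sketch of the L2 penalty described above. It assumes the penalty is summed over all coefficients and subtracted from the log-likelihood being maximized; the function names are hypothetical and this is not the fitter's internal code.

```python
def l2_penalty(beta, penalizer):
    """Return 1/2 * penalizer * sum(beta_i^2)."""
    return 0.5 * penalizer * sum(b * b for b in beta)


def penalized_objective(log_likelihood, beta, penalizer):
    """Log-likelihood minus the L2 penalty (assumed form of the objective)."""
    return log_likelihood - l2_penalty(beta, penalizer)


beta = [1.0, -2.0]       # hypothetical coefficients
penalty = l2_penalty(beta, penalizer=0.1)
print(penalty)
```

Larger coefficients are penalized quadratically, which is why even a small penalizer pulls the estimates toward zero.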
Examples
>>> from lifelines.datasets import load_rossi
>>> from lifelines import CoxPHFitter
>>> rossi = load_rossi()
>>> cph = CoxPHFitter()
>>> cph.fit(rossi, 'week', 'arrest')
>>> cph.print_summary()

check_assumptions
(training_df, advice=True, show_plots=True, p_value_threshold=0.05, plot_n_bootstraps=10)¶ References:
 section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/AppendixCoxRegression.pdf
 http://www.mwsug.org/proceedings/2006/stats/MWSUG2006SD08.pdf
 http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf

compute_residuals
(training_dataframe, kind)¶ Parameters:  training_dataframe (pandas DataFrame) – the same training dataframe given in fit
 kind (string) – {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’}

fit
(df, duration_col=None, event_col=None, show_progress=False, initial_beta=None, strata=None, step_size=None, weights_col=None, cluster_col=None, robust=False, batch_mode=None)¶ Fit the Cox Proportional Hazard model to a dataset.
Parameters:  df (DataFrame) – a Pandas dataframe with necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights, strata). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in dataframe that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 weights_col (string, optional) – an optional column in the dataframe, df, that denotes the weight per subject. This column is not used as a covariate, but as a weight in the final regression. Default weight is 1. This can be used for case weights. For example, a weight of 2 means there were two subjects with identical observations. This can also be used for sampling weights. In that case, use robust=True to get more accurate standard errors.
 show_progress (boolean, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 initial_beta (numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 strata (list or string, optional) – specify a column or list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
 step_size (float, optional) – set an initial step size for the fitting algorithm.
 robust (boolean, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there are a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
 cluster_col (string, optional) – specifies what column has unique identifiers for clustering covariances. Using this forces the sandwich estimator (robust variance estimator) to be used.
 batch_mode (bool, optional) – enabling batch_mode can be faster for datasets with a large number of ties. If left as None, lifelines will choose the best option.
Returns: self – self with additional new properties: print_summary, hazards_, confidence_intervals_, baseline_survival_, etc.
Return type: CoxPHFitter
Note
Tied survival times are handled using Efron’s tie-method.
Examples
>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E')
>>> cph.print_summary()
>>> cph.predict_median(df)

>>> import pandas as pd
>>> from lifelines import CoxPHFitter
>>>
>>> df = pd.DataFrame({
>>>     'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
>>>     'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
>>>     'weights': [1.1, 0.5, 2.0, 1.6, 1.2, 4.3, 1.4, 4.5, 3.0, 3.2, 0.4, 6.2],
>>>     'month': [10, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>>     'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
>>> })
>>>
>>> cph = CoxPHFitter()
>>> cph.fit(df, 'T', 'E', strata=['month', 'age'], robust=True, weights_col='weights')
>>> cph.print_summary()
>>> cph.predict_median(df)

plot
(columns=None, display_significance_code=True, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 display_significance_code (bool, optional (default: True)) – display asterisks beside statistically significant variables
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that can be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariate, groups, **kwargs)¶ Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate is varied over values in a group. This is useful to compare subjects’ survival as we vary a single covariate, all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariate (string) – a string of the covariate in the original dataset that we wish to vary.
 groups (iterable) – an iterable of the values we wish the covariate to take on.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that can be edited.
Return type: matplotlib axis

predict_cumulative_hazard
(X, times=None)¶ Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns: cumulative_hazard_ – the cumulative hazard of individuals over the timeline
Return type: DataFrame

predict_expectation
(X)¶ Compute the expected lifetime, \(E[T]\), using covariates X. The algorithm to compute the expectation uses the fact that \(E[T] = \int_0^\infty P(T > t) dt = \int_0^\infty S(t) dt\). To compute the integral, we use the trapezoidal rule.
However, if the survival function doesn’t converge to 0, then the expectation is really infinity and the returned values are meaningless/too large. In that case, using predict_median or predict_percentile would be better.
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: expectations
Return type: DataFrame
Notes
If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
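The integral \(E[T] = \int_0^\infty S(t) dt\) can be approximated with the trapezoidal rule as in the sketch below. The `trapezoid` helper and the example survival curves are assumptions for illustration only. The second call shows why the result is unreliable when the survival curve does not converge to 0: the answer then depends on how far the timeline happens to extend.

```python
import math


def trapezoid(y, x):
    """Trapezoidal approximation of the integral of y over the points x."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1)
    )


# Survival curve S(t) = exp(-t / 2), whose true expected lifetime is 2.
times = [i * 0.01 for i in range(4001)]  # 0 to 40
survival = [math.exp(-t / 2.0) for t in times]
expected_lifetime = trapezoid(survival, times)
print(expected_lifetime)

# A curve that plateaus at 0.5 never reaches 0, so the "expectation"
# just grows with the length of the observed timeline.
short_horizon = trapezoid([0.5, 0.5], [0.0, 10.0])
long_horizon = trapezoid([0.5, 0.5], [0.0, 100.0])
print(short_horizon, long_horizon)
```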

predict_log_partial_hazard
(X)¶ This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\beta (X - \text{mean}(X_{train}))\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: log_partial_hazard
Return type: DataFrame
Notes
If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_median
(X)¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame

predict_partial_hazard
(X)¶ Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: partial_hazard – Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{\beta (X - \text{mean}(X_{train}))}\)
Return type: DataFrame
Notes
If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
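The two partial-hazard formulas can be checked by hand with a few lines of Python. The coefficients and training means below are hypothetical values chosen for the example, and the function names are not part of the library.

```python
import math


def log_partial_hazard(x, beta, train_means):
    """beta . (x - mean(X_train)) for one individual."""
    return sum(b * (xi - m) for b, xi, m in zip(beta, x, train_means))


def partial_hazard(x, beta, train_means):
    """exp(beta . (x - mean(X_train))) for one individual."""
    return math.exp(log_partial_hazard(x, beta, train_means))


beta = [0.5, -1.0]        # hypothetical fitted coefficients
train_means = [2.0, 0.3]  # hypothetical training-set column means

average_subject = partial_hazard(train_means, beta, train_means)
shifted_subject = partial_hazard([3.0, 0.3], beta, train_means)
print(average_subject, shifted_subject)
```

An individual whose covariates equal the training means has a partial hazard of exactly 1, so every other value reads as a hazard ratio relative to that average subject.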

predict_percentile
(X, p=0.5)¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles
Return type: DataFrame
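One way to read a percentile off a survival curve is to take the first time at which the curve is at or below p, and to return infinity when it never gets there. The sketch below assumes that convention (the exact crossing rule used by the library may differ), and `percentile_time` is a hypothetical helper.

```python
def percentile_time(times, survival, p=0.5):
    """First time at which the survival curve is at or below p; inf if never."""
    for t, s in zip(times, survival):
        if s <= p:
            return t
    return float("inf")


times = [1, 2, 3, 4]
survival = [0.9, 0.6, 0.4, 0.2]

median_time = percentile_time(times, survival)           # p defaults to 0.5
never_reached = percentile_time(times, survival, p=0.1)  # curve stops at 0.2
print(median_time, never_reached)
```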

predict_survival_function
(X, times=None)¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns: survival_function – the survival probabilities of individuals over the timeline
Return type: DataFrame
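Under the proportional hazard model \(h(t|x) = h_0(t) \exp(x \beta)\), an individual's cumulative hazard is the baseline cumulative hazard scaled by the partial hazard, and the survival function is the exponential of its negative. The sketch below illustrates that relationship with made-up numbers; `predict_survival` is a hypothetical helper and the baseline values are assumptions, not output from a fitted model.

```python
import math


def predict_survival(baseline_cumulative_hazard, partial_hazard):
    """S(t | x) = exp(-H_0(t) * partial_hazard) at each time point."""
    return [math.exp(-h0 * partial_hazard) for h0 in baseline_cumulative_hazard]


baseline_cumulative_hazard = [0.0, 0.1, 0.5]  # hypothetical H_0(t) at three times

baseline_curve = predict_survival(baseline_cumulative_hazard, partial_hazard=1.0)
high_risk_curve = predict_survival(baseline_cumulative_hazard, partial_hazard=2.0)
print(baseline_curve)
print(high_risk_curve)
```

A partial hazard above 1 pushes the whole curve below the baseline, and a value below 1 lifts it.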

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
¶ The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the AUC to survival data, including censorships.
For this purpose, the score_ is a measure of the predictive accuracy of the fitted model on the training dataset. It’s analogous to the R^2 in linear models.
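To show what the c-index measures, here is a simplified pure-Python version: among all comparable pairs, it counts how often the subject predicted to live longer actually did. This sketch assumes a pair is comparable only when the earlier duration is an observed event, and it ignores tied durations; the library's own implementation is more careful and much faster.

```python
def concordance_index(durations, predicted, observed):
    """Fraction of comparable pairs ordered correctly; tied predictions count 0.5."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # Comparable: subject i had an observed event strictly before time j.
            if observed[i] and durations[i] < durations[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    return concordant / comparable


durations = [1, 2, 3]
observed = [True, True, True]

perfect = concordance_index(durations, [1, 2, 3], observed)
reversed_order = concordance_index(durations, [3, 2, 1], observed)
uninformative = concordance_index(durations, [5, 5, 5], observed)
print(perfect, reversed_order, uninformative)
```

A score of 0.5 corresponds to random predictions and 1.0 to a perfect ranking, which is the same scale as the AUC.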

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type: DataFrame

class
lifelines.
WeibullFitter
(alpha=0.95)¶ Bases:
lifelines.fitters.UnivariateFitter
This class implements a Weibull model for univariate data. The model has parameterized form:
\[S(t) = \exp(-(\lambda t)^\rho), \quad \lambda > 0, \rho > 0,\]
which implies the cumulative hazard rate is
\[H(t) = (\lambda t)^\rho,\]
and the hazard rate is:
\[h(t) = \rho \lambda (\lambda t)^{\rho - 1}\]
After calling the .fit method, you have access to properties like: cumulative_hazard_, survival_function_, lambda_ and rho_.
A summary of the fit is available with the method ‘print_summary()’
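The three Weibull formulas above are tied together by \(S(t) = \exp(-H(t))\) and \(h(t) = dH/dt\). The short sketch below evaluates them for hypothetical parameter values and checks the derivative numerically; the function names are illustrative and not part of the fitter.

```python
import math


def weibull_cumulative_hazard(t, lambda_, rho):
    return (lambda_ * t) ** rho


def weibull_survival(t, lambda_, rho):
    return math.exp(-weibull_cumulative_hazard(t, lambda_, rho))


def weibull_hazard(t, lambda_, rho):
    return rho * lambda_ * (lambda_ * t) ** (rho - 1)


lambda_, rho, t = 0.5, 1.5, 2.0  # hypothetical parameters and time point

# Central finite difference of H(t) should match the closed-form hazard.
eps = 1e-6
numeric_hazard = (
    weibull_cumulative_hazard(t + eps, lambda_, rho)
    - weibull_cumulative_hazard(t - eps, lambda_, rho)
) / (2 * eps)

print(weibull_survival(t, lambda_, rho))
print(weibull_hazard(t, lambda_, rho), numeric_hazard)
```

With rho = 1 the hazard is constant and the model reduces to the exponential case.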
Examples
>>> from lifelines import WeibullFitter
>>> from lifelines.datasets import load_waltons
>>> waltons = load_waltons()
>>> wbf = WeibullFitter()
>>> wbf.fit(waltons['T'], waltons['E'])
>>> wbf.plot()
>>> print(wbf.lambda_)

cumulative_hazard_at_times
(times)¶

fit
(durations, event_observed=None, timeline=None, label='Weibull_estimate', alpha=None, ci_labels=None, show_progress=False)¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
 show_progress (boolean, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
Returns: self – self with new properties like cumulative_hazard_, survival_function_, lambda_, and rho_.
Return type: WeibullFitter

hazard_at_times
(times)¶

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type: pd.DataFrame

survival_function_at_times
(times)¶


class
lifelines.
ExponentialFitter
(alpha=0.95)¶ Bases:
lifelines.fitters.UnivariateFitter
This class implements an Exponential model for univariate data. The model has parameterized form:
\[S(t) = \exp(-\lambda t), \quad \lambda > 0\]
which implies the cumulative hazard rate is
\[H(t) = \lambda t\]
and the hazard rate is:
\[h(t) = \lambda\]
After calling the .fit method, you have access to properties like:
 ‘survival_function_’, ‘lambda_’
A summary of the fit is available with the method ‘print_summary()’
Notes
Reference: https://www4.stat.ncsu.edu/~dzhang2/st745/chap3.pdf
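For the parameterization \(S(t) = \exp(-\lambda t)\) with right-censoring, the textbook maximum likelihood estimate of \(\lambda\) is the number of observed events divided by the total observed time. The sketch below computes it directly; `exponential_mle` is a hypothetical helper, the data are made up, and the fitter itself may arrive at its estimate differently.

```python
import math


def exponential_mle(durations, observed):
    """MLE of lambda under S(t) = exp(-lambda * t) with right-censoring."""
    return sum(1 for e in observed if e) / sum(durations)


def exponential_survival(t, lambda_):
    return math.exp(-lambda_ * t)


durations = [2, 4, 6, 8]
observed = [1, 1, 0, 1]  # the third subject is right-censored

lambda_hat = exponential_mle(durations, observed)  # 3 events / 20 time units
median_lifetime = math.log(2) / lambda_hat
print(lambda_hat, median_lifetime)
```

Censored subjects contribute their observed time to the denominator but no event to the numerator, which is how censoring lowers the estimated rate.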

fit
(durations, event_observed=None, timeline=None, label='Exponential_estimate', alpha=None, ci_labels=None)¶ Parameters:  durations (iterable) – an array, or pd.Series, of length n – duration subject was observed for
 event_observed (iterable, optional) – an array, list, or pd.Series, of length n – True if the death was observed, False if the event was lost (right-censored). Defaults to all True if event_observed==None
 timeline (iterable, optional) – return the best estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
Returns: self – self, with new properties like ‘survival_function_’ and ‘lambda_’.
Return type: ExponentialFitter

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type: DataFrame

class
lifelines.
CoxTimeVaryingFitter
(alpha=0.95, penalizer=0.0, strata=None)¶ Bases:
lifelines.fitters.BaseFitter
This class implements fitting Cox’s time-varying proportional hazard model:
\[h(t|x(t)) = h_0(t) \exp(x(t)' \beta)\]
Parameters:  alpha (float, optional) – the level in the confidence intervals.
 penalizer (float, optional) – the coefficient of an L2 penalizer in the regression

fit
(df, id_col, event_col, start_col='start', stop_col='stop', weights_col=None, show_progress=False, step_size=None, robust=False, strata=None)¶ Fit the Cox Proportional Hazard model to a time-varying dataset. Tied survival times are handled using Efron’s tie-method.
Parameters:  df (DataFrame) – a Pandas dataframe with necessary columns id_col, event_col, start_col and stop_col (see below), plus other covariates. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 id_col (string) – A subject could have multiple rows in the dataframe. This column contains the unique identifier per subject.
 event_col (string) – the column in dataframe that contains the subjects’ death observation. If left as None, assume all individuals are non-censored.
 start_col (string) – the column that contains the start of a subject’s time period.
 stop_col (string) – the column that contains the end of a subject’s time period.
 weights_col (string, optional) – the column that contains (possibly time-varying) weight of each subject-period row.
 show_progress (boolean, optional (default: False)) – since the fitter is iterative, show convergence diagnostics.
 robust (boolean, optional (default: False)) – Compute the robust errors using the Huber sandwich estimator, aka Wei-Lin estimate. This does not handle ties, so if there are a high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074-1078
 step_size (float, optional) – set an initial step size for the fitting algorithm.
 strata (TODO)
Returns: self – self, with additional properties like hazards_ and print_summary
Return type: CoxTimeVaryingFitter
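The fit method above expects one row per subject-period, bounded by a start and a stop column. As a rough sketch of that layout, the hypothetical helper below turns one subject's covariate history into such rows. The column names `id`, `covariate` and `event` are assumptions for the example (only `start` and `stop` match the documented defaults), and the sketch assumes the usual convention that the event flag is set only on the subject's final period.

```python
def split_into_periods(subject_id, change_times, values, end_time, event):
    """Hypothetical helper: one row per period with a constant covariate value."""
    rows = []
    boundaries = list(change_times) + [end_time]
    for k, value in enumerate(values):
        is_last = k == len(values) - 1
        rows.append({
            "id": subject_id,
            "start": boundaries[k],
            "stop": boundaries[k + 1],
            "covariate": value,
            # The event can only occur at the end of the final period.
            "event": bool(event and is_last),
        })
    return rows


# Subject 1: covariate changes at t=3 and t=7, death observed at t=10.
rows = split_into_periods(
    subject_id=1, change_times=[0, 3, 7], values=[0.1, 0.4, 0.9],
    end_time=10, event=True,
)
for row in rows:
    print(row)
```

Rows built this way can be collected across subjects into a single DataFrame, with `id_col='id'` and `event_col='event'` identifying the relevant columns.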

plot
(columns=None, display_significance_code=True, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specifiy a subset of the columns to plot
 display_significance_code (bool, optional (default: True)) – display asteriks beside statistically significant variables
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

predict_log_partial_hazard
(X)¶ This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\beta (X - \bar{X})\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Return type: DataFrame
Note
If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_partial_hazard
(X)¶ Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{\beta (X - \bar{X})}\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Return type: DataFrame
Note
If X is a dataframe, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

print_summary
(decimals=2, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – contains columns coef, exp(coef), se(coef), z, p, lower, upper
Return type: DataFrame

class
lifelines.
AalenJohansenFitter
(jitter_level=0.0001, seed=None, alpha=0.95)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the Aalen-Johansen estimate for the cumulative incidence function in a competing risks framework. Treating competing risks as censoring can result in overestimated cumulative density functions. Using the Kaplan-Meier estimator with competing risks as censored is akin to estimating the cumulative density if all competing risks had been prevented. If you are interested in learning more, I (Paul Zivich) recommend the following open-access paper: Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285-296.
AalenJohansenFitter(alpha=0.95, jitter_level=0.0001, seed=None)
Aalen-Johansen cannot deal with tied times. We can get around this by randomly jittering the event times slightly. This will be done automatically and generates a warning.

fit
(durations, event_observed, event_of_interest, timeline=None, entry=None, label='AJ_estimate', alpha=None, ci_labels=None, weights=None)¶ Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for
 event_observed (an array, or pd.Series, of length n) – integer indicator of distinct events. Must be only non-negative integers, where 0 indicates censoring.
 event_of_interest (integer) – indicator for the event of interest. All other integers are considered competing events. For example, event_observed contains 0, 1, 2 where 0: censored, 1: lung cancer, and 2: death. If event_of_interest=1, then death (2) is considered a competing event. The returned cumulative incidence function corresponds to the risk of lung cancer.
 timeline – return the best estimate at the values in timeline (positively increasing)
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for left-truncated (not left-censored) observations. If None, all members of the population were born at time 0.
 label (string) – a string to name the column of the estimate.
 alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length-2 list: [<lower-bound name>, <upper-bound name>]. Default: <label>_lower_<alpha>
 weights (an array, or pd.Series, of length n) – used if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subjects differently.
Returns: self – self, with new properties like ‘cumulative_incidence_’.
Return type: AalenJohansenFitter
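The idea behind the estimator can be sketched without the library: at each event of interest, the cumulative incidence grows by the probability of having survived all event types so far, times the chance of the event among those still at risk. The function below is a simplified illustration that assumes no tied times, no weights and no left truncation; `aalen_johansen` is a hypothetical name, not the fitter's code.

```python
def aalen_johansen(durations, event_observed, event_of_interest):
    """Cumulative incidence of one event type; assumes no tied times."""
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    at_risk = len(durations)
    overall_survival = 1.0  # survival from *any* event, just before each time
    incidence = 0.0
    curve = []
    for i in order:
        event = event_observed[i]
        if event == event_of_interest:
            incidence += overall_survival * (1.0 / at_risk)
        if event != 0:  # any event, of interest or competing, reduces survival
            overall_survival *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        curve.append((durations[i], incidence))
    return curve


durations = [1, 2, 3, 4]
events = [1, 2, 0, 1]  # 0: censored, 1: event of interest, 2: competing event

cif_interest = aalen_johansen(durations, events, event_of_interest=1)
cif_competing = aalen_johansen(durations, events, event_of_interest=2)
print(cif_interest[-1], cif_competing[-1])
```

In this toy dataset the cumulative incidence of event 1 ends at 0.75, whereas one minus a Kaplan-Meier curve that treats event 2 as censoring would end at 1.0, which illustrates the overestimation described above.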
