Univariate models¶
AalenJohansenFitter¶

class
lifelines.fitters.aalen_johansen_fitter.
AalenJohansenFitter
(jitter_level=0.0001, seed=None, alpha=0.05, calculate_variance=True, **kwargs)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the AalenJohansen estimate for the cumulative incidence function in a competing risks framework. Treating competing risks as censoring can result in overestimated cumulative density functions. Using the Kaplan Meier estimator with competing risks as censored is akin to estimating the cumulative density if all competing risks had been prevented.
AalenJohansen cannot deal with tied times. We can get around this by randomly jittering the event times slightly. This will be done automatically and generates a warning.
Parameters:  alpha (float, option (default=0.05)) – The alpha value associated with the confidence intervals.
 jitter_level (float, option (default=0.00001)) – If tied event times are detected, event times are randomly changed by this factor.
 seed (int, option (default=None)) – To produce replicate results with tied event times, the numpy.random.seed can be specified in the function.
 calculate_variance (bool, option (default=True)) – By default, AalenJohansenFitter calculates the variance and corresponding confidence intervals. Due to how the variance is calculated, the variance must be calculated for each event time individually. This is computationally intensive. For some procedures, like bootstrapping, the variance is not necessary. To reduce computation time during these procedures, calculate_variance can be set to False to skip the variance calculation.
Example
>>> from lifelines import AalenJohansenFitter >>> from lifelines.datasets import load_waltons >>> T, E = load_waltons()['T'], load_waltons()['E'] >>> ajf = AalenJohansenFitter(calculate_variance=True) >>> ajf.fit(T, E, event_of_interest=1) >>> ajf.cumulative_density_ >>> ajf.plot()
References
If you are interested in learning more, we recommend the following openaccess paper; Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285296.

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

cumulative_density_at_times
(times, label=None)¶

cumulative_hazard_at_times
(times, label=None)¶

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

fit
(durations, event_observed, event_of_interest, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)¶ Parameters:  durations (an array or pd.Series of length n – duration of subject was observed for)
 event_observed (an array, or pd.Series, of length n. Integer indicator of distinct events. Must be) – only positive integers, where 0 indicates censoring.
 event_of_interest (integer – indicator for event of interest. All other integers are considered competing events) – Ex) event_observed contains 0, 1, 2 where 0:censored, 1:lung cancer, and 2:death. If event_of_interest=1, then death (2) is considered a competing event. The returned cumulative incidence function corresponds to risk of lung cancer
 timeline (return the best estimate at the values in timelines (positively increasing))
 entry (an array, or pd.Series, of length n – relative time when a subject entered the study. This is) – useful for lefttruncated (not leftcensored) observations. If None, all members of the population were born at time 0.
 label (a string to name the column of the estimate.)
 alpha (the alpha value in the confidence intervals. Overrides the initializing) – alpha for this call to fit only.
 ci_labels (add custom column names to the generated confidence intervals) – as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<1alpha/2>
 weights (n array, or pd.Series, of length n, if providing a weighted dataset. For example, instead) – of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: self – self, with new properties like
cumulative_incidence_
.Return type:

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None)¶

median_survival_time_
¶ Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p: float) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Plots a pretty figure of the model
Matplotlib plot arguments can be passed in inside the kwargs, plus
Parameters: show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function
add_at_risk_counts
for details. Default: Falseloc (slice) – specify a timebased subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a locationbased subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
Returns: a pyplot axis object
Return type: ax

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

survival_function_at_times
(times, label=None)¶
BreslowFlemingHarringtonFitter¶

class
lifelines.fitters.breslow_fleming_harrington_fitter.
BreslowFlemingHarringtonFitter
(alpha: float = 0.05, label: str = None)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the BreslowFlemingHarrington estimate for the survival function. This estimator is a biased estimator of the survival function but is more stable when the population is small and there are too few early truncation times, it may happen that is the number of patients at risk and the number of deaths is the same.
Mathematically, the NelsonAalen estimator is the negative logarithm of the BreslowFlemingHarrington estimator.
Parameters: alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals. 
conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

cumulative_density_at_times
(times, label=None)¶

cumulative_hazard_at_times
(times, label=None)¶

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

fit
(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)¶ Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for
 timeline – return the best estimate at the values in timelines (positively increasing)
 event_observed (an array, or pd.Series, of length n) – True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated observations, i.e the birth event was not observed. If None, defaults to all 0 (all birth events observed.)
 label (string) – a string to name the column of the estimate.
 alpha (float, optional (default=0.05)) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
Returns: Return type: self, with new properties like
survival_function_
.

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None)¶

median_survival_time_
¶ Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p: float) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Plots a pretty figure of the model
Matplotlib plot arguments can be passed in inside the kwargs, plus
Parameters: show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function
add_at_risk_counts
for details. Default: Falseloc (slice) – specify a timebased subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a locationbased subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
Returns: a pyplot axis object
Return type: ax

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times
Parameters:  times (iterable or float)
 label (str)

ExponentialFitter¶

class
lifelines.fitters.exponential_fitter.
ExponentialFitter
(*args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements an Exponential model for univariate data. The model has parameterized form:
\[S(t) = \exp\left(\frac{t}{\lambda}\right), \lambda >0\]which implies the cumulative hazard rate is
\[H(t) = \frac{t}{\lambda}\]and the hazard rate is:
\[h(t) = \frac{1}{\lambda}\]After calling the
.fit
method, you have access to properties like:survival_function_
,lambda_
,cumulative_hazard_
A summary of the fit is available with the methodprint_summary()
Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. Important
The parameterization of this model changed in lifelines 0.19.0. Previously, the cumulative hazard looked like \(\lambda t\). The parameterization is now the reciprocal of \(\lambda\).

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

confidence_interval_cumulative_hazard_
¶ The lower and upper confidence intervals for the cumulative hazard
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

confidence_interval_hazard_
¶ The lower and upper confidence intervals for the hazard
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

lambda_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density
Type: DataFrame

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
The confidence interval of the hazard.

confidence_interval_survival_function_
The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p)¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

GeneralizedGammaFitter¶

class
lifelines.fitters.generalized_gamma_fitter.
GeneralizedGammaFitter
(*args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements a Generalized Gamma model for univariate data. The model has parameterized form:
The survival function is:
\[\begin{split}S(t)=\left\{ \begin{array}{} 1\Gamma_{RL}\left( \frac{1}{{{\lambda }^{2}}};\frac{{e}^{\lambda \left( \frac{\log(t)\mu }{\sigma} \right)}}{\lambda ^{2}} \right) \textit{ if } \lambda> 0 \\ \Gamma_{RL}\left( \frac{1}{{{\lambda }^{2}}};\frac{{e}^{\lambda \left( \frac{\log(t)\mu }{\sigma} \right)}}{\lambda ^{2}} \right) \textit{ if } \lambda \le 0 \\ \end{array} \right.\,\!\end{split}\]where \(\Gamma_{RL}\) is the regularized lower incomplete Gamma function.
This model has the Exponential, Weibull, Gamma and LogNormal as submodels, and thus can be used as a way to test which model to use:
 When \(\lambda = 1\) and \(\sigma = 1\), then the data is Exponential.
 When \(\lambda = 1\) then the data is Weibull.
 When \(\sigma = \lambda\) then the data is Gamma.
 When \(\lambda = 0\) then the data is LogNormal.
 When \(\lambda = 1\) then the data is InverseWeibull.
 When \(\sigma = \lambda\) then the data is InverseGamma.
After calling the
.fit
method, you have access to properties like:cumulative_hazard_
,survival_function_
, A summary of the fit is available with the methodprint_summary()
.Important
The parameterization implemented has \(\log\sigma\), thus there is a ln_sigma_ in the output. Exponentiate this parameter to recover \(\sigma\).
Important
This model is experimental. It’s API may change in the future. Also, it’s convergence is not very stable.
Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. Examples
>>> from lifelines import GeneralizedGammaFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> ggf = GeneralizedGammaFitter() >>> ggf.fit(waltons['T'], waltons['E']) >>> ggf.plot() >>> ggf.summary

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

lambda_
¶ The fitted parameter in the model
Type: float

rho_
¶ The fitted parameter in the model
Type: float

alpha_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p)¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.
KaplanMeierFitter¶

class
lifelines.fitters.kaplan_meier_fitter.
KaplanMeierFitter
(alpha: float = 0.05, label: str = None)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the KaplanMeier estimate for the survival function.
Parameters:  alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
 label (string, optional) – Provide a new label for the estimate  useful if looking at many groups.
Examples
>>> from lifelines import KaplanMeierFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> kmf = KaplanMeierFitter(label="waltons_data") >>> kmf.fit(waltons['T'], waltons['E']) >>> kmf.plot()

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

median_survival_time_
¶ The estimated median time to event. np.inf if doesn’t exist.
Type: float

confidence_interval_
¶ The lower and upper confidence intervals for the survival function. An alias of
confidence_interval_survival_function_
. Uses Greenwood’s Exponential formula (“loglog” in R).Type: DataFrame

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function. An alias of
confidence_interval_
. Uses Greenwood’s Exponential formula (“loglog” in R).Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density.
Type: DataFrame

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

event_table
¶ A summary of the life table
Type: DataFrame

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density at specific times
Parameters: times (iterable or float) Returns: Return type: pd.Series

cumulative_hazard_at_times
(times, label=None)¶

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

fit
(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)¶ Fit the model to a rightcensored dataset
Parameters:  durations (an array, list, pd.DataFrame or pd.Series) – length n – duration subject was observed for
 event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timelines (positively increasing)
 entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”.
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<1alpha/2>
 weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: self – self with new properties like
survival_function_
,plot()
,median_survival_time_
Return type:

fit_left_censoring
(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, list, pd.DataFrame or pd.Series) – length n – duration subject was observed for
 event_observed (an array, list, pd.DataFrame, or pd.Series, optional) – True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (an array, list, pd.DataFrame, or pd.Series, optional) – return the best estimate at the values in timelines (positively increasing)
 entry (an array, list, pd.DataFrame, or pd.Series, optional) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”.
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (tuple, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<1alpha/2>
 weights (an array, list, pd.DataFrame, or pd.Series, optional) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: self – self with new properties like
survival_function_
,plot()
,median_survival_time_
Return type:

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None)¶

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p: float) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Plots a pretty figure of the model
Matplotlib plot arguments can be passed in inside the kwargs, plus
Parameters: show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function
add_at_risk_counts
for details. Default: Falseloc (slice) – specify a timebased subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a locationbased subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
Returns: a pyplot axis object
Return type: ax

plot_cumulative_density
(**kwargs)¶ Plots a pretty figure of the cumulative density function.
Matplotlib plot arguments can be passed in inside the kwargs.
Parameters: show_censors (bool) – place markers at censorship events. Default: False
censor_styles (bool) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (bool) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function
add_at_risk_counts
for details. Default: Falseloc (slice) – specify a timebased subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a locationbased subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
Returns: a pyplot axis object
Return type: ax

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_loglogs
(*args, **kwargs)¶ Plot \(\log(S(t))\) against \(\log(t)\). Same arguments as
.plot
.

plot_survival_function
(**kwargs)¶ Alias of
plot

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times
Parameters: times (iterable or float) Returns: Return type: pd.Series
LogLogisticFitter¶

class
lifelines.fitters.log_logistic_fitter.
LogLogisticFitter
(*args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements a LogLogistic model for univariate data. The model has parameterized form:
\[S(t) = \left(1 + \left(\frac{t}{\alpha}\right)^{\beta}\right)^{1}, \alpha > 0, \beta > 0,\]The \(\alpha\) (scale) parameter has an interpretation as being equal to the median lifetime of the population. The \(\beta\) parameter influences the shape of the hazard. See figure below:
The hazard rate is:
\[h(t) = \frac{\left(\frac{\beta}{\alpha}\right)\left(\frac{t}{\alpha}\right) ^ {\beta1}}{\left(1 + \left(\frac{t}{\alpha}\right)^{\beta}\right)}\]and the cumulative hazard is:
\[H(t) = \log\left(\left(\frac{t}{\alpha}\right) ^ {\beta} + 1\right)\]After calling the
.fit
method, you have access to properties like:cumulative_hazard_
,plot
,survival_function_
,alpha_
andbeta_
. A summary of the fit is available with the method ‘print_summary()’Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. Examples
>>> from lifelines import LogLogisticFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> llf = LogLogisticFitter() >>> llf.fit(waltons['T'], waltons['E']) >>> llf.plot() >>> print(llf.alpha_)

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

alpha_
¶ The fitted parameter in the model
Type: float

beta_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p)¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

LogNormalFitter¶

class
lifelines.fitters.log_normal_fitter.
LogNormalFitter
(*args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements an Log Normal model for univariate data. The model has parameterized form:
\[S(t) = 1  \Phi\left(\frac{\log(t)  \mu}{\sigma}\right), \;\; \sigma >0\]where \(\Phi\) is the CDF of a standard normal random variable. This implies the cumulative hazard rate is
\[H(t) = \log\left(1  \Phi\left(\frac{\log(t)  \mu}{\sigma}\right)\right)\]After calling the
.fit
method, you have access to properties like:survival_function_
,mu_
,sigma_
. A summary of the fit is available with the methodprint_summary()
Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. 
cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

mu_
¶ The fitted parameter in the model
Type: float

sigma_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p)¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

NelsonAalenFitter¶

class
lifelines.fitters.nelson_aalen_fitter.
NelsonAalenFitter
(alpha=0.05, nelson_aalen_smoothing=True, **kwargs)¶ Bases:
lifelines.fitters.UnivariateFitter
Class for fitting the NelsonAalen estimate for the cumulative hazard.
NelsonAalenFitter(alpha=0.05, nelson_aalen_smoothing=True)
Parameters:  alpha (float, optional (default=0.05)) – The alpha value associated with the confidence intervals.
 nelson_aalen_smoothing (bool, optional) – If the event times are naturally discrete (like discrete years, minutes, etc.) then it is advisable to turn this parameter to False. See [1], pg.84.
Notes
[1] Aalen, O., Borgan, O., Gjessing, H., 2008. Survival and Event History Analysis

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

confidence_interval_
¶ The lower and upper confidence intervals for the cumulative hazard
Type: DataFrame

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

event_table
¶ A summary of the life table
Type: DataFrame

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

cumulative_density_at_times
(times, label=None)¶

cumulative_hazard_at_times
(times, label=None)¶

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

fit
(durations, event_observed=None, timeline=None, entry=None, label=None, alpha=None, ci_labels=None, weights=None)¶ Parameters:  durations (an array, or pd.Series, of length n) – duration subject was observed for
 timeline (iterable) – return the best estimate at the values in timelines (positively increasing)
 event_observed (an array, or pd.Series, of length n) – True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated observations, i.e the birth event was not observed. If None, defaults to all 0 (all birth events observed.)
 label (string) – a string to name the column of the estimate.
 alpha (float) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (iterable) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<1alpha/2>
 weights (n array, or pd.Series, of length n) – if providing a weighted dataset. For example, instead of providing every subject as a single element of durations and event_observed, one could weigh subject differently.
Returns: Return type: self, with new properties like
cumulative_hazard_
.

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None)¶

median_survival_time_
¶ Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p)¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Plots a pretty figure of the model
Matplotlib plot arguments can be passed in inside the kwargs, plus
Parameters: show_censors (bool) – place markers at censorship events. Default: False
censor_styles (dict) – If show_censors, this dictionary will be passed into the plot call.
ci_alpha (float) – the transparency level of the confidence interval. Default: 0.3
ci_force_lines (bool) – force the confidence intervals to be line plots (versus default shaded areas). Default: False
ci_show (bool) – show confidence intervals. Default: True
ci_legend (bool) – if ci_force_lines is True, this is a boolean flag to add the lines’ labels to the legend. Default: False
at_risk_counts (bool) – show group sizes at time points. See function
add_at_risk_counts
for details. Default: Falseloc (slice) – specify a timebased subsection of the curves to plot, ex:
>>> model.plot(loc=slice(0.,10.))
will plot the time values between t=0. and t=10.
iloc (slice) – specify a locationbased subsection of the curves to plot, ex:
>>> model.plot(iloc=slice(0,10))
will plot the first 10 time points.
Returns: a pyplot axis object
Return type: ax

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

smoothed_hazard_
(bandwidth)¶ Parameters: bandwidth (float) – the bandwith used in the Epanechnikov kernel. Returns: a DataFrame of the smoothed hazard Return type: DataFrame

smoothed_hazard_confidence_intervals_
(bandwidth, hazard_=None)¶ Parameters:  bandwidth (float) – the bandwidth to use in the Epanechnikov kernel. > 0
 hazard_ (numpy array) – a computed (n,) numpy array of estimated hazard rates. If none, uses
smoothed_hazard_

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

survival_function_at_times
(times, label=None)¶
PiecewiseExponentialFitter¶

class
lifelines.fitters.piecewise_exponential_fitter.
PiecewiseExponentialFitter
(breakpoints, *args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements an Piecewise Exponential model for univariate data. The model has parameterized hazard rate:
\[\begin{split}h(t) = \begin{cases} 1/\lambda_0 & \text{if $t \le \tau_0$} \\ 1/\lambda_1 & \text{if $\tau_0 < t \le \tau_1$} \\ 1/\lambda_2 & \text{if $\tau_1 < t \le \tau_2$} \\ ... \end{cases}\end{split}\]You specify the breakpoints, \(\tau_i\), and lifelines will find the optional values for the parameters.
After calling the
.fit
method, you have access to properties like:survival_function_
,plot
,cumulative_hazard_
A summary of the fit is available with the methodprint_summary()
Parameters:  breakpoints (list) – a list of times when a new exponential model is constructed.
 alpha (float, optional (default=0.05)) – the level in the confidence intervals.
Important
The parameterization of this model changed in lifelines 0.19.1. Previously, the cumulative hazard looked like \(\lambda_i t\). The parameterization is now the reciprocal of \(\lambda_i\).

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

lambda_i_
¶ The fitted parameter in the model, for i = 0, 1 … n1 breakpoints
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

breakpoints
¶ The provided breakpoints
Type: array

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p: float) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.
SplineFitter¶

class
lifelines.fitters.spline_fitter.
SplineFitter
(knot_locations: numpy.ndarray, *args, **kwargs)¶ Bases:
lifelines.fitters.spline_fitter.SplineFitterMixin
,lifelines.fitters.KnownModelParametricUnivariateFitter
Model the cumulative hazard using cubic splines. This offers great flexibility and smoothness of the cumulative hazard.
\[H(t) = \exp{\phi_0 + \phi_1\log{t} + \sum_{j=2}^N \phi_j v_j(\log{t})\]where \(v_j\) are our cubic basis functions at predetermined knots. See references for exact definition.
Parameters: knot_locations (list, np.array) – The locations of the cubic breakpoints. Typically, the first knot is the minimum observed death, the last knot is the maximum observed death, and the knots in between are the centiles of observed data (ex: if one additional knot, choose the 50th percentile, the median. If two additional knots, choose the 33th and 66th percentiles). References
Royston, P., & Parmar, M. K. B. (2002). Flexible parametric proportionalhazards and proportionalodds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine, 21(15), 2175–2197. doi:10.1002/sim.1203
Examples
>>> from lifelines import SplineFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> T, E = waltons['T'], waltons['E'] >>> knots = np.percentile(T.loc[E.astype(bool)], [0, 50, 100]) >>> sf = SplineFitter(knots) >>> sf.fit() >>> sf.plot() >>> print(sf.knots)

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

lambda_
¶ The fitted parameter in the model
Type: float

rho_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

knot_locations
¶ The locations of the cubic breakpoints.

basis
(x, knot, min_knot, max_knot)¶

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p: float) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

static
relu
(x)¶

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

WeibullFitter¶

class
lifelines.fitters.weibull_fitter.
WeibullFitter
(*args, **kwargs)¶ Bases:
lifelines.fitters.KnownModelParametricUnivariateFitter
This class implements a Weibull model for univariate data. The model has parameterized form:
\[S(t) = \exp\left(\left(\frac{t}{\lambda}\right)^\rho\right), \lambda > 0, \rho > 0,\]The \(\lambda\) (scale) parameter has an applicable interpretation: it represent the time when 37% of the population has died. The \(\rho\) (shape) parameter controls if the cumulative hazard (see below) is convex or concave, representing accelerating or decelerating hazards.
The cumulative hazard rate is
\[H(t) = \left(\frac{t}{\lambda}\right)^\rho,\]and the hazard rate is:
\[h(t) = \frac{\rho}{\lambda}\left(\frac{t}{\lambda}\right)^{\rho1}\]After calling the
.fit
method, you have access to properties like:cumulative_hazard_
,survival_function_
,lambda_
andrho_
. A summary of the fit is available with the methodprint_summary()
.Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. Important
The parameterization of this model changed in lifelines 0.19.0. Previously, the cumulative hazard looked like \((\lambda t)^\rho\). The parameterization is now the reciprocal of \(\lambda\).
Examples
>>> from lifelines import WeibullFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> wbf = WeibullFitter() >>> wbf.fit(waltons['T'], waltons['E']) >>> wbf.plot() >>> print(wbf.lambda_)

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_survival_time_
¶ The median time to event
Type: float

lambda_
¶ The fitted parameter in the model
Type: float

rho_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None
Notes
Looking for a 3parameter Weibull model? See notes here.

conditional_time_to_event_
¶ Return a DataFrame, with index equal to survival_function_, that estimates the median duration remaining until the death event, given survival up until time t. For example, if an individual exists until age 1, their expected life remaining given they lived to time 1 might be 9 years.

confidence_interval_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_cumulative_hazard_
.

confidence_interval_cumulative_density_
¶ The lower and upper confidence intervals for the cumulative density

confidence_interval_cumulative_hazard_
¶ The confidence interval of the cumulative hazard. This is an alias for
confidence_interval_
.

confidence_interval_hazard_
¶ The confidence interval of the hazard.

confidence_interval_survival_function_
¶ The lower and upper confidence intervals for the survival function

cumulative_density_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative density function (1survival function) at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

cumulative_hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted cumulative hazard value at specific times.
Parameters:  times (iterable or float) – values to return the cumulative hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

divide
(other) → pandas.core.frame.DataFrame¶ Divide the {0} of two {1} objects.
Parameters: other (same object as self)

event_table
¶

fit
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_interval_censoring
(lower_bound, upper_bound, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to an interval censored dataset.
Parameters:  lower_bound (an array, or pd.Series) – length n, the start of the period the subject experienced the event in.
 upper_bound (an array, or pd.Series) – length n, the end of the period the subject experienced the event in. If the value is equal to the corresponding value in lower_bound, then the individual’s event was observed (not censored).
 event_observed (numpy array or pd.Series, optional) – length n, if left optional, infer from
lower_bound
andupper_cound
(if lower_bound==upper_bound then event observed, if lower_bound < upper_bound, then event censored)  timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_left_censoring
(durations, event_observed=None, timeline=None, label=None, alpha=None, ci_labels=None, show_progress=False, entry=None, weights=None, initial_point=None) → ParametricUnivariateFitter¶ Fit the model to a leftcensored dataset
Parameters:  durations (an array, or pd.Series) – length n, duration subject was observed for
 event_observed (numpy array or pd.Series, optional) – length n, True if the the death was observed, False if the event was lost (rightcensored). Defaults all True if event_observed==None
 timeline (list, optional) – return the estimate at the values in timeline (positively increasing)
 label (string, optional) – a string to name the column of the estimate.
 alpha (float, optional) – the alpha value in the confidence intervals. Overrides the initializing alpha for this call to fit only.
 ci_labels (list, optional) – add custom column names to the generated confidence intervals as a length2 list: [<lowerbound name>, <upperbound name>]. Default: <label>_lower_<alpha>
 show_progress (bool, optional) – since this is an iterative fitting algorithm, switching this to True will display some iteration details.
 entry (an array, or pd.Series, of length n) – relative time when a subject entered the study. This is useful for lefttruncated (not leftcensored) observations. If None, all members of the population entered study when they were “born”: time zero.
 weights (an array, or pd.Series, of length n) – integer weights per observation
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self with new properties like
cumulative_hazard_
,survival_function_
Return type: self

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted hazard at specific times.
Parameters:  times (iterable or float) – values to return the hazard at.
 label (string, optional) – Rename the series returned. Useful for plotting.

median_survival_time_
Return the unique time point, t, such that S(t) = 0.5. This is the “halflife” of the population, and a robust summary statistic for the population, if it exists.

percentile
(p) → float¶ Return the unique time point, t, such that S(t) = p.
Parameters: p (float) Note
For known parametric models, this should be overwritten by something more accurate.

plot
(**kwargs)¶ Produce a prettyplot of the estimate.

plot_cumulative_density
(**kwargs)¶

plot_cumulative_hazard
(**kwargs)¶

plot_hazard
(**kwargs)¶

plot_survival_function
(**kwargs)¶

predict
(times: Union[Iterable[float], float], interpolate=False) → pandas.core.series.Series¶ Predict the {0} at certain point in time. Uses a linear interpolation if points in time are not in the index.
Parameters:  times (scalar, or array) – a scalar or an array of times to predict the value of {0} at.
 interpolate (bool, optional (default=False)) – for methods that produce a stepwise solution (KaplanMeier, NelsonAalen, etc), turning this to True will use an linear interpolation method to provide a more “smooth” answer.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

subtract
(other) → pandas.core.frame.DataFrame¶ Subtract the {0} of two {1} objects.
Parameters: other (same object as self)

summary
¶ Summary statistics describing the fit.
See also
print_summary

survival_function_at_times
(times, label=None) → pandas.core.series.Series¶ Return a Pandas series of the predicted survival value at specific times.
Parameters:  times (iterable or float) – values to return the survival function at.
 label (string, optional) – Rename the series returned. Useful for plotting.

Regression models¶
AalenAdditiveFitter¶

class
lifelines.fitters.aalen_additive_fitter.
AalenAdditiveFitter
(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)¶ Bases:
lifelines.fitters.BaseFitter
This class fits the regression model:
\[h(tx) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N\]that is, the hazard rate is a linear function of the covariates with timevarying coefficients. This implementation assumes nontimevarying covariates, see
TODO: name
Note
This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.
Parameters:  fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\) acts as a baseline hazard.
 alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 coef_penalizer (float, optional (default: 0)) – Attach a L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the magnitude of \(c_{i,t}\).
 smoothing_penalizer (float, optional (default: 0)) – Attach a L2 penalizer to difference between adjacent (over time) coefficients. For example, this shrinks the magnitude of \(c_{i,t}  c_{i,t+1}\).

cumulative_hazards_
¶ The estimated cumulative hazard
Type: DataFrame

hazards_
¶ The estimated hazards
Type: DataFrame

confidence_intervals_
¶ The lower and upper confidence intervals for the cumulative hazard
Type: DataFrame

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

weights
¶ The event_observed variable provided
Type: array

fit
(df, duration_col, event_col=None, weights_col=None, show_progress=False)¶ Parameters: Fit the Aalen Additive model to a dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is expelled and not used as a covariate, but as a weight in the final regression. Default weight is 1. This can be used for caseweights. For example, a weight of 2 means there were two subjects with identical observations. This can be used for sampling weights.
 show_progress (bool, optional (default=False)) – Since the fitter is iterative, show iteration number.
Returns: self – self with additional new properties:
cumulative_hazards_
, etc.Return type: Examples
>>> from lifelines import AalenAdditiveFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aaf = AalenAdditiveFitter() >>> aaf.fit(df, 'T', 'E') >>> aaf.predict_median(df) >>> aaf.print_summary()

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

plot
(columns=None, loc=None, iloc=None, ax=None, **kwargs)¶ ” A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:
Parameters: columns (string or listlike, optional) – If not empty, plot a subset of columns from the
cumulative_hazards_
. Default all.loc
iloc (slice, optional) –
 specify a locationbased subsection of the curves to plot, ex:
.plot(iloc=slice(0,10))
will plot the first 10 time points.

predict_cumulative_hazard
(X)¶ Returns the hazard rates for the individuals
Parameters: X (a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns) – can be in any order. If a numpy array, columns must be in the same order as the training data.

predict_expectation
(X)¶ Compute the expected lifetime, E[T], using covariates X.
Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 Returns the expected lifetimes for the individuals

predict_median
(X)¶ Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 Returns the median lifetimes for the individuals

predict_percentile
(X, p=0.5)¶ Returns the median lifetimes for the individuals. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float) – default: 0.5

predict_survival_function
(X, times=None)¶ Returns the survival functions for the individuals
Parameters:  X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times – Not implemented yet

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
¶ The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset. It’s analogous to the R^2 in linear models.

smoothed_hazards_
(bandwidth=1)¶ Using the epanechnikov kernel to smooth the hazard function, with sigma/bandwidth

summary
¶ Summary statistics describing the fit.
Returns: df Return type: DataFrame
CoxTimeVaryingFitter¶

class
lifelines.fitters.cox_time_varying_fitter.
CoxTimeVaryingFitter
(alpha=0.05, penalizer=0.0, strata=None)¶ Bases:
lifelines.fitters.BaseFitter
This class implements fitting Cox’s timevarying proportional hazard model:
\[h(tx(t)) = h_0(t)\exp(x(t)'\beta)\]Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 penalizer (float, optional) – the coefficient of an L2 penalizer in the regression

params_
¶ The estimated coefficients. Changed in version 0.22.0: use to be
.hazards_
Type: Series

hazard_ratios_
¶ The exp(coefficients)
Type: Series

confidence_intervals_
¶ The lower and upper confidence intervals for the hazard coefficients
Type: DataFrame

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

strata
¶ the strata provided
Type: list

standard_errors_
¶ the standard errors of the estimates
Type: Series

baseline_cumulative_hazard_
¶ Type: DataFrame

baseline_survival_
¶ Type: DataFrame

fit
(df, id_col, event_col, start_col='start', stop_col='stop', weights_col=None, show_progress=False, step_size=None, robust=False, strata=None, initial_point=None)¶ Fit the Cox Proportional Hazard model to a time varying dataset. Tied survival times are handled using Efron’s tiemethod.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col, plus other covariates. duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 id_col (string) – A subject could have multiple rows in the DataFrame. This column contains the unique identifier per subject.
 event_col (string) – the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are noncensored.
 start_col (string) – the column that contains the start of a subject’s time period.
 stop_col (string) – the column that contains the end of a subject’s time period.
 weights_col (string, optional) – the column that contains (possibly timevarying) weight of each subjectperiod row.
 show_progress (since the fitter is iterative, show convergence) – diagnostics.
 robust (bool, optional (default: True)) – Compute the robust errors using the Huber sandwich estimator, aka WeiLin estimate. This does not handle ties, so if there are high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074 1078
 step_size (float, optional) – set an initial step size for the fitting algorithm.
 strata (list or string, optional) – specify a column or list of columns n to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
Returns: self – self, with additional properties like
hazards_
andprint_summary
Return type:

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the Cox model. We compare the existing model (with all the covariates) to the trivial model of no covariates.
Conveniently, we can actually use CoxPHFitter class to do most of the work.

plot
(columns=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

predict_log_partial_hazard
(X)¶ This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \((x  \bar{x})'\beta\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Returns: Return type: DataFrame Note
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_partial_hazard
(X)¶ Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{(x  \bar{x})'\beta }\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Returns: Return type: DataFrame Note
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – Contains columns coef, np.exp(coef), se(coef), z, p, lower, upper Return type: DataFrame
CoxPHFitter¶

class
lifelines.fitters.coxph_fitter.
CoxPHFitter
(tie_method: str = 'Efron', penalizer: float = 0.0, strata: Union[List[str], str, None] = None, **kwargs)¶ Bases:
lifelines.fitters.RegressionFitter
This class implements fitting Cox’s proportional hazard model:
\[h(tx) = h_0(t) \exp((x  \overline{x})' \beta)\]Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 tie_method (string, optional) – specify how the fitter should deal with ties. Currently only ‘Efron’ is available.
 penalizer (float, optional (default=0.0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the magnitude value of \(\beta_i\). The penalty is \(\frac{1}{2} \text{penalizer} \beta^2\).
 strata (list, optional) – specify a list of columns to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
Examples
>>> from lifelines.datasets import load_rossi >>> from lifelines import CoxPHFitter >>> rossi = load_rossi() >>> cph = CoxPHFitter() >>> cph.fit(rossi, 'week', 'arrest') >>> cph.print_summary()

params_
¶ The estimated coefficients. Changed in version 0.22.0: use to be
.hazards_
Type: Series

hazard_ratios_
¶ The exp(coefficients)
Type: Series

confidence_intervals_
¶ The lower and upper confidence intervals for the hazard coefficients
Type: DataFrame

durations
¶ The durations provided
Type: Series

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

strata
¶ the strata provided
Type: list

standard_errors_
¶ the standard errors of the estimates
Type: Series

score_
¶ the concordance index of the model.
Type: float

baseline_hazard_
¶ Type: DataFrame

baseline_cumulative_hazard_
¶ Type: DataFrame

baseline_survival_
¶ Type: DataFrame

check_assumptions
(training_df: pandas.core.frame.DataFrame, advice: bool = True, show_plots: bool = False, p_value_threshold: float = 0.01, plot_n_bootstraps: int = 10, columns: Optional[List[str]] = None) → None¶ Use this function to test the proportional hazards assumption. See usage example at https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
Parameters:  training_df (DataFrame) – the original DataFrame used in the call to
fit(...)
or a subsampled version.  advice (bool, optional) – display advice as output to the user’s screen
 show_plots (bool, optional) – display plots of the scaled schoenfeld residuals and loess curves. This is an eyeball test for violations. This will slow down the function significantly.
 p_value_threshold (float, optional) – the threshold to use to alert the user of violations. See note below.
 plot_n_bootstraps – in the plots displayed, also display plot_n_bootstraps bootstrapped loess curves. This will slow down the function significantly.
 columns (list, optional) – specify a subset of columns to test.
Examples
>>> from lifelines.datasets import load_rossi >>> from lifelines import CoxPHFitter >>> >>> rossi = load_rossi() >>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest') >>> >>> cph.check_assumptions(rossi)
Notes
The
p_value_threshold
is arbitrarily set at 0.01. Under the null, some covariates will be below the threshold (i.e. by chance). This is compounded when there are many covariates.Similarly, when there are lots of observations, even minor deviances from the proportional hazard assumption will be flagged.
With that in mind, it’s best to use a combination of statistical tests and eyeball tests to determine the most serious violations.
References
section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/AppendixCoxRegression.pdf, http://www.mwsug.org/proceedings/2006/stats/MWSUG2006SD08.pdf, http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf
 training_df (DataFrame) – the original DataFrame used in the call to

compute_residuals
(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame¶ Compute the residuals the model.
Parameters:  training_dataframe (DataFrame) – the same training DataFrame given in fit
 kind (string) – {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}

fit
(df: pandas.core.frame.DataFrame, duration_col: Optional[str] = None, event_col: Optional[str] = None, show_progress: bool = False, initial_point: Optional[numpy.ndarray] = None, strata: Union[List[str], str, None] = None, step_size: Optional[float] = None, weights_col: Optional[str] = None, cluster_col: Optional[str] = None, robust: bool = False, batch_mode: Optional[bool] = None) → CoxPHFitter¶ Fit the Cox proportional hazard model to a dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights, strata). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is expelled and not used as a covariate, but as a weight in the final regression. Default weight is 1. This can be used for caseweights. For example, a weight of 2 means there were two subjects with identical observations. This can be used for sampling weights. In that case, use robust=True to get more accurate standard errors.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 strata (list or string, optional) – specify a column or list of columns n to use in stratification. This is useful if a categorical covariate does not obey the proportional hazard assumption. This is used similar to the strata expression in R. See http://courses.washington.edu/b515/l17.pdf.
 step_size (float, optional) – set an initial step size for the fitting algorithm. Setting to 1.0 may improve performance, but could also hurt convergence.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator, aka WeiLin estimate. This does not handle ties, so if there are high number of ties, results may significantly differ. See “The Robust Inference for the Cox Proportional Hazards Model”, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec., 1989), pp. 1074 1078
 cluster_col (string, optional) – specifies what column has unique identifiers for clustering covariances. Using this forces the sandwich estimator (robust variance estimator) to be used.
 batch_mode (bool, optional) – enabling batch_mode can be faster for datasets with a large number of ties. If left as None, lifelines will choose the best option.
Returns: self – self with additional new properties:
print_summary
,hazards_
,confidence_intervals_
,baseline_survival_
, etc.Return type: Note
Tied survival times are handled using Efron’s tiemethod.
Examples
>>> from lifelines import CoxPHFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> cph = CoxPHFitter() >>> cph.fit(df, 'T', 'E') >>> cph.print_summary() >>> cph.predict_median(df)
>>> from lifelines import CoxPHFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'weights': [1.1, 0.5, 2.0, 1.6, 1.2, 4.3, 1.4, 4.5, 3.0, 3.2, 0.4, 6.2], >>> 'month': [10, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> cph = CoxPHFitter() >>> cph.fit(df, 'T', 'E', strata=['month', 'age'], robust=True, weights_col='weights') >>> cph.print_summary() >>> cph.predict_median(df)

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
() → lifelines.statistics.StatisticalResult¶ This function computes the likelihood ratio test for the Cox model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

plot
(columns=None, hazard_ratios=False, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients (i.e. log hazard ratios), including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 hazard_ratios (bool, optional) – by default, plot will present the loghazard ratios (the coefficients). However, by turning this flag to True, the hazard ratios are presented instead.
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Examples
>>> from lifelines import datasets, CoxPHFitter >>> rossi = datasets.load_rossi() >>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest') >>> cph.plot(hazard_ratios=True)
Returns: ax – the matplotlib axis that be edited. Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, **kwargs)¶ Produces a plot comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate(s) in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the specific values we wish the covariate(s) to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands.
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, CoxPHFitter >>> rossi = datasets.load_rossi() >>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest') >>> cph.plot_covariate_groups('prio', values=arange(0, 15, 3), cmap='coolwarm')
>>> # multiple variables at once >>> cph.plot_covariate_groups(['prio', 'paro'], values=[ >>> [0, 0], >>> [5, 0], >>> [10, 0], >>> [0, 1], >>> [5, 1], >>> [10, 1] >>> ], cmap='coolwarm')
>>> # if you have categorical variables, you can do the following to see the >>> # effect of all the categories on one plot. >>> cph.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=[[1, 0, 0], [0, 1, 0], [0, 0, 1]]) >>> # same as: >>> cph.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(X: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], times: Union[numpy.ndarray, List[float], None] = None, conditional_after: Optional[List[int]] = None) → pandas.core.frame.DataFrame¶ Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
 conditional_after (iterable, optional) – Must be equal is size to X.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. reset back to starting at 0.

predict_expectation
(X: pandas.core.frame.DataFrame, conditional_after: Optional[numpy.ndarray] = None) → pandas.core.frame.DataFrame¶ Compute the expected lifetime, \(E[T]\), using covariates X. This algorithm to compute the expectation is to use the fact that \(E[T] = \int_0^\inf P(T > t) dt = \int_0^\inf S(t) dt\). To compute the integral, we use the trapizoidal rule to approximate the integral.
Caution
If the survival function doesn’t converge to 0, then the expectation is really infinity and the returned values are meaningless/too large. In that case, using
predict_median
orpredict_percentile
would be better.Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to X.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Notes
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
See also

predict_log_partial_hazard
(X: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame¶ This is equivalent to R’s linear.predictors. Returns the log of the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \((x  \text{mean}(x_{\text{train}})) \beta\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Notes
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_median
(X: pandas.core.frame.DataFrame, conditional_after: Optional[numpy.ndarray] = None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to X.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_partial_hazard
(X: Union[numpy.ndarray, pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame¶ Returns the partial hazard for the individuals, partial since the baseline hazard is not included. Equal to \(\exp{(x  mean(x_{train}))'\beta}\)
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data. Notes
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.

predict_percentile
(X: pandas.core.frame.DataFrame, p: float = 0.5, conditional_after: Optional[numpy.ndarray] = None) → pandas.core.frame.DataFrame¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
 conditional_after (iterable, optional) – Must be equal is size to X.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_survival_function
(X: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], times: Union[numpy.ndarray, List[float], None] = None, conditional_after: Optional[List[int]] = None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
 conditional_after (iterable, optional) – Must be equal is size to X.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

print_summary
(decimals: int = 2, style: Optional[str] = None, **kwargs) → None¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.References

summary
¶ Summary statistics describing the fit. Set alpha property in the object before calling.
Returns: df – Contains columns coef, exp(coef), se(coef), z, p, lower, upper Return type: DataFrame
WeibullAFTFitter¶

class
lifelines.fitters.weibull_aft_fitter.
WeibullAFTFitter
(alpha: float = 0.05, penalizer: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, model_ancillary: bool = False)¶ Bases:
lifelines.fitters.ParametericAFTRegressionFitter
This class implements a Weibull AFT model. The model has parameterized form, with \(\lambda(x) = \exp\left(\beta_0 + \beta_1x_1 + ... + \beta_n x_n \right)\), and optionally, \(\rho(y) = \exp\left(\alpha_0 + \alpha_1 y_1 + ... + \alpha_m y_m \right)\),
\[S(t; x, y) = \exp\left(\left(\frac{t}{\lambda(x)}\right)^{\rho(y)}\right),\]With no covariates, the Weibull model’s parameters has the following interpretations: The \(\lambda\) (scale) parameter has an applicable interpretation: it represent the time when 37% of the population has died. The \(\rho\) (shape) parameter controls if the cumulative hazard (see below) is convex or concave, representing accelerating or decelerating hazards.
The cumulative hazard rate is
\[H(t; x, y) = \left(\frac{t}{\lambda(x)} \right)^{\rho(y)},\]After calling the
.fit
method, you have access to properties like:params_
,print_summary()
. A summary of the fit is available with the methodprint_summary()
.Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable.
 penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0.
 l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like
penalizer * l1_ratio * w_1 + 0.5 * penalizer * (1  l1_ratio) * w^2_2
 model_ancillary (optional (default=False)) – set the model instance to always model the ancillary parameter with the supplied Dataframe. This is useful for gridsearch optimization.

params_
¶ The estimated coefficients
Type: DataFrame

confidence_intervals_
¶ The lower and upper confidence intervals for the coefficients
Type: DataFrame

durations
¶ The event_observed variable provided
Type: Series

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

standard_errors_
¶ the standard errors of the estimates
Type: Series

score_
¶ the concordance index of the model.
Type: float

check_assumptions
(training_df: pandas.core.frame.DataFrame, advice: bool = True, show_plots: bool = False, p_value_threshold: float = 0.01, plot_n_bootstraps: int = 10, columns: Optional[List[str]] = None) → None¶ Use this function to test the proportional hazards assumption. See usage example at https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
Parameters:  training_df (DataFrame) – the original DataFrame used in the call to
fit(...)
or a subsampled version.  advice (boolean, optional) – display advice as output to the user’s screen
 show_plots (boolean, optional) – display plots of the scaled schoenfeld residuals and loess curves. This is an eyeball test for violations. This will slow down the function significantly.
 p_value_threshold (float, optional) – the threshold to use to alert the user of violations. See note below.
 plot_n_bootstraps – in the plots displayed, also display plot_n_bootstraps bootstrapped loess curves. This will slow down the function significantly.
 columns (list, optional) – specify a subset of columns to test.
Examples
>>> from lifelines.datasets import load_rossi >>> from lifelines import CoxPHFitter >>> >>> rossi = load_rossi() >>> cph = CoxPHFitter().fit(rossi, 'week', 'arrest') >>> >>> cph.check_assumptions(rossi)
Notes
The
p_value_threshold
is arbitrarily set at 0.01. Under the null, some covariates will be below the threshold by chance. This is compounded when there are many covariates.Similarly, when there are lots of observations, even minor deviances from the proportional hazard assumption will be flagged.
With that in mind, it’s best to use a combination of statistical tests and eyeball tests to determine the most serious violations.
References
section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/AppendixCoxRegression.pdf, http://www.mwsug.org/proceedings/2006/stats/MWSUG2006SD08.pdf, http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf
 training_df (DataFrame) – the original DataFrame used in the call to

compute_residuals
(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame¶ TODO: move me to more general class.
Parameters:  training_dataframe (DataFrame) – the same training DataFrame given in fit
 kind (string) – {‘schoenfeld’, ‘scaled_schoenfeld’}

fit
(df, duration_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a intervalcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns
lower_bound_col
,upper_bound_col
(see below), and any other covariates or weights.  lower_bound_col (string) – the name of the column in DataFrame that contains the subjects’ leftmost observation.
 upper_bound_col (string) – the name of the column in DataFrame that contains the subjects’ rightmost observation. Values can be np.inf (and should be if the subject is rightcensored).
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, will be inferred from the start and stop columns (lower_bound==upper_bound means uncensored)
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'start': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'stop': [5, 3, 9, 8, 7, 4, 8, 5, 2, 5, 6, np.inf], # this last subject is rightcensored. >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)
 df (DataFrame) – a Pandas DataFrame with necessary columns

fit_left_censoring
(df, duration_col=None, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self
Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, **kwargs)¶ Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, WeibullAFTFitter >>> rossi = datasets.load_rossi() >>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') >>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')
>>> # multiple variables at once >>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')
>>> # if you have categorical variables, you can simply things: >>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

predict_expectation
(df: pandas.core.frame.DataFrame, ancillary_df: Optional[pandas.core.frame.DataFrame] = None) → pandas.core.frame.DataFrame¶ Predict the expectation of lifetimes, \(E[T  x]\).
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_df (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Not implemented yet

predict_median
(df, *, ancillary_df=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_percentile
(df: pandas.core.frame.DataFrame, *, ancillary_df: Optional[pandas.core.frame.DataFrame] = None, p: float = 0.5, conditional_after: Optional[numpy.ndarray] = None) → pandas.core.frame.DataFrame¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_df (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles
Return type: DataFrame
See also

predict_survival_function
(df, times=None, conditional_after=None, ancillary_df=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

summary
¶ Summary statistics describing the fit.
See also
print_summary
LogNormalAFTFitter¶

class
lifelines.fitters.log_normal_aft_fitter.
LogNormalAFTFitter
(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True, model_ancillary=False)¶ Bases:
lifelines.fitters.ParametericAFTRegressionFitter
This class implements a LogNormal AFT model. The model has parameterized form, with \(\mu(x) = a_0 + a_1x_1 + ... + a_n x_n\), and optionally, \(\sigma(y) = \exp\left(b_0 + b_1 y_1 + ... + b_m y_m \right)\),
The cumulative hazard rate is
\[H(t; x, y) = \log\left(1  \Phi\left(\frac{\log(T)  \mu(x)}{\sigma(y)}\right)\right)\]After calling the
.fit
method, you have access to properties like:params_
,print_summary()
. A summary of the fit is available with the methodprint_summary()
.Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 fit_intercept (bool, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable.
 penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0.
 l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like
penalizer * l1_ratio * w_1 + 0.5 * penalizer * (1  l1_ratio) * w^2_2
 model_ancillary (optional (default=False)) – set the model instance to always model the ancillary parameter with the supplied DataFrame. This is useful for gridsearch optimization.

params_
¶ The estimated coefficients
Type: DataFrame

confidence_intervals_
¶ The lower and upper confidence intervals for the coefficients
Type: DataFrame

durations
¶ The event_observed variable provided
Type: Series

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

standard_errors_
¶ the standard errors of the estimates
Type: Series

score_
¶ the concordance index of the model.
Type: float

compute_residuals
(df)¶

fit
(df, duration_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a intervalcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns
lower_bound_col
,upper_bound_col
(see below), and any other covariates or weights.  lower_bound_col (string) – the name of the column in DataFrame that contains the subjects’ leftmost observation.
 upper_bound_col (string) – the name of the column in DataFrame that contains the subjects’ rightmost observation. Values can be np.inf (and should be if the subject is rightcensored).
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, will be inferred from the start and stop columns (lower_bound==upper_bound means uncensored)
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'start': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'stop': [5, 3, 9, 8, 7, 4, 8, 5, 2, 5, 6, np.inf], # this last subject is rightcensored. >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)
 df (DataFrame) – a Pandas DataFrame with necessary columns

fit_left_censoring
(df, duration_col=None, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self
Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, **kwargs)¶ Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, WeibullAFTFitter >>> rossi = datasets.load_rossi() >>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') >>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')
>>> # multiple variables at once >>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')
>>> # if you have categorical variables, you can simply things: >>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

predict_expectation
(df: pandas.core.frame.DataFrame, ancillary_df: Optional[pandas.core.frame.DataFrame] = None) → pandas.core.frame.DataFrame¶ Predict the expectation of lifetimes, \(E[T  x]\).
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Not implemented yet

predict_median
(df, *, ancillary_df=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_percentile
(df: pandas.core.frame.DataFrame, *, ancillary_df: Optional[pandas.core.frame.DataFrame] = None, p: float = 0.5, conditional_after: Optional[numpy.ndarray] = None) → pandas.core.frame.DataFrame¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross
p
, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctionsParameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: percentiles
Return type: DataFrame
See also

predict_survival_function
(df, times=None, conditional_after=None, ancillary_df=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

summary
¶ Summary statistics describing the fit.
See also
print_summary
LogLogisticAFTFitter¶

class
lifelines.fitters.log_logistic_aft_fitter.
LogLogisticAFTFitter
(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True, model_ancillary=False)¶ Bases:
lifelines.fitters.ParametericAFTRegressionFitter
This class implements a LogLogistic AFT model. The model has parameterized form, with \(\alpha(x) = \exp\left(a_0 + a_1x_1 + ... + a_n x_n \right)\), and optionally, \(\beta(y) = \exp\left(b_0 + b_1 y_1 + ... + b_m y_m \right)\),
The cumulative hazard rate is
\[H(t; x , y) = \log\left(1 + \left(\frac{t}{\alpha(x)}\right)^{\beta(y)}\right)\]The \(\alpha\) (scale) parameter has an interpretation as being equal to the median lifetime. The \(\beta\) parameter influences the shape of the hazard.
After calling the
.fit
method, you have access to properties like:params_
,print_summary()
. A summary of the fit is available with the methodprint_summary()
.Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable.
 penalizer (float, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0.
 l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like
penalizer * l1_ratio * w_1 + 0.5 * penalizer * (1  l1_ratio) * w^2_2
 model_ancillary (optional (default=False)) – set the model instance to always model the ancillary parameter with the supplied Dataframe. This is useful for gridsearch optimization.

params_
¶ The estimated coefficients
Type: DataFrame

confidence_intervals_
¶ The lower and upper confidence intervals for the coefficients
Type: DataFrame

durations
¶ The event_observed variable provided
Type: Series

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

standard_errors_
¶ the standard errors of the estimates
Type: Series

score_
¶ the concordance index of the model.
Type: float

compute_residuals
(df)¶

fit
(df, duration_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a intervalcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns
lower_bound_col
,upper_bound_col
(see below), and any other covariates or weights.  lower_bound_col (string) – the name of the column in DataFrame that contains the subjects’ leftmost observation.
 upper_bound_col (string) – the name of the column in DataFrame that contains the subjects’ rightmost observation. Values can be np.inf (and should be if the subject is rightcensored).
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, will be inferred from the start and stop columns (lower_bound==upper_bound means uncensored)
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'start': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'stop': [5, 3, 9, 8, 7, 4, 8, 5, 2, 5, 6, np.inf], # this last subject is rightcensored. >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_interval_censoring(df, 'start', 'stop', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)
 df (DataFrame) – a Pandas DataFrame with necessary columns

fit_left_censoring
(df, duration_col=None, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self
Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
>>> from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter >>> >>> df = pd.DataFrame({ >>> 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], >>> 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], >>> 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], >>> }) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E') >>> aft.print_summary() >>> aft.predict_median(df) >>> >>> aft = WeibullAFTFitter() >>> aft.fit_left_censoring(df, 'T', 'E', ancillary_df=df) >>> aft.print_summary() >>> aft.predict_median(df)

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, **kwargs)¶ Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, WeibullAFTFitter >>> rossi = datasets.load_rossi() >>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') >>> wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm')
>>> # multiple variables at once >>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')
>>> # if you have categorical variables, you can simply things: >>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

predict_expectation
(df, ancillary_df=None)¶ Predict the expectation of lifetimes, \(E[T  x]\).
Parameters:  X (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Not implemented yet

predict_median
(df, *, ancillary_df=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_percentile
(df, ancillary_df=None, p=0.5, conditional_after=None)¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross
p
, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctionsParameters:  X (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles
Return type: DataFrame
See also

predict_survival_function
(df, times=None, conditional_after=None, ancillary_df=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

summary
¶ Summary statistics describing the fit.
See also
print_summary
PiecewiseExponentialRegressionFitter¶

class
lifelines.fitters.piecewise_exponential_regression_fitter.
PiecewiseExponentialRegressionFitter
(breakpoints, alpha=0.05, penalizer=0.0)¶ Bases:
lifelines.fitters.ParametricRegressionFitter
This implements a piecewise constanthazard model at prespecified break points.
\[\begin{split}h(t) = \begin{cases} 1/\lambda_0(x) & \text{if $t \le \tau_0$} \\ 1/\lambda_1(x) & \text{if $\tau_0 < t \le \tau_1$} \\ 1/\lambda_2(x) & \text{if $\tau_1 < t \le \tau_2$} \\ ... \end{cases}\end{split}\]where \(\lambda_i(x) = \exp{\beta_i x}\).
Parameters:  breakpoints (list) – a list of times when a new exponential model is constructed.
 penalizer (float) – penalize the variance of the \(\lambda_i\). See blog post below.
 alpha (float, optional (default=0.05)) – the level in the confidence intervals.
Examples
See blog post here and paper replication here

fit
(df, duration_col, event_col=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametricRegressionFitter¶ Fit the regression model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self with additional new properties
Return type: print_summary
,params_
,confidence_intervals_
and more

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None)¶ Fit the regression model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 lower_bound_col (string) – the name of the column in DataFrame that contains the lower bounds of the intervals.
 upper_bound_col (string) – the name of the column in DataFrame that contains the upper bounds of the intervals.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, this is inferred based on the upper and lower interval limits (equal implies observed death.)
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self with additional new properties
Return type: print_summary
,params_
,confidence_intervals_
and more

fit_left_censoring
(df, duration_col=None, event_col=None, regressors=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and more

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, **kwargs)¶ Produces a plot comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, WeibullAFTFitter >>> rossi = datasets.load_rossi() >>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') >>> wf.plot_covariate_groups('prio', values=np.arange(0, 15, 3), cmap='coolwarm')
>>> # multiple variables at once >>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')
>>> # if you have categorical variables, you can simply things: >>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, times=None, conditional_after=None)¶ Return the cumulative hazard rate of subjects in X at time points.
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
Returns: cumulative_hazard_ – the cumulative hazard of individuals over the timeline
Return type: DataFrame

predict_expectation
(X) → pandas.core.frame.DataFrame¶ Compute the expected lifetime, \(E[T]\), using covariates X. This algorithm to compute the expectation is to use the fact that \(E[T] = \int_0^\inf P(T > t) dt = \int_0^\inf S(t) dt\). To compute the integral, we use the trapizoidal rule to approximate the integral.
Caution
If the survival function doesn’t converge to 0, the the expectation is really infinity and the returned values are meaningless/too large. In that case, using
predict_median
orpredict_percentile
would be better.Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. Returns: expectations Return type: DataFrame Notes
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
See also

predict_hazard
(df, *, times=None)¶ Predict the hazard for individuals, given their covariates.
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable (array, list, series) of increasing times to predict the cumulative hazard at. Default is the set of all durations in the training dataset (observed and unobserved).
 conditional_after – Not implemented yet.
Returns: the hazards of individuals over the timeline
Return type: DataFrame

predict_median
(df, *, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_percentile
(df, *, p=0.5, conditional_after=None) → pandas.core.frame.DataFrame¶

predict_survival_function
(df, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects.
Returns: survival_function – the survival probabilities of individuals over the timeline
Return type: DataFrame

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
¶ The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

summary
¶ Summary statistics describing the fit.
See also
print_summary
GeneralizedGammaRegressionFitter¶

class
lifelines.fitters.generalized_gamma_regression_fitter.
GeneralizedGammaRegressionFitter
(alpha=0.05, penalizer=0.0)¶ Bases:
lifelines.fitters.ParametricRegressionFitter
This class implements a Generalized Gamma model for regression data. The model has parameterized form:
The survival function is:
\[\begin{split}S(t; x)=\left\{ \begin{array}{} 1\Gamma_{RL}\left( \frac{1}{{{\lambda }^{2}}};\frac{{e}^{\lambda \left( \frac{\log(t)\mu }{\sigma} \right)}}{\lambda ^{2}} \right) \textit{ if } \lambda> 0 \\ \Gamma_{RL}\left( \frac{1}{{{\lambda }^{2}}};\frac{{e}^{\lambda \left( \frac{\log(t)\mu }{\sigma} \right)}}{\lambda ^{2}} \right) \textit{ if } \lambda \le 0 \\ \end{array} \right.\,\!\end{split}\]where \(\Gamma_{RL}\) is the regularized lower incomplete Gamma function, and \(\sigma = \sigma(x) = \exp(\alpha x^T), \lambda = \lambda(x) = \beta x^T, \mu = \mu(x) = \gamma x^T\).
This model has the Exponential, Weibull, Gamma and LogNormal as submodels, and thus can be used as a way to test which model to use:
 When \(\lambda = 1\) and \(\sigma = 1\), then the data is Exponential.
 When \(\lambda = 1\) then the data is Weibull.
 When \(\sigma = \lambda\) then the data is Gamma.
 When \(\lambda = 0\) then the data is LogNormal.
 When \(\lambda = 1\) then the data is InverseWeibull.
 When \(\sigma = \lambda\) then the data is InverseGamma.
After calling the
.fit
method, you have access to properties like:cumulative_hazard_
,survival_function_
, A summary of the fit is available with the methodprint_summary()
.Important
The parameterization implemented has \(\log\sigma\), thus there is a ln_sigma_ in the output. Exponentiate this parameter to recover \(\sigma\).
Important
This model is experimental. It’s API may change in the future. Also, it’s convergence is not very stable.
Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals. Examples
>>> from lifelines import GeneralizedGammaFitter >>> from lifelines.datasets import load_waltons >>> waltons = load_waltons() >>> ggf = GeneralizedGammaFitter() >>> ggf.fit(waltons['T'], waltons['E']) >>> ggf.plot() >>> ggf.summary

cumulative_hazard_
¶ The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
¶ The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
¶ The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumumlative_density_
¶ The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

median_
¶ The median time to event
Type: float

lambda_
¶ The fitted parameter in the model
Type: float

rho_
¶ The fitted parameter in the model
Type: float

alpha_
¶ The fitted parameter in the model
Type: float

durations
¶ The durations provided
Type: array

event_observed
¶ The event_observed variable provided
Type: array

timeline
¶ The time line to use for plotting and indexing
Type: array

entry
¶ The entry array provided, or None
Type: array or None

fit
(df, duration_col, event_col=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametricRegressionFitter¶ Fit the regression model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self with additional new properties
Return type: print_summary
,params_
,confidence_intervals_
and more

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None)¶ Fit the regression model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 lower_bound_col (string) – the name of the column in DataFrame that contains the lower bounds of the intervals.
 upper_bound_col (string) – the name of the column in DataFrame that contains the upper bounds of the intervals.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, this is inferred based on the upper and lower interval limits (equal implies observed death.)
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self with additional new properties
Return type: print_summary
,params_
,confidence_intervals_
and more

fit_left_censoring
(df, duration_col=None, event_col=None, regressors=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametericAFTRegressionFitter¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names > list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and more

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, **kwargs)¶ Produces a plot comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
>>> from lifelines import datasets, WeibullAFTFitter >>> rossi = datasets.load_rossi() >>> wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') >>> wf.plot_covariate_groups('prio', values=np.arange(0, 15, 3), cmap='coolwarm')
>>> # multiple variables at once >>> wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')
>>> # if you have categorical variables, you can simply things: >>> wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, *, times=None, conditional_after=None)¶ Predict the cumulative hazard for individuals, given their covariates.
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable (array, list, series) of increasing times to predict the cumulative hazard at. Default is the set of all durations in the training dataset (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to (df.shape[0],) (n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: the cumulative hazards of individuals over the timeline
Return type: DataFrame

predict_expectation
(X) → pandas.core.frame.DataFrame¶ Compute the expected lifetime, \(E[T]\), using covariates X. This algorithm to compute the expectation is to use the fact that \(E[T] = \int_0^\inf P(T > t) dt = \int_0^\inf S(t) dt\). To compute the integral, we use the trapizoidal rule to approximate the integral.
Caution
If the survival function doesn’t converge to 0, the the expectation is really infinity and the returned values are meaningless/too large. In that case, using
predict_median
orpredict_percentile
would be better.Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. Returns: expectations Return type: DataFrame Notes
If X is a DataFrame, the order of the columns do not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
See also

predict_hazard
(df, *, times=None)¶ Predict the hazard for individuals, given their covariates.
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable (array, list, series) of increasing times to predict the cumulative hazard at. Default is the set of all durations in the training dataset (observed and unobserved).
 conditional_after – Not implemented yet.
Returns: the hazards of individuals over the timeline
Return type: DataFrame

predict_median
(df, *, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_percentile
(df, *, p=0.5, conditional_after=None) → pandas.core.frame.DataFrame¶

predict_survival_function
(df, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects.
Returns: survival_function – the survival probabilities of individuals over the timeline
Return type: DataFrame

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score_
¶ The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships.
For this purpose, the
score_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

summary
¶ Summary statistics describing the fit.
See also
print_summary