WeibullAFTFitter¶

class
lifelines.fitters.weibull_aft_fitter.
WeibullAFTFitter
(alpha: float = 0.05, penalizer: float = 0.0, l1_ratio: float = 0.0, fit_intercept: bool = True, model_ancillary: bool = False)¶ Bases:
lifelines.fitters.ParametericAFTRegressionFitter
,lifelines.fitters.mixins.ProportionalHazardMixin
This class implements a Weibull AFT model. The model has parameterized form, with \(\lambda(x) = \exp\left(\beta_0 + \beta_1x_1 + ... + \beta_n x_n \right)\), and optionally, \(\rho(y) = \exp\left(\alpha_0 + \alpha_1 y_1 + ... + \alpha_m y_m \right)\),
\[S(t; x, y) = \exp\left(\left(\frac{t}{\lambda(x)}\right)^{\rho(y)}\right),\]With no covariates, the Weibull model’s parameters has the following interpretations: The \(\lambda\) (scale) parameter has an applicable interpretation: it represent the time when 37% of the population has died. The \(\rho\) (shape) parameter controls if the cumulative hazard (see below) is convex or concave, representing accelerating or decelerating hazards.
The cumulative hazard rate is
\[H(t; x, y) = \left(\frac{t}{\lambda(x)} \right)^{\rho(y)},\]After calling the
.fit
method, you have access to properties like:params_
,print_summary()
. A summary of the fit is available with the methodprint_summary()
.Parameters:  alpha (float, optional (default=0.05)) – the level in the confidence intervals.
 fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary_df if applicable.
 penalizer (float or array, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0. Alternatively, penalizer is an array equal in size to the number of parameters, with penalty coefficients for specific variables. For example, penalizer=0.01 * np.ones(p) is the same as penalizer=0.01
 l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like
penalizer * l1_ratio * w_1 + 0.5 * penalizer * (1  l1_ratio) * w^2_2
 model_ancillary (optional (default=False)) – set the model instance to always model the ancillary parameter with the supplied Dataframe. This is useful for gridsearch optimization.

params_
¶ The estimated coefficients
Type: DataFrame

confidence_intervals_
¶ The lower and upper confidence intervals for the coefficients
Type: DataFrame

durations
¶ The event_observed variable provided
Type: Series

event_observed
¶ The event_observed variable provided
Type: Series

weights
¶ The event_observed variable provided
Type: Series

variance_matrix_
¶ The variance matrix of the coefficients
Type: numpy array

standard_errors_
¶ the standard errors of the estimates
Type: Series

score_
¶ the concordance index of the model.
Type: float

AIC_
¶

check_assumptions
(training_df: pandas.core.frame.DataFrame, advice: bool = True, show_plots: bool = False, p_value_threshold: float = 0.01, plot_n_bootstraps: int = 10, columns: Optional[List[str]] = None) → None¶ Use this function to test the proportional hazards assumption. See usage example at https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
Parameters:  training_df (DataFrame) – the original DataFrame used in the call to
fit(...)
or a subsampled version.  advice (bool, optional) – display advice as output to the user’s screen
 show_plots (bool, optional) – display plots of the scaled schoenfeld residuals and loess curves. This is an eyeball test for violations. This will slow down the function significantly.
 p_value_threshold (float, optional) – the threshold to use to alert the user of violations. See note below.
 plot_n_bootstraps – in the plots displayed, also display plot_n_bootstraps bootstrapped loess curves. This will slow down the function significantly.
 columns (list, optional) – specify a subset of columns to test.
Examples
from lifelines.datasets import load_rossi from lifelines import CoxPHFitter rossi = load_rossi() cph = CoxPHFitter().fit(rossi, 'week', 'arrest') cph.check_assumptions(rossi)
Notes
The
p_value_threshold
is arbitrarily set at 0.01. Under the null, some covariates will be below the threshold (i.e. by chance). This is compounded when there are many covariates.Similarly, when there are lots of observations, even minor deviances from the proportional hazard assumption will be flagged.
With that in mind, it’s best to use a combination of statistical tests and eyeball tests to determine the most serious violations.
References
section 5 in https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/AppendixCoxRegression.pdf, http://www.mwsug.org/proceedings/2006/stats/MWSUG2006SD08.pdf, http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf
 training_df (DataFrame) – the original DataFrame used in the call to

compute_followup_hazard_ratios
(training_df: pandas.core.frame.DataFrame, followup_times: Iterable[T_co]) → pandas.core.frame.DataFrame¶ Recompute the hazard ratio at different followup times (lifelines handles accounting for updated censoring and updated durations). This is useful because we need to remember that the hazard ratio is actually a weightedaverage of periodspecific hazard ratios.
Parameters:  training_df (pd.DataFrame) – The same dataframe used to train the model
 followup_times (Iterable) – a list/array of followup times to recompute the hazard ratio at.

compute_residuals
(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame¶ Compute the residuals the model.
Parameters:  training_dataframe (DataFrame) – the same training DataFrame given in fit
 kind (string) – One of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}
Notes
'scaled_schoenfeld'
: lifelines does not add the coefficients to the final results, but R does when you callresiduals(c, "scaledsch")

concordance_index_
¶ The concordance score (also known as the cindex) of the fit. The cindex is a generalization of the ROC AUC to survival data, including censorships. For this purpose, the
concordance_index_
is a measure of the predictive accuracy of the fitted model onto the training dataset.

fit
(df, duration_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → self¶ Fit the accelerated failure time model to a rightcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter df = pd.DataFrame({ 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], }) aft = WeibullAFTFitter() aft.fit(df, 'T', 'E') aft.print_summary() aft.predict_median(df) aft = WeibullAFTFitter() aft.fit(df, 'T', 'E', ancillary_df=df) aft.print_summary() aft.predict_median(df)

fit_interval_censoring
(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → self¶ Fit the accelerated failure time model to a intervalcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns
lower_bound_col
,upper_bound_col
(see below), and any other covariates or weights.  lower_bound_col (string) – the name of the column in DataFrame that contains the subjects’ leftmost observation.
 upper_bound_col (string) – the name of the column in DataFrame that contains the subjects’ rightmost observation. Values can be np.inf (and should be if the subject is rightcensored).
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, will be inferred from the start and stop columns (lower_bound==upper_bound means uncensored)
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter df = pd.DataFrame({ 'start': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], 'stop': [5, 3, 9, 8, 7, 4, 8, 5, 2, 5, 6, np.inf], # this last subject is rightcensored. 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], }) aft = WeibullAFTFitter() aft.fit_interval_censoring(df, 'start', 'stop', 'E') aft.print_summary() aft.predict_median(df) aft = WeibullAFTFitter() aft.fit_interval_censoring(df, 'start', 'stop', 'E', ancillary_df=df) aft.print_summary() aft.predict_median(df)
 df (DataFrame) – a Pandas DataFrame with necessary columns

fit_left_censoring
(df, duration_col=None, event_col=None, ancillary_df=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → self¶ Fit the accelerated failure time model to a leftcensored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ events was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) leftcensored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 ancillary_df (None, boolean, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters.
If None or False, explicitly do not fit the ancillary parameters using any covariates.
If True, model the ancillary parameters with the same covariates as
df
. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count asdf
.  fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any lateentries (left truncation) that occurred. See the docs on left truncation
Returns: self
Return type: self with additional new properties
print_summary
,params_
,confidence_intervals_
and moreExamples
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter df = pd.DataFrame({ 'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], 'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0], 'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2], 'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7], }) aft = WeibullAFTFitter() aft.fit_left_censoring(df, 'T', 'E') aft.print_summary() aft.predict_median(df) aft = WeibullAFTFitter() aft.fit_left_censoring(df, 'T', 'E', ancillary_df=df) aft.print_summary() aft.predict_median(df)

fit_right_censoring
(*args, **kwargs)¶ Alias for
fit
See also
fit

hazard_ratios_
¶

log_likelihood_ratio_test
()¶ This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.

mean_survival_time_
¶ The mean survival time of the average subject in the training dataset.

median_survival_time_
¶ The median survival time of the average subject in the training dataset.

plot
(columns=None, parameter=None, ax=None, **errorbar_kwargs)¶ Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis

plot_covariate_groups
(covariates, values, plot_baseline=True, ax=None, times=None, **kwargs)¶ Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 times (iterable) – pass in a times to plot
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that be edited.
Return type: matplotlib axis, or list of axis’
Examples
from lifelines import datasets, WeibullAFTFitter rossi = datasets.load_rossi() wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest') wf.plot_covariate_groups('prio', values=np.arange(0, 15), cmap='coolwarm') # multiple variables at once wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm') # if you have categorical variables, you can simply things: wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the cumulative hazard for the individuals.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

predict_expectation
(df: pandas.core.frame.DataFrame, ancillary_df: Optional[pandas.core.frame.DataFrame] = None) → pandas.core.series.Series¶ Predict the expectation of lifetimes, \(E[T  x]\).
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_df (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
See also

predict_hazard
(df, *, ancillary_df=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array, Series, or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Not implemented yet

predict_median
(df, *, ancillary_df=None, conditional_after=None) → pandas.core.frame.DataFrame¶ Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
See also

predict_percentile
(df: pandas.core.frame.DataFrame, *, ancillary_df: Optional[pandas.core.frame.DataFrame] = None, p: float = 0.5, conditional_after: Optional[autograd.numpy.numpy_wrapper.array] = None) → pandas.core.series.Series¶ Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross 0.5, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentilelossfunctions
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_df (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns: percentiles
Return type: DataFrame
See also

predict_survival_function
(df, times=None, conditional_after=None, ancillary_df=None) → pandas.core.frame.DataFrame¶ Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)
Parameters:  X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 ancillary_X (numpy array or DataFrame, optional) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved).
 conditional_after (iterable, optional) – Must be equal is size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly nonzero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T  T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

print_summary
(decimals=2, style=None, **kwargs)¶ Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score
(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float¶ Score the data in df on the fitted model. With default scoring method, returns the _average loglikelihood_.
Parameters:  df (DataFrame) – the dataframe with duration col, event col, etc.
 scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’} log_likelihood: returns the average unpenalized loglikelihood. concordance_index: returns the concordanceindex
Examples
from lifelines import WeibullAFTFitter from lifelines.datasets import load_rossi rossi_train = load_rossi().loc[:400] rossi_test = load_rossi().loc[400:] wf = WeibullAFTFitter().fit(rossi_train, 'week', 'arrest') wf.score(rossi_train) wf.score(rossi_test)

summary
¶ Summary statistics describing the fit.
See also
print_summary