LogLogisticAFTFitter

class lifelines.fitters.log_logistic_aft_fitter.LogLogisticAFTFitter(alpha=0.05, penalizer=0.0, l1_ratio=0.0, fit_intercept=True, model_ancillary=False)

Bases: lifelines.fitters.ParametericAFTRegressionFitter

This class implements a Log-Logistic AFT model. The model has a parameterized form, with \(\alpha(x) = \exp\left(a_0 + a_1x_1 + ... + a_n x_n \right)\), and optionally, \(\beta(y) = \exp\left(b_0 + b_1 y_1 + ... + b_m y_m \right)\).

The cumulative hazard rate is

\[H(t; x , y) = \log\left(1 + \left(\frac{t}{\alpha(x)}\right)^{\beta(y)}\right)\]

The \(\alpha\) (scale) parameter equals the median lifetime. The \(\beta\) parameter influences the shape of the hazard.
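As a quick numerical check of the formula above, the following sketch evaluates the cumulative hazard and the implied survival function \(S(t) = \exp(-H(t))\) in plain Python. The function names and parameter values are illustrative assumptions and are not part of lifelines.

```python
import math

def cumulative_hazard(t, alpha, beta):
    """H(t) = log(1 + (t / alpha) ** beta) for the log-logistic model."""
    return math.log1p((t / alpha) ** beta)

def survival(t, alpha, beta):
    """S(t) = exp(-H(t)), which simplifies to 1 / (1 + (t / alpha) ** beta)."""
    return math.exp(-cumulative_hazard(t, alpha, beta))

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 6.0, 2.5

# At t = alpha the survival function equals 0.5, so alpha is the median lifetime.
median_survival = survival(alpha, alpha, beta)
```

Because \((t/\alpha)^{\beta} = 1\) at \(t = \alpha\) for any \(\beta\), the median does not depend on the shape parameter.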

After calling the .fit method, you have access to properties such as params_ and confidence_intervals_. A summary of the fit is available with the method print_summary().

Parameters:
  • alpha (float, optional (default=0.05)) – the level in the confidence intervals.
  • fit_intercept (boolean, optional (default=True)) – Allow lifelines to add an intercept column of 1s to df, and ancillary if applicable.
  • penalizer (float or array, optional (default=0.0)) – the penalizer coefficient to the size of the coefficients. See l1_ratio. Must be equal to or greater than 0. Alternatively, penalizer is an array equal in size to the number of parameters, with penalty coefficients for specific variables. For example, penalizer=0.01 * np.ones(p) is the same as penalizer=0.01
  • l1_ratio (float, optional (default=0.0)) – how much of the penalizer should be attributed to an l1 penalty (otherwise an l2 penalty). The penalty function looks like penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||^2_2
  • model_ancillary (bool, optional (default=False)) – set the model instance to always model the ancillary parameter with the supplied DataFrame. This is useful for grid-search optimization.
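The penalty expression quoted for l1_ratio can be evaluated directly. The sketch below is a plain-Python transcription of that formula for a scalar penalizer; the function name and the coefficient values are hypothetical, and any additional scaling lifelines may apply internally is not modelled here.

```python
def elastic_net_penalty(weights, penalizer, l1_ratio):
    """penalizer * l1_ratio * ||w||_1 + 0.5 * penalizer * (1 - l1_ratio) * ||w||_2^2"""
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return (penalizer * l1_ratio * l1_norm
            + 0.5 * penalizer * (1.0 - l1_ratio) * l2_norm_squared)

# Hypothetical coefficient vector, for illustration only.
w = [0.5, -1.0, 2.0]

pure_l2 = elastic_net_penalty(w, penalizer=0.1, l1_ratio=0.0)  # only the squared l2 term
pure_l1 = elastic_net_penalty(w, penalizer=0.1, l1_ratio=1.0)  # only the l1 term
mixed = elastic_net_penalty(w, penalizer=0.1, l1_ratio=0.5)    # equal split between both
```

With l1_ratio=0.0 (the default) only the squared l2 term contributes, and with l1_ratio=1.0 only the l1 term does.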
params_

The estimated coefficients

Type:DataFrame
confidence_intervals_

The lower and upper confidence intervals for the coefficients

Type:DataFrame
durations

The durations variable provided

Type:Series
event_observed

The event_observed variable provided

Type:Series
weights

The weights variable provided

Type:Series
variance_matrix_

The variance matrix of the coefficients

Type:DataFrame
standard_errors_

the standard errors of the estimates

Type:Series
score_

the concordance index of the model.

Type:float
AIC_
BIC_
compute_residuals(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame

Compute the residuals of the model.

Parameters:
  • training_dataframe (DataFrame) – the same training DataFrame given in fit
  • kind (string) – One of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}

Notes

  • 'scaled_schoenfeld': lifelines does not add the coefficients to the final results, but R does when you call residuals(c, "scaledsch")
concordance_index_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censored observations. Here, concordance_index_ measures the predictive accuracy of the fitted model on the training dataset.
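To make the idea of concordance concrete, here is a simplified pairwise sketch for right-censored data. It is an illustration under stated assumptions (pairs with tied durations are skipped, tied predictions count as half) and is not the implementation lifelines uses; the function name and the data are hypothetical.

```python
def concordance_index(durations, predictions, events):
    """Share of comparable pairs whose predicted ordering matches the observed ordering."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if predictions[i] < predictions[j]:
                    concordant += 1.0   # predicted order agrees with observed order
                elif predictions[i] == predictions[j]:
                    concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical observed durations, event flags, and predicted median lifetimes.
durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]
predicted_medians = [1.0, 5.0, 3.0, 9.0]

c_index = concordance_index(durations, predicted_medians, events)
```

A value of 1.0 means every comparable pair is ordered correctly, while 0.5 corresponds to random ordering.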

fit(df, duration_col, event_col=None, ancillary=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, formula: str = None, fit_options: Optional[dict] = None) → ParametericAFTRegressionFitter

Fit the accelerated failure time model to a right-censored dataset.

Parameters:
  • df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
  • duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
  • event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
  • show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
  • formula (string) – Use an R-style formula for modeling the dataset. See formula syntax: https://matthewwardrop.github.io/formulaic/basic/grammar/ If a formula is not provided, all variables in the dataframe are used (minus those used for other purposes like event_col, etc.)
  • ancillary (None, boolean, str, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters. If None or False, explicitly do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count as df. If str, should be a formula
  • fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
  • timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
  • weights_col (string) – the column in DataFrame that specifies weights per observation.
  • robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
  • initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
  • entry_col (string) – specify a column in the DataFrame that denotes any late-entries (left truncation) that occurred. See the docs on left truncation
  • fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.
Returns:

self

Return type:

self with additional new properties print_summary, params_, confidence_intervals_ and more

Examples

import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter

df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aft = WeibullAFTFitter()
aft.fit(df, 'T', 'E')
aft.print_summary()
aft.predict_median(df)

aft = WeibullAFTFitter()
aft.fit(df, 'T', 'E', ancillary=df)
aft.print_summary()
aft.predict_median(df)
fit_intercept = False
fit_interval_censoring(df, lower_bound_col, upper_bound_col, event_col=None, ancillary=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, formula=None, fit_options: Optional[dict] = None) → ParametericAFTRegressionFitter

Fit the accelerated failure time model to an interval-censored dataset.

Parameters:
  • df (DataFrame) – a Pandas DataFrame with necessary columns lower_bound_col, upper_bound_col (see below), and any other covariates or weights.
  • lower_bound_col (string) – the name of the column in DataFrame that contains the subjects’ left-most observation.
  • upper_bound_col (string) – the name of the column in DataFrame that contains the subjects’ right-most observation. Values can be np.inf (and should be if the subject is right-censored).
  • event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, will be inferred from the start and stop columns (lower_bound==upper_bound means uncensored)
  • formula (string) – Use an R-style formula for modeling the dataset. See formula syntax: https://matthewwardrop.github.io/formulaic/basic/grammar/ If a formula is not provided, all variables in the dataframe are used (minus those used for other purposes like event_col, etc.)
  • ancillary (None, boolean, str, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters. If None or False, explicitly do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count as df. If str, should be a formula
  • fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
  • show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
  • timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
  • weights_col (string) – the column in DataFrame that specifies weights per observation.
  • robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
  • initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
  • entry_col (str) – specify a column in the DataFrame that denotes any late-entries (left truncation) that occurred. See the docs on left truncation
  • fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.
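The rule for inferring event_col (lower_bound == upper_bound means uncensored) and the role of np.inf for right-censored subjects can be sketched in plain Python. The likelihood expressions below are the standard ones for interval-censored data under the log-logistic model; that lifelines evaluates them in exactly this form is an assumption, and all function names and parameter values are hypothetical.

```python
import math

def survival(t, alpha, beta):
    """S(t) = 1 / (1 + (t / alpha) ** beta); S(inf) is 0."""
    if math.isinf(t):
        return 0.0
    return 1.0 / (1.0 + (t / alpha) ** beta)

def density(t, alpha, beta):
    """f(t) = h(t) * S(t) for the log-logistic distribution."""
    z = (t / alpha) ** beta
    return (beta / t) * z / (1.0 + z) ** 2

def infer_event(lower, upper):
    """An observation is treated as exactly observed when both bounds coincide."""
    return lower == upper

def log_likelihood_row(lower, upper, alpha, beta):
    if infer_event(lower, upper):
        return math.log(density(lower, alpha, beta))
    # The event happened somewhere in (lower, upper]; upper may be infinite.
    return math.log(survival(lower, alpha, beta) - survival(upper, alpha, beta))

# Hypothetical parameters and three rows: exact, interval-censored, right-censored.
alpha, beta = 6.0, 2.0
rows = [(5.0, 5.0), (4.0, 8.0), (7.0, math.inf)]
contributions = [log_likelihood_row(lo, hi, alpha, beta) for lo, hi in rows]
```

For the right-censored row the contribution reduces to log S(lower), which is why an infinite upper bound encodes right-censoring.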
Returns:

Return type:

self with additional new properties print_summary, params_, confidence_intervals_ and more

Examples

import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter

df = pd.DataFrame({
    'start': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'stop':  [5, 3, 9, 8, 7, 4, 8, 5, 2, 5, 6, np.inf],  # this last subject is right-censored.
    'E':     [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aft = WeibullAFTFitter()
aft.fit_interval_censoring(df, 'start', 'stop', 'E')
aft.print_summary()
aft.predict_median(df)

aft = WeibullAFTFitter()
aft.fit_interval_censoring(df, 'start', 'stop', 'E', ancillary=df)
aft.print_summary()
aft.predict_median(df)
fit_left_censoring(df, duration_col: str = None, event_col: str = None, ancillary=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, formula: str = None, fit_options: Optional[dict] = None) → ParametericAFTRegressionFitter

Fit the accelerated failure time model to a left-censored dataset.

Parameters:
  • df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
  • duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) left-censored data.
  • event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
  • formula (string) – Use an R-style formula for modeling the dataset. See formula syntax: https://matthewwardrop.github.io/formulaic/basic/grammar/ If a formula is not provided, all variables in the dataframe are used (minus those used for other purposes like event_col, etc.)
  • ancillary (None, boolean, str, or DataFrame, optional (default=None)) – Choose to model the ancillary parameters. If None or False, explicitly do not fit the ancillary parameters using any covariates. If True, model the ancillary parameters with the same covariates as df. If DataFrame, provide covariates to model the ancillary parameters. Must be the same row count as df. If str, should be a formula
  • fit_intercept (bool, optional) – If true, add a constant column to the regression. Overrides value set in class instantiation.
  • show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
  • timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
  • weights_col (string) – the column in DataFrame that specifies weights per observation.
  • robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
  • initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
  • entry_col (str) – specify a column in the DataFrame that denotes any late-entries (left truncation) that occurred. See the docs on left truncation
  • fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.
Returns:

self

Return type:

self with additional new properties print_summary, params_, confidence_intervals_ and more

Examples

import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter, LogLogisticAFTFitter

df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aft = WeibullAFTFitter()
aft.fit_left_censoring(df, 'T', 'E')
aft.print_summary()
aft.predict_median(df)

aft = WeibullAFTFitter()
aft.fit_left_censoring(df, 'T', 'E', ancillary=df)
aft.print_summary()
aft.predict_median(df)
fit_right_censoring(*args, **kwargs)

Alias for fit

See also

fit()

force_no_intercept = False
label
log_likelihood_ratio_test() → StatisticalResult

This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.
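The arithmetic behind a likelihood ratio test can be sketched as follows: twice the difference in log-likelihood between the full and the trivial model is compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters. The helper names and the log-likelihood values are hypothetical, and the chi-squared survival function is written out by hand only to keep the example free of third-party dependencies.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from the df = 1 case and add the correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total

def likelihood_ratio_test(ll_full, ll_null, extra_params):
    """Return the test statistic and its chi-squared p-value."""
    statistic = 2.0 * (ll_full - ll_null)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical log-likelihoods: a model with two covariates vs. no covariates.
statistic, p_value = likelihood_ratio_test(ll_full=-20.0, ll_null=-23.0, extra_params=2)
```

A small p-value suggests that the covariates jointly improve the fit over the trivial model.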

mean_survival_time_

The mean survival time of the average subject in the training dataset.

median_survival_time_

The median survival time of the average subject in the training dataset.

plot(columns=None, parameter=None, ax=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:
  • columns (list, optional) – specify a subset of the columns to plot
  • errorbar_kwargs – pass in additional plotting commands to matplotlib errorbar command
Returns:

ax – the matplotlib axis that can be edited.

Return type:

matplotlib axis

plot_covariate_groups(*args, **kwargs)

Deprecated as of v0.25.0. Use plot_partial_effects_on_outcome instead.

plot_partial_effects_on_outcome(covariates, values, plot_baseline=True, times=None, y='survival_function', ax=None, **kwargs)

Produces a visual representation comparing the baseline survival curve of the model versus what happens when a covariate(s) is varied over values in a group. This is useful to compare subjects’ survival as we vary covariate(s), all else being held equal. The baseline survival curve is equal to the predicted survival curve at all average values in the original dataset.

Parameters:
  • covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
  • values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
  • plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
  • times (iterable) – pass in the times to plot
  • kwargs – pass in additional plotting commands
Returns:

ax – the matplotlib axis that can be edited.

Return type:

matplotlib axis, or list of axes

Examples

import numpy as np
from lifelines import datasets, WeibullAFTFitter
rossi = datasets.load_rossi()
wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
wf.plot_partial_effects_on_outcome('prio', values=np.arange(0, 15), cmap='coolwarm')

# multiple variables at once
wf.plot_partial_effects_on_outcome(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm', y="hazard")
predict_cumulative_hazard(df, *, ancillary=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame

Predict the cumulative hazard for the individuals.

Parameters:
  • df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary – supply a DataFrame to regress ancillary parameters against, if necessary.
  • times (iterable, optional) – an iterable of increasing times to predict the cumulative hazard at. Default is the set of all durations (observed and unobserved).
  • conditional_after (iterable, optional) – Must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T | T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
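The effect of conditional_after can be illustrated with the standard identity \(S(t \mid T > s) = S(s + t) / S(s)\), which is equivalent to subtracting the cumulative hazard already accumulated by time \(s\). The sketch below applies this to the log-logistic cumulative hazard; it assumes this identity is what the conditioning amounts to, and the function names and values are hypothetical.

```python
import math

def cumulative_hazard(t, alpha, beta):
    """H(t) = log(1 + (t / alpha) ** beta)."""
    return math.log1p((t / alpha) ** beta)

def conditional_cumulative_hazard(t, s, alpha, beta):
    """Cumulative hazard over the next t time units, given survival up to s."""
    return cumulative_hazard(s + t, alpha, beta) - cumulative_hazard(s, alpha, beta)

def conditional_survival(t, s, alpha, beta):
    """S(t | T > s), with the timeline restarted at 0 for the remaining duration."""
    return math.exp(-conditional_cumulative_hazard(t, s, alpha, beta))

# Hypothetical parameters; the subject has already lived s = 4 time units.
alpha, beta, s = 6.0, 2.5, 4.0
remaining = [conditional_survival(t, s, alpha, beta) for t in (0.0, 1.0, 2.0, 5.0)]
```

The first entry is 1.0 because a subject known to be alive at \(s\) has survived zero additional time with certainty.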
predict_expectation(df, ancillary=None) → pandas.core.series.Series

Predict the expectation of lifetimes, \(E[T | x]\).

Parameters:
  • df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns:

expectations – the expected lifetimes for the individuals.

Return type:

Series

See also

predict_median()
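For a log-logistic distribution with scale \(\alpha\) and shape \(\beta > 1\), the mean has the closed form \(\alpha \, (\pi/\beta) / \sin(\pi/\beta)\); for \(\beta \le 1\) the mean is infinite. The sketch below checks the closed form against a numerical integral of the survival function. The function names, parameter values, and integration settings are illustrative assumptions.

```python
import math

def expected_lifetime(alpha, beta):
    """Closed-form mean of a log-logistic distribution (infinite when beta <= 1)."""
    if beta <= 1.0:
        return math.inf
    b = math.pi / beta
    return alpha * b / math.sin(b)

def expected_lifetime_numeric(alpha, beta, upper, steps):
    """Trapezoidal approximation of the integral of S(t) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (1.0 + 1.0 / (1.0 + (upper / alpha) ** beta))
    for i in range(1, steps):
        total += 1.0 / (1.0 + (i * h / alpha) ** beta)
    return total * h

# Hypothetical parameters for a single subject.
alpha, beta = 6.0, 4.0
closed_form = expected_lifetime(alpha, beta)
numeric = expected_lifetime_numeric(alpha, beta, upper=600.0, steps=60000)
```

The mean exceeds the median \(\alpha\) here because the distribution is right-skewed.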

predict_hazard(df, *, ancillary=None, times=None, conditional_after=None) → pandas.core.frame.DataFrame

Predict the hazard for the individuals, given their covariates.

Parameters:
  • df (DataFrame) – a (n,d) covariate numpy array, Series, or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary – supply a DataFrame to regress ancillary parameters against, if necessary.
  • times (iterable, optional) – an iterable of increasing times to predict the hazard at. Default is the set of all durations (observed and unobserved).
  • conditional_after (iterable, optional) – Not implemented yet
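Differentiating the cumulative hazard \(H(t) = \log(1 + (t/\alpha)^{\beta})\) with respect to \(t\) gives the hazard \(h(t) = \frac{(\beta/\alpha)(t/\alpha)^{\beta - 1}}{1 + (t/\alpha)^{\beta}}\). The sketch below evaluates it and compares it with a finite-difference derivative; the function names and parameter values are hypothetical.

```python
import math

def cumulative_hazard(t, alpha, beta):
    return math.log1p((t / alpha) ** beta)

def hazard(t, alpha, beta):
    """h(t) = dH/dt for the log-logistic model."""
    z = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1.0) / (1.0 + z)

# Hypothetical parameters; beta > 1 gives a hazard that rises and then falls.
alpha, beta = 6.0, 2.5

# Central finite difference of H as an independent check of the formula.
t, eps = 4.0, 1e-6
finite_difference = (cumulative_hazard(t + eps, alpha, beta)
                     - cumulative_hazard(t - eps, alpha, beta)) / (2.0 * eps)

# For beta > 1 the hazard peaks where (t / alpha) ** beta == beta - 1.
peak_time = alpha * (beta - 1.0) ** (1.0 / beta)
```

For \(\beta \le 1\) the hazard decreases monotonically instead, which is one way the shape parameter changes the hazard's behaviour.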
predict_median(df, *, ancillary=None, conditional_after=None) → pandas.core.frame.DataFrame

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters:
  • df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary – supply a DataFrame to regress ancillary parameters against, if necessary.
  • conditional_after (iterable, optional) – Must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T | T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
predict_percentile(df, ancillary=None, p=0.5, conditional_after=None) → pandas.core.series.Series

Returns the median lifetimes for the individuals, by default. If the survival curve of an individual does not cross p, then the result is infinity. http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:
  • df (DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary (DataFrame, optional) – a (n,d) DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • p (float, optional (default=0.5)) – the percentile, must be between 0 and 1.
Returns:

percentiles

Return type:

Series

See also

predict_median()
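Solving \(S(t) = p\) for the log-logistic survival function gives \(t_p = \alpha \, (1/p - 1)^{1/\beta}\). The sketch below assumes, as the description above suggests, that p is the survival probability at which the curve is crossed; the function names and parameter values are hypothetical.

```python
def survival(t, alpha, beta):
    """S(t) = 1 / (1 + (t / alpha) ** beta)."""
    return 1.0 / (1.0 + (t / alpha) ** beta)

def percentile_time(p, alpha, beta):
    """Time at which the survival function equals p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return alpha * (1.0 / p - 1.0) ** (1.0 / beta)

# Hypothetical parameters for a single subject.
alpha, beta = 6.0, 2.5

median_time = percentile_time(0.5, alpha, beta)   # equals alpha
early_time = percentile_time(0.9, alpha, beta)    # 90% of subjects still alive
late_time = percentile_time(0.1, alpha, beta)     # 10% of subjects still alive
```

Setting p=0.5 recovers the median, which is why predict_median() is listed as a related method.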

predict_survival_function(df, times=None, conditional_after=None, ancillary=None) → pandas.core.frame.DataFrame

Predict the survival function for individuals, given their covariates. By default, this assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for); use conditional_after to condition on time already survived.

Parameters:
  • df (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
  • ancillary – supply a DataFrame to regress ancillary parameters against, if necessary.
  • times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved).
  • conditional_after (iterable, optional) – Must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long the subject has already lived for. Ex: if \(T\) is the unknown event time, then this represents \(T | T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
print_summary(decimals: int = 2, style: Optional[str] = None, columns: Optional[list] = None, **kwargs) → None

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • style (string) – {html, ascii, latex}
  • columns – only display a subset of summary columns. Default all.
  • kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
regressors = None
score(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float

Score the data in df on the fitted model. With default scoring method, returns the average log-likelihood.

Parameters:
  • df (DataFrame) – the dataframe with duration col, event col, etc.
  • scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’} log_likelihood: returns the average unpenalized log-likelihood. concordance_index: returns the concordance-index
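To show what an average log-likelihood looks like for right-censored data, the sketch below uses the log-logistic hazard and cumulative hazard: an observed event contributes \(\log h(t) - H(t)\) and a censored observation contributes \(-H(t)\). It uses a single shared \(\alpha\) and \(\beta\) rather than covariate-dependent ones, which is a simplification; the function names and parameter values are hypothetical.

```python
import math

def log_likelihood_row(t, event, alpha, beta):
    """Log-likelihood contribution of one right-censored observation."""
    z = (t / alpha) ** beta
    cumulative_hazard = math.log1p(z)
    if event:
        # log h(t), where h(t) = (beta / t) * z / (1 + z)
        log_hazard = math.log(beta / t) + math.log(z) - math.log1p(z)
        return log_hazard - cumulative_hazard
    return -cumulative_hazard

def average_log_likelihood(durations, events, alpha, beta):
    rows = [log_likelihood_row(t, e, alpha, beta) for t, e in zip(durations, events)]
    return sum(rows) / len(rows)

# Durations and event flags in the style of the fit() example; parameters are hypothetical.
T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
avg_ll = average_log_likelihood(T, E, alpha=6.0, beta=2.0)
```

Averaging rather than summing makes scores comparable between datasets of different sizes, such as a training and a test split.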

Examples

from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()
rossi_train = rossi.iloc[:400]  # first 400 rows
rossi_test = rossi.iloc[400:]   # remaining rows, no overlap with the training split
wf = WeibullAFTFitter().fit(rossi_train, 'week', 'arrest')

wf.score(rossi_train)
wf.score(rossi_test)
strata = None
summary

Summary statistics describing the fit.

See also

print_summary