GeneralizedGammaRegressionFitter

class lifelines.fitters.generalized_gamma_regression_fitter.GeneralizedGammaRegressionFitter(alpha=0.05, penalizer=0.0, l1_ratio=0.0)
Bases: lifelines.fitters.ParametricRegressionFitter
This class implements a Generalized Gamma model for regression data. The model's survival function has the parameterized form:
\[\begin{split}S(t; x)=\left\{ \begin{array}{ll} 1-\Gamma_{RL}\left( \frac{1}{\lambda^{2}};\frac{e^{\lambda \left( \frac{\log(t)-\mu}{\sigma} \right)}}{\lambda^{2}} \right) & \textit{if } \lambda > 0 \\ \Gamma_{RL}\left( \frac{1}{\lambda^{2}};\frac{e^{\lambda \left( \frac{\log(t)-\mu}{\sigma} \right)}}{\lambda^{2}} \right) & \textit{if } \lambda \le 0 \end{array} \right.\end{split}\]
where \(\Gamma_{RL}\) is the regularized lower incomplete Gamma function, and \(\sigma = \sigma(x) = \exp(\alpha x^T), \lambda = \lambda(x) = \beta x^T, \mu = \mu(x) = \gamma x^T\).
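The following sketch makes the piecewise formula concrete. It evaluates \(S(t; x)\) for one subject, under these assumptions:

- `x` is a plain list of covariate values that already includes any intercept column.
- `alpha_coef`, `beta_coef` and `gamma_coef` are hypothetical names for the coefficient vectors of \(\log\sigma\), \(\lambda\) and \(\mu\).
- It uses SciPy's `scipy.special.gammainc`, which is the regularized lower incomplete gamma function.
- The formula divides by \(\lambda^2\), so \(\lambda = 0\) is handled by substituting the Log-Normal limit. This is an assumption of the sketch, not necessarily how lifelines treats that case.

```python
import math

from scipy.special import gammainc  # regularized lower incomplete gamma, Gamma_RL(a; z)


def linear_predictor(coefs, x):
    return sum(c * v for c, v in zip(coefs, x))


def gg_survival(t, x, alpha_coef, beta_coef, gamma_coef):
    """Sketch of the Generalized Gamma survival function S(t; x) for one subject."""
    sigma = math.exp(linear_predictor(alpha_coef, x))  # sigma(x) = exp(alpha x^T)
    lam = linear_predictor(beta_coef, x)                # lambda(x) = beta x^T
    mu = linear_predictor(gamma_coef, x)                # mu(x) = gamma x^T

    z = (math.log(t) - mu) / sigma
    if lam == 0.0:
        # Assumption: use the Log-Normal limit, S = 1 - Phi(z), since the formula divides by lambda**2
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    shape = 1.0 / lam**2
    u = math.exp(lam * z) / lam**2
    lower = float(gammainc(shape, u))  # Gamma_RL(1/lambda^2; u)
    return 1.0 - lower if lam > 0 else lower


# Intercept-only subject with sigma = 1, lambda = 1, mu = log(2): an Exponential with mean 2.
print(gg_survival(2.0, [1.0], [0.0], [1.0], [math.log(2.0)]))  # ≈ 0.3679, i.e. exp(-1)
```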
This model has the Exponential, Weibull, Gamma and Log-Normal distributions (as well as the Inverse-Weibull and Inverse-Gamma) as sub-models, so it can be used to test which of these models to use:
 When \(\lambda = 1\) and \(\sigma = 1\), the data is Exponential.
 When \(\lambda = 1\), the data is Weibull.
 When \(\sigma = \lambda\), the data is Gamma.
 When \(\lambda = 0\), the data is Log-Normal.
 When \(\lambda = -1\), the data is Inverse-Weibull.
 When \(\sigma = -\lambda\), the data is Inverse-Gamma.
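As a worked check of the first three items, substitute into the \(\lambda > 0\) branch of the survival function and use \(\Gamma_{RL}(1; z) = 1 - e^{-z}\). For the Gamma case, note that \(\sigma = \lambda\) makes the exponent \(\lambda(\log t - \mu)/\sigma\) collapse to \(\log t - \mu\). The results are the familiar closed forms:

\[\begin{aligned}
\lambda = 1: \quad & S(t) = 1 - \Gamma_{RL}\left(1; e^{(\log t - \mu)/\sigma}\right) = \exp\left[-\left(t e^{-\mu}\right)^{1/\sigma}\right] && \text{(Weibull, shape } 1/\sigma \text{, scale } e^{\mu}\text{)} \\
\lambda = \sigma = 1: \quad & S(t) = \exp\left(-t e^{-\mu}\right) && \text{(Exponential, rate } e^{-\mu}\text{)} \\
\sigma = \lambda > 0: \quad & S(t) = 1 - \Gamma_{RL}\left(\frac{1}{\lambda^{2}}; \frac{t e^{-\mu}}{\lambda^{2}}\right) && \text{(Gamma, shape } 1/\lambda^{2} \text{, scale } \lambda^{2} e^{\mu}\text{)}
\end{aligned}\]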
After calling the .fit method, you have access to properties like cumulative_hazard_ and survival_function_. A summary of the fit is available with the method print_summary().
Important
The implemented parameterization estimates \(\log\sigma\) rather than \(\sigma\), so the output contains a ln_sigma_ parameter. Exponentiate it to recover \(\sigma\).
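Exponentiating a single value recovers \(\sigma\) directly only when ln_sigma_ has just an intercept. When \(\log\sigma\) also depends on covariates, \(\sigma(x) = \exp(\alpha x^T)\) is obtained by exponentiating the whole linear predictor for a given subject. Below is a minimal sketch in which the covariate names and coefficient values are hypothetical placeholders for the ln_sigma_ rows of a fitted model.

```python
import math

# Hypothetical ln_sigma_ coefficients keyed by covariate name (not from a real fit).
ln_sigma_coefs = {"Intercept": -0.3, "age": 0.01}


def sigma_for(row, coefs):
    """sigma(x) = exp(alpha x^T): exponentiate the full linear predictor for one subject."""
    log_sigma = sum(coef * row[name] for name, coef in coefs.items())
    return math.exp(log_sigma)


# With age 0, only the intercept contributes, so sigma = exp(-0.3).
print(sigma_for({"Intercept": 1.0, "age": 0.0}, ln_sigma_coefs))
# For a hypothetical 30-year-old, the age term shifts log(sigma) by 0.3.
print(sigma_for({"Intercept": 1.0, "age": 30.0}, ln_sigma_coefs))
```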
Important
This model is experimental. Its API may change in the future, and its convergence is not very stable.
Parameters: alpha (float, optional (default=0.05)) – the level in the confidence intervals; the reported intervals are (1 - alpha) confidence intervals, so the default gives 95% intervals.
Examples
from lifelines import GeneralizedGammaFitter
from lifelines.datasets import load_waltons

waltons = load_waltons()
ggf = GeneralizedGammaFitter()
ggf.fit(waltons['T'], waltons['E'])
ggf.plot()
ggf.summary

cumulative_hazard_
The estimated cumulative hazard (with custom timeline if provided)
Type: DataFrame

hazard_
The estimated hazard (with custom timeline if provided)
Type: DataFrame

survival_function_
The estimated survival function (with custom timeline if provided)
Type: DataFrame

cumulative_density_
The estimated cumulative density function (with custom timeline if provided)
Type: DataFrame

density
The estimated density function (PDF) (with custom timeline if provided)
Type: DataFrame

variance_matrix_
The variance matrix of the coefficients
Type: numpy array

median_
The median time to event
Type: float

lambda_
The fitted parameter in the model
Type: float

rho_
The fitted parameter in the model
Type: float

alpha_
The fitted parameter in the model
Type: float

durations
The durations provided
Type: array

event_observed
The event_observed variable provided
Type: array

timeline
The timeline to use for plotting and indexing
Type: array

entry
The entry array provided, or None
Type: array or None

compute_residuals(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame
Compute the residuals of the model.
Parameters:  training_dataframe (DataFrame) – the same training DataFrame given in fit
 kind (string) – {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}
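For the martingale kind, the textbook definition is \(r_i = \delta_i - H(t_i \mid x_i)\): the event indicator minus the fitted cumulative hazard at the subject's observed duration. The sketch below implements only that definition and assumes the cumulative hazard values have already been predicted, for example with predict_cumulative_hazard. It is not necessarily how lifelines computes its residuals internally, and the input values are hypothetical.

```python
def martingale_residuals(event_observed, cumulative_hazard_at_t):
    """Textbook martingale residuals: r_i = delta_i - H(t_i | x_i)."""
    return [float(d) - h for d, h in zip(event_observed, cumulative_hazard_at_t)]


# Hypothetical values: two subjects with events, one censored.
residuals = martingale_residuals([1, 0, 1], [0.25, 1.5, 2.0])
print(residuals)  # → [0.75, -1.5, -1.0]
```

Martingale residuals are bounded above by 1 and can be arbitrarily negative. Large negative values flag subjects who survived much longer than the model expected.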

concordance_index_
The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data that accounts for censoring. Here, concordance_index_ measures the predictive accuracy of the fitted model on the training dataset.
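The sketch below computes Harrell's concordance index from scratch. It assumes that higher predicted values mean longer predicted survival, as with a predicted median or expectation. The rules are:

- A pair is comparable when the subject with the shorter observed duration had an event.
- Ties in the prediction count as half-concordant.

This is the simple O(n²) textbook definition, which skips pairs with tied durations. It is not lifelines' own implementation.

```python
def concordance_index(durations, predicted, event_observed):
    """Harrell's C: fraction of comparable pairs whose predictions are ordered like the durations."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not event_observed[i]:
            continue  # only an observed event can anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(concordance_index([1, 2, 3], [3, 1, 2], [0, 1, 1]))
```

A value of 0.5 corresponds to random predictions and 1.0 to a perfect ranking.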

fit(df, duration_col, event_col=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametricRegressionFitter
Fit the regression model to a right-censored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names -> list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
Returns: self with additional new properties print_summary, params_, confidence_intervals_ and more
Return type: ParametricRegressionFitter
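The regressors dictionary gives each model parameter its own linear predictor. The sketch below shows what such a mapping means for one row of data. The following are all hypothetical and labeled as such in the code:

- the parameter keys (assumed to be mu_, ln_sigma_ and lambda_),
- the column names,
- the coefficient values,
- the helper function.

```python
# Hypothetical mapping of model parameters -> covariate columns.
regressors = {
    "mu_": ["Intercept", "age", "prio"],
    "ln_sigma_": ["Intercept"],
    "lambda_": ["Intercept", "age"],
}

# Hypothetical fitted coefficients, one per (parameter, column) pair.
coefs = {
    ("mu_", "Intercept"): 3.0, ("mu_", "age"): 0.02, ("mu_", "prio"): -0.1,
    ("ln_sigma_", "Intercept"): -0.5,
    ("lambda_", "Intercept"): 0.8, ("lambda_", "age"): -0.01,
}


def linear_predictors(row, regressors, coefs):
    """Evaluate each parameter's linear combination of its own columns for one row."""
    return {
        param: sum(coefs[(param, col)] * row[col] for col in cols)
        for param, cols in regressors.items()
    }


row = {"Intercept": 1.0, "age": 25.0, "prio": 2.0}
print(linear_predictors(row, regressors, coefs))
```

With regressors=None, every parameter's list would simply contain all covariate columns.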

fit_interval_censoring(df, lower_bound_col, upper_bound_col, event_col=None, ancillary_df=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametricRegressionFitter
Fit the regression model to an interval-censored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns lower_bound_col, upper_bound_col and event_col (see below), covariates columns, and special columns (weights). event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 lower_bound_col (string) – the name of the column in DataFrame that contains the lower bounds of the intervals.
 upper_bound_col (string) – the name of the column in DataFrame that contains the upper bounds of the intervals.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, this is inferred from the upper and lower interval limits (equal bounds imply an observed death).
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names -> list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (string) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
Returns: self with additional new properties print_summary, params_, confidence_intervals_ and more
Return type: ParametricRegressionFitter
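Two ideas from this entry are sketched below with the standard library:

- **Inferring event_col when it is omitted.** Equal bounds imply an observed death.
- **The interval-censored likelihood.** A censored interval contributes \(S(\text{lower}) - S(\text{upper})\), and an exactly observed time contributes the density.

An exponential survival function stands in for the fitted Generalized Gamma model. The helper names and the rate value are hypothetical.

```python
import math


def infer_events(lower_bounds, upper_bounds):
    """Equal lower and upper bounds are treated as an observed death (1), otherwise censored (0)."""
    return [int(lo == hi) for lo, hi in zip(lower_bounds, upper_bounds)]


def exp_survival(t, rate):
    # Stand-in model (hypothetical): exponential survival S(t) = exp(-rate * t), with S(inf) = 0.
    return 0.0 if math.isinf(t) else math.exp(-rate * t)


def interval_log_likelihood(lower_bounds, upper_bounds, rate):
    """Sum of log-likelihood contributions under the stand-in exponential model."""
    total = 0.0
    events = infer_events(lower_bounds, upper_bounds)
    for lo, hi, event in zip(lower_bounds, upper_bounds, events):
        if event:
            total += math.log(rate) - rate * lo  # log density f(t) = rate * exp(-rate * t)
        else:
            total += math.log(exp_survival(lo, rate) - exp_survival(hi, rate))  # log P(lo < T <= hi)
    return total


print(interval_log_likelihood([1.0, 2.0, 0.5], [1.0, math.inf, 3.0], rate=0.5))
```

An upper bound of infinity reduces the interval case to ordinary right censoring.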

fit_left_censoring(df, duration_col=None, event_col=None, regressors=None, fit_intercept=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None) → ParametricRegressionFitter
Fit the accelerated failure time model to a left-censored dataset.
Parameters:  df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 else (censored).
 duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) left-censored data.
 event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
 fit_intercept (bool, optional) – if True, add a constant column to the regression. Overrides the value set at class instantiation.
 show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
 regressors (dict, optional) – a dictionary of parameter names -> list of column names that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
 timeline (array, optional) – Specify a timeline that will be used for plotting and prediction
 weights_col (string) – the column in DataFrame that specifies weights per observation.
 robust (bool, optional (default=False)) – Compute the robust errors using the Huber sandwich estimator.
 initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
 entry_col (str) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
Returns: self with additional new properties print_summary, params_, confidence_intervals_ and more
Return type: ParametricRegressionFitter
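For a left-censored observation, only an upper bound is known: the event happened at or before the recorded value. Its likelihood contribution is therefore \(F(t) = 1 - S(t)\) rather than the density. Below is a sketch of that log-likelihood, using a hypothetical exponential stand-in model and hypothetical helper names.

```python
import math


def left_censored_log_likelihood(durations, event_observed, rate):
    """Exponential stand-in (hypothetical): log f(t) if observed, log F(t) = log(1 - S(t)) if left-censored."""
    total = 0.0
    for t, event in zip(durations, event_observed):
        if event:
            total += math.log(rate) - rate * t  # log density of an exactly observed value
        else:
            total += math.log1p(-math.exp(-rate * t))  # log(1 - exp(-rate * t)) for "at or before t"
    return total


print(left_censored_log_likelihood([1.0, 0.5, 2.0], [1, 0, 0], rate=0.8))
```

`math.log1p` keeps the censored term accurate when \(F(t)\) is tiny, which happens for very small recorded values.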

fit_right_censoring(*args, **kwargs)
Alias for fit.
See also
fit

log_likelihood_ratio_test()
Compute the likelihood ratio test for the model, comparing the existing model (with all the covariates) to the trivial model with no covariates.
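The standard likelihood ratio statistic is \(2(\ell_{\text{full}} - \ell_{\text{null}})\). Under the null hypothesis, it is compared to a chi-square distribution whose degrees of freedom equal the number of extra parameters in the full model. The sketch below computes the statistic and p-value from two hypothetical log-likelihood values.

To stay dependency-free, the chi-square tail probability uses the recurrence \(Q(a+1, y) = Q(a, y) + y^{a} e^{-y} / \Gamma(a+1)\) for the regularized upper incomplete gamma function. `scipy.stats.chi2.sf` would give the same numbers. This is a sketch of the textbook test, not lifelines' internal code.

```python
import math


def chi2_sf(x, df):
    """Chi-square survival function P(X > x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        sf, a = math.erfc(math.sqrt(y)), 0.5  # df = 1: Q(1/2, y) = erfc(sqrt(y))
    else:
        sf, a = math.exp(-y), 1.0  # df = 2: Q(1, y) = exp(-y)
    while 2 * a < df:
        sf += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))  # step df -> df + 2
        a += 1.0
    return sf


def likelihood_ratio_test(ll_full, ll_null, df):
    """Compare the model with covariates to the no-covariate model; returns (statistic, p-value)."""
    statistic = 2.0 * (ll_full - ll_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods; the full model has 2 extra parameters.
print(likelihood_ratio_test(-10.0, -12.0, df=2))
```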

mean_survival_time_
The mean survival time of the average subject in the training dataset.

median_survival_time_
The median survival time of the average subject in the training dataset.

plot(columns=None, parameter=None, ax=None, **errorbar_kwargs)
Produces a visual representation of the coefficients, including their standard errors and magnitudes.
Parameters:  columns (list, optional) – specify a subset of the columns to plot
 errorbar_kwargs – additional keyword arguments passed to matplotlib's errorbar command.
Returns: ax – the matplotlib axis that can be edited.
Return type: matplotlib axis

plot_covariate_groups(covariates, values, plot_baseline=True, ax=None, times=None, **kwargs)
Produces a plot comparing the baseline survival curve of the model with the curves obtained when one or more covariates are varied over a set of values. This is useful for comparing subjects’ survival as the covariates change, all else being held equal. The baseline survival curve is the predicted survival curve at the average values of the original dataset.
Parameters:  covariates (string or list) – a string (or list of strings) of the covariate in the original dataset that we wish to vary.
 values (1d or 2d iterable) – an iterable of the values we wish the covariate to take on.
 plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
 times – the times at which to plot the curves
 kwargs – pass in additional plotting commands
Returns: ax – the matplotlib axis that can be edited.
Return type: matplotlib axis, or list of axes
Examples
import numpy as np
from lifelines import datasets, WeibullAFTFitter

rossi = datasets.load_rossi()
wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
wf.plot_covariate_groups('prio', values=np.arange(0, 15, 3), cmap='coolwarm')

# multiple variables at once
wf.plot_covariate_groups(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

# if you have categorical variables, you can simplify things
# (dummy1, dummy2, dummy3 stand for one-hot columns of your own data; they are not in rossi)
wf.plot_covariate_groups(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(df, *, times=None, conditional_after=None)
Predict the cumulative hazard for individuals, given their covariates.
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable (array, list, series) of increasing times to predict the cumulative hazard at. Default is the set of all durations in the training dataset (observed and unobserved).
 conditional_after (iterable, optional) – must be equal in size to df.shape[0] (n above). An iterable (array, list, series) of possibly non-zero values that represent how long each subject has already lived. Ex: if \(T\) is the unknown event time, then this represents \(T \mid T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: the cumulative hazards of individuals over the timeline
Return type: DataFrame
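Conditioning on survival up to time \(s\) shifts and renormalizes the curve. For the remaining time \(u\), the conditional survival is \(S(s+u)/S(s)\), which is equivalent to a cumulative hazard of \(H(s+u) - H(s)\) on a timeline that restarts at 0. Below is a sketch with a Weibull cumulative hazard as a hypothetical stand-in for the fitted model.

```python
def weibull_cumulative_hazard(t, scale, shape):
    # Stand-in model (hypothetical): H(t) = (t / scale) ** shape
    return (t / scale) ** shape


def conditional_cumulative_hazard(times, already_lived, scale, shape):
    """Remaining cumulative hazard H(s + u) - H(s) on a timeline u that restarts at 0."""
    h_s = weibull_cumulative_hazard(already_lived, scale, shape)
    return [weibull_cumulative_hazard(already_lived + u, scale, shape) - h_s for u in times]


# A subject who has already lived 1 time unit, with scale 1 and shape 2.
print(conditional_cumulative_hazard([0.0, 1.0], already_lived=1.0, scale=1.0, shape=2.0))
```

With shape 1 (an exponential model), the conditional and unconditional hazards coincide, which is the memoryless property.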

predict_expectation(X, conditional_after=None) → pandas.core.series.Series
Compute the expected lifetime, \(E[T]\), using covariates X. The computation relies on the identity \(E[T] = \int_0^\infty P(T > t)\,dt = \int_0^\infty S(t)\,dt\), and the integral is approximated with the trapezoidal rule.
Caution
If the survival function doesn’t converge to 0, the expectation is really infinite and the returned values are meaningless (too large). In that case, using predict_median or predict_percentile would be better.
Parameters: X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.
Returns: expectations
Return type: Series
Notes
If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as the training dataset.
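The expectation described above can be sketched by integrating \(S(t)\) with the trapezoidal rule on a uniform grid truncated at `t_max`. Both `t_max` and `n_steps` are choices of this sketch, not lifelines parameters. An exponential curve with known mean serves as a check. The usage note below illustrates the Caution about curves that plateau above 0.

```python
import math


def expectation_trapezoid(survival, t_max, n_steps=10_000):
    """Approximate E[T] = integral_0^t_max S(t) dt with the trapezoidal rule on a uniform grid."""
    dt = t_max / n_steps
    values = [survival(i * dt) for i in range(n_steps + 1)]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


# Exponential with mean 2: S(t) = exp(-t / 2), so E[T] = 2.
approx = expectation_trapezoid(lambda t: math.exp(-t / 2.0), t_max=60.0)
print(approx)
```

If the curve levels off at, say, 0.3, the result grows by roughly 0.3 for every extra unit of `t_max`. No finite value is meaningful in that case.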
See also

predict_hazard(df, *, times=None)
Predict the hazard for individuals, given their covariates.
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable (array, list, series) of increasing times to predict the hazard at. Default is the set of all durations in the training dataset (observed and unobserved).
 conditional_after – Not implemented yet.
Returns: the hazards of individuals over the timeline
Return type: DataFrame

predict_median(df, *, conditional_after=None) → pandas.core.frame.DataFrame
Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Parameters:  df (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.
 conditional_after (iterable, optional) – must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long each subject has already lived. Ex: if \(T\) is the unknown event time, then this represents \(T \mid T > s\). This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.
Returns: percentiles – the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.
Return type: DataFrame
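The rule "if the survival curve does not cross 0.5, the result is infinity" can be sketched on a tabulated curve. The sketch finds the first grid interval where \(S(t)\) drops to 0.5 and interpolates linearly inside it. The tabulated input and the linear interpolation are assumptions of this sketch; a parametric model can evaluate its curve exactly.

```python
import math


def median_from_survival(times, survival):
    """First time S(t) reaches 0.5, linearly interpolated between grid points; math.inf if it never does."""
    if survival[0] <= 0.5:
        return times[0]
    for k in range(1, len(times)):
        if survival[k] <= 0.5:
            t0, t1 = times[k - 1], times[k]
            s0, s1 = survival[k - 1], survival[k]  # s0 > 0.5 >= s1
            return t0 + (s0 - 0.5) * (t1 - t0) / (s0 - s1)
    return math.inf


print(median_from_survival([0.0, 1.0, 2.0, 3.0], [1.0, 0.8, 0.4, 0.2]))
print(median_from_survival([0.0, 1.0, 2.0], [1.0, 0.9, 0.7]))  # never crosses 0.5 → inf
```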
See also

predict_percentile(df, *, p=0.5, conditional_after=None) → pandas.core.series.Series

predict_survival_function(df, times=None, conditional_after=None) → pandas.core.frame.DataFrame
Predict the survival function for individuals, given their covariates. Unless conditional_after is given, this assumes that each individual just entered the study (that is, it does not condition on how long they have already lived).
Parameters:  df (DataFrame) – a (n,d) DataFrame. If a DataFrame, columns can be in any order.
 times (iterable, optional) – an iterable of increasing times to predict the survival function at. Default is the set of all durations (observed and unobserved). Uses a linear interpolation if points in time are not in the index.
 conditional_after (iterable, optional) – must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long each subject has already lived. Ex: if \(T\) is the unknown event time, then this represents \(T \mid T > s\). This is useful for knowing the remaining hazard/survival of censored subjects.
Returns: survival_function – the survival probabilities of individuals over the timeline
Return type: DataFrame

print_summary(decimals=2, style=None, **kwargs)
Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 style (string) – {html, ascii, latex}
 kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float
Score the data in df on the fitted model. With the default scoring method, returns the average log-likelihood.
Parameters:  df (DataFrame) – the dataframe with duration col, event col, etc.
 scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood returns the average unpenalized log-likelihood; concordance_index returns the concordance index.
Examples
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()
# .iloc slices exclude the end position, so row 400 is not in both sets (.loc[:400] would include it)
rossi_train = rossi.iloc[:400]
rossi_test = rossi.iloc[400:]

wf = WeibullAFTFitter().fit(rossi_train, 'week', 'arrest')
wf.score(rossi_train)
wf.score(rossi_test)
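The sketch below shows what an "average log-likelihood" score means for right-censored data. Each subject contributes \(\delta_i \log h(t_i) - H(t_i)\), and the sum is divided by the number of subjects. This makes scores on training and test sets of different sizes, as in the example above, roughly comparable. The exponential stand-in model and its rate are hypothetical; lifelines evaluates the fitted model's own hazard.

```python
import math


def average_log_likelihood(durations, event_observed, rate):
    """Mean right-censored log-likelihood under a hypothetical exponential stand-in model.

    Each subject contributes delta * log h(t) - H(t), with h(t) = rate and H(t) = rate * t.
    """
    total = sum(e * math.log(rate) - rate * t for t, e in zip(durations, event_observed))
    return total / len(durations)


print(average_log_likelihood([1.0, 2.0], [1, 0], rate=1.0))  # → -1.5
```

Higher (less negative) scores indicate a better fit to the scored data.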

summary
Summary statistics describing the fit.
See also
print_summary