class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)

Bases: lifelines.fitters.RegressionFitter

This class fits the regression model:

$h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N$

that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see TODO: name
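As a rough illustration of the model above, the sketch below evaluates $h(t|x)$ at a few time points. The function name `additive_hazard` and all coefficient values are hypothetical and chosen only for illustration; this is not the fitter's internal code.

```python
def additive_hazard(coefs_at_t, x):
    """Evaluate h(t|x) = b_0(t) + sum_i b_i(t) * x_i at a single time t.

    coefs_at_t: [b_0(t), b_1(t), ..., b_N(t)] (hypothetical values)
    x:          [x_1, ..., x_N], constant over time
    """
    if len(coefs_at_t) != len(x) + 1:
        raise ValueError("expected an intercept plus one coefficient per covariate")
    intercept, slopes = coefs_at_t[0], coefs_at_t[1:]
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


# Hypothetical time-varying coefficients, keyed by time point
coefs = {1.0: [0.10, 0.02, -0.01], 2.0: [0.12, 0.03, -0.02]}
x = [1.0, 2.0]  # one subject's static covariates

hazards = {t: additive_hazard(b, x) for t, b in coefs.items()}
```

The covariates stay fixed while the coefficients change with time, which is the sense in which the model is linear in $x$ with time-varying effects.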

Note

This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.

Parameters:
• fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, $b_0(t)$, acts as a baseline hazard.
• alpha (float, optional (default: 0.05)) – the level in the confidence intervals.
• coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the magnitude of $c_{i,t}$, the coefficient of covariate $i$ at time $t$.
• smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between adjacent (over time) coefficients. For example, this shrinks the magnitude of $c_{i,t} - c_{i,t+1}$.
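To make the two penalizers concrete, here is a minimal sketch of the quantities they shrink: the squared size of each coefficient $c_{i,t}$ and the squared difference between coefficients at adjacent time points. The helper `l2_penalties` and the exact scaling are assumptions for illustration; the source does not state how lifelines weights these terms internally.

```python
def l2_penalties(coef_path, coef_penalizer=0.0, smoothing_penalizer=0.0):
    """Return (size penalty, smoothness penalty) for a coefficient path.

    coef_path[t][i] is the coefficient of covariate i at the t-th time point.
    """
    # Sum of squared coefficients over all times and covariates
    size_term = sum(c * c for row in coef_path for c in row)
    # Sum of squared differences between adjacent time points
    smooth_term = sum(
        (a - b) ** 2
        for prev, nxt in zip(coef_path, coef_path[1:])
        for a, b in zip(prev, nxt)
    )
    return coef_penalizer * size_term, smoothing_penalizer * smooth_term


# Hypothetical path: three time points, two covariates
path = [[1.0, 2.0], [1.5, 2.0], [3.0, 1.0]]
size_pen, smooth_pen = l2_penalties(path, coef_penalizer=0.5, smoothing_penalizer=2.0)
```

A larger `coef_penalizer` pulls every coefficient toward zero, while a larger `smoothing_penalizer` discourages abrupt jumps from one time point to the next.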
cumulative_hazards_

The estimated cumulative hazard

Type: DataFrame
hazards_

The estimated hazards

Type: DataFrame
confidence_intervals_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame
durations

The durations provided

Type: array
event_observed

The event_observed variable provided

Type: array
weights

The weights provided

Type: array
compute_residuals(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame

Compute the residuals of the model.

Parameters:
• training_dataframe (DataFrame) – the same training DataFrame given in fit
• kind (string) – One of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}

Notes

• 'scaled_schoenfeld': lifelines does not add the coefficients to the final results, but R does when you call residuals(c, "scaledsch")
concordance_index_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data, including censored observations.

Here, concordance_index_ measures the predictive accuracy of the fitted model on the training dataset. It is analogous to the R^2 in linear models.
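The idea behind the c-index can be sketched in a few lines. The version below is a simplified O(n²) illustration, not lifelines' implementation: it assumes a higher risk score should correspond to a shorter lifetime, treats a pair as comparable only when the shorter duration ended in an observed event, and counts tied scores as half-concordant. The name `c_index_sketch` is hypothetical.

```python
def c_index_sketch(durations, risk_scores, events):
    """Fraction of comparable pairs whose risk ordering matches the outcome."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # Comparable: subject i had an observed event strictly before j's time
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk failed first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tie in predictions
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


durations = [2, 4, 6]
events = [1, 0, 1]  # the second subject is censored
perfect = c_index_sketch(durations, [3, 2, 1], events)
reversed_order = c_index_sketch(durations, [1, 2, 3], events)
uninformative = c_index_sketch(durations, [1, 1, 1], events)
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 matches random guessing, and 0.0 means every pair is ordered incorrectly.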

fit(df, duration_col, event_col=None, weights_col=None, show_progress=False, formula: str = None)

Fit the Aalen Additive model to a dataset.

Parameters:
• df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
• duration_col (string) – the name of the column in the DataFrame that contains the subjects’ lifetimes.
• event_col (string, optional) – the name of the column in the DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
• weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is excluded from the covariates and used only as a weight in the final regression. Default weight is 1. This can be used for case weights: for example, a weight of 2 means there were two subjects with identical observations. It can also be used for sampling weights.
• show_progress (bool, optional (default=False)) – Since the fitter is iterative, show the iteration number.
• formula (str) – an R-like formula

Returns: self – self with additional new properties: cumulative_hazards_, etc.
Return type: AalenAdditiveFitter

Examples

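import pandas as pd
from lifelines import AalenAdditiveFitter

# Toy dataset: durations (T), event indicators (E), and two covariates
df = pd.DataFrame({
'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

# Instantiate the fitter with default settings before fitting
aaf = AalenAdditiveFitter()
aaf.fit(df, 'T', 'E')
aaf.predict_median(df)
aaf.print_summary()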

fit_right_censoring(*args, **kwargs)

Alias for fit

plot(columns=None, loc=None, iloc=None, ax=None, **kwargs)

A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:

Parameters:
• columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.
• loc, iloc (slice, optional) – specify a location-based subsection of the curves to plot, e.g. .plot(iloc=slice(0,10)) will plot the first 10 time points.
plot_covariate_groups(*args, **kwargs)

Deprecated as of v0.25.0. Use plot_partial_effects_on_outcome instead.

predict_cumulative_hazard(X)

Returns the cumulative hazards for the individuals

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
predict_expectation(X) → pandas.core.series.Series

Compute the expected lifetime, E[T], using covariates X.

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the expected lifetimes for the individuals
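For a non-negative lifetime, $E[T] = \int_0^\infty S(t)\,dt$, so an expected lifetime can be approximated by the area under a survival curve. The sketch below applies the trapezoidal rule on a hypothetical time grid; whether lifelines uses exactly this rule is an assumption, and the helper name `expected_lifetime` is invented for illustration.

```python
def expected_lifetime(times, survival):
    """Approximate E[T] as the area under S(t) using the trapezoidal rule.

    times:    increasing time grid
    survival: S(t) evaluated on that grid
    """
    total = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        total += 0.5 * (survival[k - 1] + survival[k]) * width
    return total


# Hypothetical survival curve that reaches zero at t = 4
times = [0.0, 1.0, 2.0, 4.0]
survival = [1.0, 0.8, 0.5, 0.0]
e_t = expected_lifetime(times, survival)
```

If the survival curve has not reached zero by the last observed time, the integral is truncated and the result underestimates the true expectation.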
predict_median(X) → pandas.core.series.Series

Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns: the median lifetimes for the individuals
predict_percentile(X, p=0.5) → pandas.core.series.Series

Returns the p-th percentile lifetimes for the individuals (the median when p = 0.5). http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:
• X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
• p (float) – default: 0.5
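One common way to read a percentile lifetime off a survival curve is to take the first time at which the curve drops to p or below. The sketch below assumes that convention; the helper `percentile_time` is hypothetical and is not taken from the library.

```python
import math


def percentile_time(times, survival, p=0.5):
    """Return the first time at which S(t) <= p, or infinity if never reached."""
    for t, s in zip(times, survival):
        if s <= p:
            return t
    return math.inf


# Hypothetical survival curve for one individual
times = [1, 2, 3, 4]
survival = [0.9, 0.6, 0.4, 0.1]

median = percentile_time(times, survival)            # p defaults to 0.5
never = percentile_time(times, survival, p=0.05)     # curve never drops this low
```

Returning infinity when the curve never reaches p makes heavily censored cases explicit instead of silently reporting the last observed time.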
predict_survival_function(X, times=None)

Returns the survival functions for the individuals

Parameters:
• X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
• times – Not implemented yet
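A survival function and a cumulative hazard are linked by the standard identity $S(t) = \exp(-H(t))$. The sketch below applies that identity to hypothetical cumulative hazard values; it illustrates the relationship only and makes no claim about how the method is implemented.

```python
import math


def survival_from_cumulative_hazard(cumulative_hazard):
    """Map cumulative hazard values H(t) to survival values S(t) = exp(-H(t))."""
    return [math.exp(-h) for h in cumulative_hazard]


# Hypothetical cumulative hazard evaluated at three time points
cumulative_hazard = [0.0, math.log(2), 2 * math.log(2)]
survival = survival_from_cumulative_hazard(cumulative_hazard)
```

Each increase of log(2) in the cumulative hazard halves the survival probability.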
print_summary(decimals=2, style=None, columns=None, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
• decimals (int, optional (default=2)) – specify the number of decimal places to show
• style (string) – {html, ascii, latex}
• columns – only display a subset of summary columns. Default all.
• kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
score(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float

Score the data in df on the fitted model. With the default scoring method, returns the average partial log-likelihood.

Parameters:
• df (DataFrame) – the DataFrame with the duration column, event column, etc.
• scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood: returns the average unpenalized partial log-likelihood. concordance_index: returns the concordance index.
smoothed_hazards_(bandwidth=1)

Smooths the hazard function using the Epanechnikov kernel with the given bandwidth (sigma).
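Kernel smoothing of a hazard can be sketched as a weighted sum of hazard increments, with weights given by the Epanechnikov kernel $K(u) = \tfrac{3}{4}(1 - u^2)$ for $|u| \le 1$. The function names and data below are hypothetical, and the exact estimator used by the library may differ.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def smooth_hazard(times, hazard_increments, eval_times, bandwidth=1.0):
    """Kernel-smooth hazard increments observed at `times` onto `eval_times`."""
    return [
        sum(
            epanechnikov((t - ti) / bandwidth) * dh
            for ti, dh in zip(times, hazard_increments)
        ) / bandwidth
        for t in eval_times
    ]


# Hypothetical hazard increments at two event times
times = [1.0, 2.0]
hazard_increments = [0.2, 0.4]
smoothed = smooth_hazard(times, hazard_increments, eval_times=[1.5, 5.0])
```

A larger bandwidth averages over more neighbouring time points, giving a smoother but less local estimate.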

summary

Summary statistics describing the fit.

Returns: df
Return type: DataFrame