AalenAdditiveFitter

class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)

This class fits the regression model:

\[h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N\]

that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see TODO: name
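To make the model concrete, here is a minimal NumPy sketch of Aalen's classical least-squares estimator for the cumulative coefficients \(B_k(t) = \int_0^t b_k(s)\,ds\). At each observed event time, the event indicators of the subjects still at risk are regressed on their covariates, and the per-time increments are summed. The function name `aalen_cumulative_coefficients` is hypothetical, and this is not claimed to be how lifelines implements fit. In particular, where the at-risk design matrix is rank-deficient, this sketch simply takes the minimum-norm least-squares solution.

```python
import numpy as np


def aalen_cumulative_coefficients(T, E, X, fit_intercept=True):
    """Hypothetical helper: Aalen's least-squares estimator (sketch).

    Returns the distinct event times and an array of cumulative
    coefficients B_k(t), one column per coefficient (intercept first).
    """
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=int)
    X = np.asarray(X, dtype=float)
    if fit_intercept:
        # prepend a column of ones so that B_0(t) is the baseline
        X = np.column_stack([np.ones(len(T)), X])

    event_times = np.unique(T[E == 1])
    increments = np.zeros((len(event_times), X.shape[1]))
    for j, t in enumerate(event_times):
        at_risk = T >= t  # subjects still under observation at time t
        X_r = X[at_risk]
        # dN: 1 for subjects with an observed event exactly at t, else 0
        dN = ((T[at_risk] == t) & (E[at_risk] == 1)).astype(float)
        # least-squares increment; lstsq returns the minimum-norm
        # solution when X_r is rank-deficient (assumption of this sketch)
        increments[j] = np.linalg.lstsq(X_r, dN, rcond=None)[0]
    # summing the increments over event times gives B_k(t)
    return event_times, np.cumsum(increments, axis=0)
```

With no covariates (intercept only), each increment is the number of events divided by the number at risk, so the result reduces to the Nelson-Aalen estimator of the cumulative hazard.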

Note

This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.

Parameters:

fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\), acts as a baseline hazard.
alpha (float, optional (default=0.05)) – the significance level for the confidence intervals; the reported intervals have coverage 1 - alpha (the default gives 95% intervals).
coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves the stability of the estimates and helps when covariates are highly correlated. That is, it shrinks the magnitude of \(c_{i,t}\), the coefficient of covariate \(i\) at time \(t\).
smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between adjacent (over time) coefficients. That is, it shrinks the magnitude of \(c_{i,t} - c_{i,t+1}\).
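The two penalizers can be read as terms in a penalized least-squares problem solved at each event time. The sketch below assumes that form, minimizing \(\|X b - \mathrm{d}N\|^2 + c_1\|b\|^2 + c_2\|b - b_{\text{prev}}\|^2\) with \(c_1\) = coef_penalizer, \(c_2\) = smoothing_penalizer and \(b_{\text{prev}}\) the estimate at the previous event time. Setting the gradient to zero gives the closed form used in the code. The function `penalized_increment` is hypothetical and only illustrates the effect of the two parameters; it is not a copy of lifelines internals.

```python
import numpy as np


def penalized_increment(X_r, dN, b_prev, coef_penalizer=0.0, smoothing_penalizer=0.0):
    """Hypothetical helper: one penalized least-squares step (sketch).

    Minimizes ||X_r b - dN||^2 + c1 ||b||^2 + c2 ||b - b_prev||^2, whose solution is
        b = (X_r^T X_r + (c1 + c2) I)^{-1} (X_r^T dN + c2 b_prev).
    """
    c1, c2 = coef_penalizer, smoothing_penalizer
    d = X_r.shape[1]
    A = X_r.T @ X_r + (c1 + c2) * np.eye(d)
    rhs = X_r.T @ dN + c2 * b_prev
    return np.linalg.solve(A, rhs)
```

With both penalizers at zero this is ordinary least squares; a very large smoothing_penalizer pins the new estimate to the previous one, and a very large coef_penalizer shrinks it towards zero.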

cumulative_hazards_

The estimated cumulative hazard

Type: DataFrame

hazards_

The estimated hazards

Type: DataFrame

confidence_intervals_

The lower and upper confidence intervals for the cumulative hazard

Type: DataFrame

durations

The durations provided

Type: array

event_observed

The event_observed variable provided

Type: array

weights

The weights provided

Type: array

property concordance_index_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data that accounts for censored observations.

Here it measures the predictive accuracy of the fitted model on the training dataset: how well the model ranks subjects by their survival times, from 0.5 (random ordering) to 1.0 (perfect ranking). It is loosely analogous to R^2 in linear models.
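As an illustration of what the c-index counts, here is a minimal pure-Python sketch of Harrell's C. It compares predicted lifetimes rather than risk scores (so a longer prediction should go with a longer observed duration), skips pairs with tied durations, and scores tied predictions as one half. The function is a hypothetical helper, and the library's own handling of ties may differ, so numbers can differ slightly on data with ties.

```python
from itertools import combinations


def concordance_index(durations, predicted_times, event_observed):
    """Hypothetical helper: Harrell's C-index (sketch).

    A pair is comparable when the shorter of the two durations ended in an
    observed event; it is concordant when that subject also has the shorter
    predicted time. Tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # this sketch skips tied durations entirely
        # order the pair so that `a` has the shorter observed duration
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if not event_observed[a]:
            continue  # shorter time was censored: true order is unknown
        comparable += 1
        if predicted_times[a] < predicted_times[b]:
            concordant += 1.0
        elif predicted_times[a] == predicted_times[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A perfect ranking scores 1.0, a completely reversed one 0.0, and constant predictions 0.5.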

fit(df, duration_col, event_col=None, weights_col=None, show_progress=False, formula: str = None)

Fit the Aalen Additive model to a dataset.

Parameters:

df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
duration_col (string) – the name of the column in the DataFrame that contains the subjects’ lifetimes.
event_col (string, optional) – the name of the column in the DataFrame that contains the subjects’ death observations. If left as None, all individuals are assumed to be uncensored.
weights_col (string, optional) – the name of an optional column in the DataFrame, df, that gives the weight of each subject. This column is excluded from the covariates and used only as a weight in the regression. The default weight is 1. It can be used for case weights (for example, a weight of 2 means there were two subjects with identical observations) or for sampling weights.
show_progress (bool, optional (default=False)) – since the fitter is iterative, print the iteration number as it runs.
formula (str, optional) – an R-like formula specifying the covariates.
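The statement that a weight of 2 is equivalent to two identical subjects holds for any weighted least-squares fit, the kind of regression the Aalen model performs at each event time. The sketch below checks it with NumPy, assuming the weights enter as ordinary least-squares weights and using the standard trick of scaling each row by the square root of its weight. `weighted_lstsq` is a hypothetical helper, not part of lifelines.

```python
import numpy as np


def weighted_lstsq(X, y, w):
    """Hypothetical helper: minimize sum_i w_i * (y_i - x_i . b)^2.

    Scaling row i of X and y by sqrt(w_i) turns this into ordinary least squares.
    """
    sw = np.sqrt(np.asarray(w, dtype=float))
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]


X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

# the subject in row index 2 carries a weight of 2 ...
b_weighted = weighted_lstsq(X, y, [1, 1, 2, 1])
# ... which matches listing that subject twice, each with weight 1
b_duplicated = weighted_lstsq(np.vstack([X, X[2]]), np.append(y, y[2]), np.ones(5))
```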

Returns:

self – self with additional new properties: cumulative_hazards_, etc.

Return type:

AalenAdditiveFitter

Examples

import pandas as pd
from lifelines import AalenAdditiveFitter

# T: durations, E: 1 if the event was observed, 0 if censored;
# the remaining columns ('var', 'age') are used as covariates
df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aaf = AalenAdditiveFitter()
aaf.fit(df, 'T', 'E')
aaf.predict_median(df)
aaf.print_summary()

plot(columns=None, loc=None, iloc=None, ax=None, **kwargs)

A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:

Parameters:

columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.
loc (slice, optional) – specify a time-based subsection of the curves to plot, e.g. .plot(loc=slice(0., 10.)) will plot the time values between t=0 and t=10.
iloc (slice, optional) – specify a location-based subsection of the curves to plot, e.g. .plot(iloc=slice(0,10)) will plot the first 10 time points.
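The loc and iloc arguments follow pandas' label-based and position-based indexing, presumably applied to the time index of the curves. The sketch below shows the difference on a hypothetical cumulative-hazard table indexed by time: a position slice counts rows, while a label slice selects time values and includes both endpoints. The table and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# hypothetical cumulative-hazard table: time index, one column per coefficient
cumulative = pd.DataFrame(
    {"baseline": np.linspace(0.0, 1.0, 20), "age": np.linspace(0.0, 0.2, 20)},
    index=pd.Index(np.arange(20) * 0.5, name="timeline"),
)

first_ten_points = cumulative.iloc[slice(0, 10)]  # position-based, like iloc=slice(0, 10)
up_to_time_two = cumulative.loc[slice(0.0, 2.0)]  # label-based, like loc=slice(0., 2.)
```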

predict_cumulative_hazard(X)

Returns the cumulative hazard functions for the individuals.

Parameters:

X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
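Under the model above, an individual's cumulative hazard is the same linear combination applied to the cumulative coefficients, \(H(t \mid x) = B_0(t) + \sum_k B_k(t) x_k\) with \(B_k(t) = \int_0^t b_k(s)\,ds\). A minimal NumPy sketch, assuming the cumulative coefficients are available as a time-by-coefficient array with the intercept first; `cumulative_hazard_for` is a hypothetical name.

```python
import numpy as np


def cumulative_hazard_for(B, X, fit_intercept=True):
    """Hypothetical helper: H(t | x) = B_0(t) + sum_k B_k(t) x_k (sketch).

    B : (n_times, n_coefs) cumulative coefficients, intercept first if fitted.
    X : (n_individuals, d) covariates, in the column order used for B.
    Returns an (n_times, n_individuals) array, one column per individual.
    """
    X = np.asarray(X, dtype=float)
    if fit_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    return B @ X.T
```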

predict_expectation(X) → Series

Compute the expected lifetime, E[T], using covariates X.

Parameters:

X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns:

the expected lifetimes for the individuals
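The expected lifetime is the area under the survival curve, \(E[T] = \int_0^\infty S(t)\,dt\), and for a step-shaped survival curve that area is a sum of rectangles. The sketch below assumes the curve is known on an increasing time grid starting at 0 and truncates the integral at the last grid point. lifelines may approximate the integral differently (for example with the trapezoidal rule), so treat this as an illustration of the formula only; `expected_lifetime` is a hypothetical name.

```python
import numpy as np


def expected_lifetime(times, survival):
    """Hypothetical helper: E[T] = integral of S(t) dt for a step function (sketch).

    times    : increasing grid starting at 0, where S changes value
    survival : S(t) on that grid; the integral is truncated at times[-1]
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    widths = np.diff(times)
    # area of each rectangle [t_j, t_{j+1}) with height S(t_j)
    return float(np.sum(survival[:-1] * widths))
```

For example, a curve that steps from 1.0 to 0.5, 0.25 and 0.0 at t = 1, 2, 3 gives 1.0 + 0.5 + 0.25 = 1.75, the mean of a lifetime equal to 1, 2 or 3 with probabilities 0.5, 0.25 and 0.25.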

predict_median(X) → Series

Parameters:

X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.

Returns:

the median lifetimes for the individuals

predict_percentile(X, p=0.5) → Series

Returns the lifetimes at which each individual's survival function falls to p; with the default p=0.5 these are the median lifetimes. See http://stats.stackexchange.com/questions/102986/percentile-loss-functions

Parameters:

X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
p (float, optional (default=0.5)) – the survival probability at which to read off the lifetime, between 0 and 1.
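Both predict_median and predict_percentile read a lifetime off a survival curve: the first time at which \(S(t)\) falls to p or below, so p=0.5 gives the median. A minimal NumPy sketch on a curve known on a time grid; `survival_percentile` is a hypothetical name, and returning infinity when the curve never reaches p is an assumption of this sketch.

```python
import numpy as np


def survival_percentile(times, survival, p=0.5):
    """Hypothetical helper: first time at which S(t) <= p (sketch).

    Returns inf if the curve never reaches p on the given grid.
    """
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    below = np.flatnonzero(survival <= p)
    return float(times[below[0]]) if below.size else float("inf")
```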

predict_survival_function(X, times=None)

Returns the survival functions for the individuals

Parameters:

X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
times – Not implemented yet
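Survival and cumulative hazard are linked by \(S(t \mid x) = \exp(-H(t \mid x))\), so a survival curve can be obtained from a predicted cumulative hazard. Because the additive model does not force the \(b_k(t)\) to be non-negative, a cumulative hazard built from it is not guaranteed to be non-decreasing, so the resulting curve can occasionally rise. A sketch only; the helper name is hypothetical.

```python
import numpy as np


def survival_from_cumulative_hazard(H):
    """Hypothetical helper: S(t | x) = exp(-H(t | x)), elementwise."""
    return np.exp(-np.asarray(H, dtype=float))


# a cumulative hazard that dips (possible when some b_k(t) are negative)
S = survival_from_cumulative_hazard([0.0, 0.3, 0.2, 0.5])
```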

print_summary(decimals=2, style=None, columns=None, **kwargs)

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:

decimals (int, optional (default=2)) – specify the number of decimal places to show
style (string) – {html, ascii, latex}
columns – only display a subset of summary columns. Default all.
kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

score(df: DataFrame, scoring_method: str = 'log_likelihood') → float

Score the data in df on the fitted model. With the default scoring method, returns the average partial log-likelihood.

Parameters:

df (DataFrame) – the dataframe with the duration column, event column, etc.
scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood returns the average unpenalized partial log-likelihood; concordance_index returns the concordance index.

smoothed_hazards_(bandwidth=1)

Smooth the hazard function using the Epanechnikov kernel; bandwidth sets the width (sigma) of the kernel.
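Kernel smoothing turns the jumps \(\Delta B(t_j)\) of a cumulative coefficient into a smooth hazard estimate, \(\hat b(t) = \frac{1}{h}\sum_j K\!\left(\frac{t - t_j}{h}\right)\Delta B(t_j)\), where \(K(u) = \tfrac{3}{4}(1 - u^2)\) on \(|u| \le 1\) is the Epanechnikov kernel. The sketch below assumes the bandwidth argument plays the role of \(h\); the function names are hypothetical, and no claim is made that lifelines uses exactly this formula.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def smooth_hazard(event_times, increments, grid, bandwidth=1.0):
    """Hypothetical helper: kernel-smoothed hazard (sketch).

    b_hat(t) = (1 / bandwidth) * sum_j K((t - t_j) / bandwidth) * dB(t_j),
    where dB(t_j) are the jumps of a cumulative coefficient at event times t_j.
    """
    event_times = np.asarray(event_times, dtype=float)
    increments = np.asarray(increments, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # scaled distances between every grid point and every event time
    u = (grid[:, None] - event_times[None, :]) / bandwidth  # (n_grid, n_events)
    return epanechnikov(u) @ increments / bandwidth
```

A wider bandwidth spreads each jump over a longer time window, trading bias for lower variance.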

property summary

Summary statistics describing the fit.

Returns:

df

Return type:

DataFrame