AalenAdditiveFitter
- class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)
This class fits the regression model:
\[h(t|x) = b_0(t) + b_1(t) x_1 + \dots + b_N(t) x_N\]
That is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates; see TODO: name.
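To make the model concrete, here is a minimal sketch that evaluates \(h(t|x)\) for a single individual. The helper `additive_hazard` and the choice to represent each \(b_i(t)\) as a Python callable are hypothetical, for illustration only, and not part of the lifelines API. Note that nothing in the additive form forces the result to be non-negative.

```python
import numpy as np

def additive_hazard(b_funcs, x, t):
    """Evaluate h(t|x) = b_0(t) + b_1(t) x_1 + ... + b_N(t) x_N.

    b_funcs: list of N+1 callables (hypothetical representation); b_funcs[0]
             is the baseline b_0(t), b_funcs[i] is the coefficient of x_i.
    x:       the N covariate values x_1, ..., x_N (fixed over time).
    t:       the time at which to evaluate the hazard.
    """
    x = np.asarray(x, dtype=float)
    coefs = np.array([b(t) for b in b_funcs], dtype=float)  # b_0(t), ..., b_N(t)
    return coefs[0] + coefs[1:] @ x

# Constant baseline 0.1 and a coefficient on x_1 that grows linearly in time.
b_funcs = [lambda t: 0.1, lambda t: 0.02 * t]
print(round(additive_hazard(b_funcs, [1.5], t=2.0), 6))  # 0.16
```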
Note
This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.
- Parameters:
fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\), acts as a baseline hazard.
alpha (float, optional (default=0.05)) – the significance level of the confidence intervals; the default of 0.05 gives 95% intervals.
coef_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the size of the coefficients during regression. This improves the stability of the estimates and mitigates the effects of high correlation between covariates. For example, this shrinks the magnitude of \(c_{i,t}\), the coefficient of covariate \(i\) at time \(t\).
smoothing_penalizer (float, optional (default: 0)) – Attach an L2 penalizer to the difference between temporally adjacent coefficients. For example, this shrinks the magnitude of \(c_{i,t} - c_{i,t+1}\).
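How the two penalizers could act on a single time step can be sketched as a ridge-style least-squares solve. This assumes the per-step objective \(\lVert y - Xc\rVert^2 + \lambda_c \lVert c\rVert^2 + \lambda_s \lVert c - c_{\text{prev}}\rVert^2\), where \(c_{\text{prev}}\) is the previous step's coefficient vector; that objective, and the helper `penalized_step`, are illustrative assumptions rather than lifelines' actual implementation. Setting the gradient to zero gives \((X^\top X + (\lambda_c + \lambda_s) I)\,c = X^\top y + \lambda_s c_{\text{prev}}\).

```python
import numpy as np

def penalized_step(X, y, c_prev, coef_penalizer=0.0, smoothing_penalizer=0.0):
    """Hypothetical helper: solve
        min_c ||y - X c||^2 + coef_penalizer * ||c||^2
              + smoothing_penalizer * ||c - c_prev||^2
    in closed form via the normal equations."""
    d = X.shape[1]
    A = X.T @ X + (coef_penalizer + smoothing_penalizer) * np.eye(d)
    rhs = X.T @ y + smoothing_penalizer * c_prev
    return np.linalg.solve(A, rhs)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
c_prev = np.zeros(3)

c_plain = penalized_step(X, y, c_prev)                        # ordinary least squares
c_shrunk = penalized_step(X, y, c_prev, coef_penalizer=100.0)  # pulled toward 0
print(np.linalg.norm(c_shrunk) < np.linalg.norm(c_plain))  # True
```

A large `smoothing_penalizer` pulls each step's estimate toward the previous one, which is what smooths the coefficient paths over time.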
- cumulative_hazards_
The estimated cumulative hazard
- Type:
DataFrame
- hazards_
The estimated hazards
- Type:
DataFrame
- confidence_intervals_
The lower and upper confidence intervals for the cumulative hazard
- Type:
DataFrame
- durations
The durations provided
- Type:
array
- event_observed
The event_observed variable provided
- Type:
array
- weights
The weights provided
- Type:
array
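The `alpha` parameter controls the width of the intervals stored in `confidence_intervals_`. As a minimal sketch, assuming a symmetric normal-approximation interval (estimate ± z·SE, with \(z = \Phi^{-1}(1 - \alpha/2)\)), the hypothetical helper `normal_ci` shows how `alpha` maps to interval bounds; lifelines' exact construction may differ in detail.

```python
from statistics import NormalDist

def normal_ci(estimate, std_err, alpha=0.05):
    """Hypothetical helper: symmetric (1 - alpha) interval estimate ± z * std_err."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return estimate - z * std_err, estimate + z * std_err

lower, upper = normal_ci(estimate=0.8, std_err=0.1, alpha=0.05)
print(round(lower, 3), round(upper, 3))  # 0.604 0.996
```

A larger `alpha` gives a smaller z and therefore a narrower interval.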
- property concordance_index_
The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data that accounts for censoring.
Here, the score_ measures the predictive accuracy of the fitted model on the training dataset; it is analogous to R^2 in linear models.
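For intuition about what the c-index counts, here is a simplified sketch of Harrell's C. A pair is comparable when the subject with the shorter duration had an observed event, and it is concordant when that subject also has the shorter predicted lifetime. The helper `concordance_index_sketch` is hypothetical and, unlike a full implementation, simply skips pairs with tied durations.

```python
from itertools import combinations

def concordance_index_sketch(durations, predicted_times, events):
    """Simplified Harrell's C: fraction of comparable pairs ordered correctly.

    Higher predicted_times mean longer predicted survival. Ties in the
    prediction count as half-concordant; pairs with tied durations are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(durations)), 2):
        # a = subject with the shorter observed duration
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if durations[a] == durations[b] or not events[a]:
            continue  # tied, or earlier subject censored: order unknown
        comparable += 1
        if predicted_times[a] < predicted_times[b]:
            concordant += 1.0
        elif predicted_times[a] == predicted_times[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

T = [1, 2, 3, 4]
E = [1, 1, 0, 1]
print(concordance_index_sketch(T, [10, 20, 30, 40], E))  # 1.0
print(concordance_index_sketch(T, [40, 30, 20, 10], E))  # 0.0
```

A value of 0.5 corresponds to random ordering, just as an ROC AUC of 0.5 does.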
- fit(df, duration_col, event_col=None, weights_col=None, show_progress=False, formula: str = None)
Fit the Aalen additive model to a dataset.
- Parameters:
df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (e.g. weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 if censored.
duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, all individuals are assumed to be uncensored.
weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is excluded from the covariates and used instead as a weight in the final regression. Default weight is 1. This can be used for case-weights: for example, a weight of 2 means there were two subjects with identical observations. It can also be used for sampling weights.
show_progress (bool, optional (default=False)) – Since the fitter is iterative, show the iteration number.
formula (str) – an R-like formula
- Returns:
self – self with additional new properties: cumulative_hazards_, etc.
- Return type:
AalenAdditiveFitter
Examples
```python
import pandas as pd
from lifelines import AalenAdditiveFitter

# T: durations; E: 1 if the event was observed, 0 if censored.
df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aaf = AalenAdditiveFitter()
aaf.fit(df, 'T', 'E')
aaf.predict_median(df)
aaf.print_summary()
```
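The weights_col description says a weight of 2 is equivalent to two subjects with identical observations. Assuming the regression at each step is a weighted least-squares fit (the general form of Aalen's estimator), this sketch checks that equivalence directly: weighting a row by 2 gives the same coefficients as duplicating it. The helper `weighted_lstsq` and the toy data are hypothetical.

```python
import numpy as np

def weighted_lstsq(X, y, w):
    """Hypothetical helper: minimize sum_i w_i * (y_i - X_i @ c)^2."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept + one covariate
y = np.array([0.0, 1.0, 1.5, 4.0])
w = np.array([1.0, 2.0, 1.0, 1.0])  # second subject carries weight 2

c_weighted = weighted_lstsq(X, y, w)

# Same fit obtained by literally duplicating the second row once.
X_dup = np.vstack([X, X[1]])
y_dup = np.append(y, y[1])
c_dup = np.linalg.lstsq(X_dup, y_dup, rcond=None)[0]
print(np.allclose(c_weighted, c_dup))  # True
```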
- plot(columns=None, loc=None, iloc=None, ax=None, **kwargs)
A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:
- Parameters:
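columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.
loc (slice, optional) – specify a time-based (label) subsection of the curves to plot, e.g. .plot(loc=slice(0., 10.)) will plot the time values between t=0 and t=10.
iloc (slice, optional) – specify a location-based subsection of the curves to plot, e.g. .plot(iloc=slice(0,10)) will plot the first 10 time points.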
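The loc/iloc distinction mirrors pandas indexing. Assuming the wrapper slices the time-indexed cumulative_hazards_ table the way pandas does (an assumption about its internals), this sketch with a made-up table shows which rows each option would select: iloc counts positions, while loc compares against time values and includes the end point.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a fitted model's cumulative_hazards_:
# index = event timeline, one column per coefficient.
timeline = np.array([0.5, 1.0, 2.5, 4.0, 7.0, 12.0])
cumulative_hazards = pd.DataFrame(
    {"baseline": np.linspace(0.0, 1.0, 6), "age": np.linspace(0.0, 0.3, 6)},
    index=timeline,
)

first_three = cumulative_hazards.iloc[slice(0, 3)]   # positions 0, 1, 2
up_to_t4 = cumulative_hazards.loc[slice(0.0, 4.0)]   # times 0.0 <= t <= 4.0
print(first_three.index.tolist())  # [0.5, 1.0, 2.5]
print(up_to_t4.index.tolist())     # [0.5, 1.0, 2.5, 4.0]
```

With a fitted model, this corresponds to calls such as aaf.plot(iloc=slice(0, 10)).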
- predict_cumulative_hazard(X)
Returns the cumulative hazard rates for the individuals
- Parameters:
X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
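Under the additive model, an individual's cumulative hazard is \(H_i(t) = B_0(t) + \sum_j B_j(t)\, x_{ij}\), where \(B_j(t) = \int_0^t b_j(s)\,ds\). The sketch below computes this for several individuals at once and matches DataFrame columns by name, which is why their order does not matter. The helper `predict_cumulative_hazard_sketch` and the `"_intercept"` column name are hypothetical, and the sketch assumes an intercept was fitted.

```python
import numpy as np
import pandas as pd

def predict_cumulative_hazard_sketch(cum_coefs, X):
    """H_i(t) = B_0(t) + sum_j B_j(t) * x_ij for each row i of X.

    cum_coefs: DataFrame indexed by time; the (hypothetical) "_intercept"
               column holds B_0(t), other columns hold B_j(t) per covariate.
    X:         DataFrame of covariates, matched to cum_coefs by column name.
    Returns a DataFrame indexed by time with one column per individual.
    """
    covariates = cum_coefs.columns.drop("_intercept")
    design = X[covariates].to_numpy(dtype=float)        # (n, d), reordered by name
    H = cum_coefs[covariates].to_numpy() @ design.T      # (times, n)
    H += cum_coefs[["_intercept"]].to_numpy()            # add B_0(t) to every individual
    return pd.DataFrame(H, index=cum_coefs.index, columns=X.index)

cum_coefs = pd.DataFrame(
    {"_intercept": [0.1, 0.3], "age": [0.01, 0.02], "var": [0.0, 0.5]},
    index=[1.0, 2.0],
)
X = pd.DataFrame({"var": [0, 1], "age": [30, 40]})  # column order differs on purpose
print(predict_cumulative_hazard_sketch(cum_coefs, X))
```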
- predict_expectation(X) → Series
Compute the expected lifetime, E[T], using covariates X.
- Parameters:
X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns the expected lifetimes for the individuals
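The expected lifetime is the area under the survival curve, \(E[T] = \int_0^\infty S(t)\,dt\). A minimal sketch of a numerical version, assuming the trapezoidal rule over the prediction timeline with \(S(0) = 1\) prepended; the helper `expected_lifetime` is hypothetical, and lifelines' own integration details may differ.

```python
import numpy as np

def expected_lifetime(timeline, survival):
    """Approximate E[T] = integral of S(t) dt from 0 to infinity (trapezoidal rule).

    Prepends S(0) = 1 and truncates the integral at timeline[-1], so the
    result underestimates E[T] if S(t) has not fallen to about 0 by then.
    """
    t = np.concatenate([[0.0], np.asarray(timeline, dtype=float)])
    s = np.concatenate([[1.0], np.asarray(survival, dtype=float)])
    return float(np.sum(np.diff(t) * (s[1:] + s[:-1]) / 2.0))

# Survival falling linearly from 1 to 0 over [0, 10]: E[T] = 5.
print(expected_lifetime([2.5, 5.0, 7.5, 10.0], [0.75, 0.5, 0.25, 0.0]))  # 5.0
```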
- predict_median(X) → Series
- Parameters:
X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns the median lifetimes for the individuals
- predict_percentile(X, p=0.5) → Series
Returns the time at which each individual's survival function falls to p; with the default p=0.5, these are the median lifetimes. http://stats.stackexchange.com/questions/102986/percentile-loss-functions
- Parameters:
X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
p (float, optional (default=0.5)) – the survival probability at which to read off the lifetime; 0.5 gives the median.
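Reading a percentile off a survival curve amounts to finding the first time at which \(S(t) \le p\). This sketch assumes that convention and returns infinity when the curve never falls that low (for example, a median that is not reached within the observed timeline). The helper `survival_percentile` is hypothetical.

```python
import numpy as np

def survival_percentile(timeline, survival, p=0.5):
    """First time at which S(t) <= p; inf if the curve never gets that low.

    With p=0.5 this is the median lifetime.
    """
    timeline = np.asarray(timeline, dtype=float)
    below = np.flatnonzero(np.asarray(survival, dtype=float) <= p)
    return float(timeline[below[0]]) if below.size else float("inf")

t = [1, 2, 3, 4, 5]
S = [0.9, 0.7, 0.5, 0.3, 0.1]
print(survival_percentile(t, S))          # 3.0
print(survival_percentile(t, S, p=0.05))  # inf
```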
- predict_survival_function(X, times=None)
Returns the survival functions for the individuals
- Parameters:
X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
times – Not implemented yet
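Survival functions follow from cumulative hazards through \(S(t) = \exp(-H(t))\). A minimal sketch, assuming that relationship is applied elementwise to a time-by-individual table; the helper name is hypothetical. Because the additive model does not constrain its coefficients to be non-negative, an individual's cumulative hazard can dip below zero or decrease, and the resulting curve can then exceed 1 or rise over time.

```python
import numpy as np

def survival_from_cumulative_hazard(H):
    """S(t) = exp(-H(t)), elementwise; rows are times, columns are individuals."""
    return np.exp(-np.asarray(H, dtype=float))

# The second individual's cumulative hazard dips below 0, so S(t) exceeds 1 there.
H = np.array([[0.0, 0.0],
              [0.2, -0.1],
              [0.5, 0.3]])
S = survival_from_cumulative_hazard(H)
print(S[1, 1] > 1.0)  # True
```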
- print_summary(decimals=2, style=None, columns=None, **kwargs)
Print summary statistics describing the fit, the coefficients, and the error bounds.
- Parameters:
decimals (int, optional (default=2)) – specify the number of decimal places to show
style (string) – {html, ascii, latex}
columns – only display a subset of summary columns. Default all.
kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
- score(df: DataFrame, scoring_method: str = 'log_likelihood') → float
Score the data in df on the fitted model. With the default scoring method, returns the average partial log-likelihood.
- Parameters:
df (DataFrame) – the dataframe with duration col, event col, etc.
scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood: returns the average unpenalized partial log-likelihood; concordance_index: returns the concordance index.
- smoothed_hazards_(bandwidth=1)
Smooths the hazard function using the Epanechnikov kernel, with the kernel width (sigma) set by bandwidth.
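Kernel smoothing turns the jumpy increments of a cumulative hazard into a smooth hazard curve: \(\hat h(t) = \frac{1}{b}\sum_j K\!\left(\frac{t - t_j}{b}\right)\Delta H(t_j)\), with the Epanechnikov kernel \(K(u) = \tfrac{3}{4}(1 - u^2)\) for \(|u| \le 1\). This is a minimal sketch of that standard estimator; the helpers `epanechnikov` and `smooth_hazard` are hypothetical, and lifelines' implementation may differ in details such as boundary handling.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere; integrates to 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smooth_hazard(event_times, cum_hazard, grid, bandwidth=1.0):
    """Kernel-smoothed hazard: h(t) ~= (1/b) * sum_j K((t - t_j)/b) * dH(t_j)."""
    increments = np.diff(np.concatenate([[0.0], cum_hazard]))  # jump of H at each event time
    u = (np.asarray(grid, dtype=float)[:, None] - np.asarray(event_times)[None, :]) / bandwidth
    return epanechnikov(u) @ increments / bandwidth

# Constant true hazard 0.2, with an event every 0.1 time units up to t = 20.
event_times = np.arange(1, 201) * 0.1
cum_hazard = 0.2 * event_times
h = smooth_hazard(event_times, cum_hazard, grid=[10.0], bandwidth=1.0)
print(h)  # roughly 0.2
```

A larger bandwidth averages over more event times, giving a smoother but more biased curve.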
- property summary
Summary statistics describing the fit.
- Returns:
df
- Return type:
DataFrame