AalenAdditiveFitter
-
class lifelines.fitters.aalen_additive_fitter.AalenAdditiveFitter(fit_intercept=True, alpha=0.05, coef_penalizer=0.0, smoothing_penalizer=0.0)
Bases: lifelines.fitters.RegressionFitter
This class fits the regression model:
\[h(t|x) = b_0(t) + b_1(t) x_1 + \dots + b_N(t) x_N\]
that is, the hazard rate is a linear function of the covariates with time-varying coefficients. This implementation assumes non-time-varying covariates, see TODO: name
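To make the model concrete, the following is a minimal sketch of the classical, unpenalized Aalen least-squares estimator, written with NumPy rather than lifelines. At each observed event time it regresses the event indicators of the at-risk subjects on their covariates (with a leading column of ones for the intercept \(b_0(t)\)) and accumulates the increments into cumulative coefficients \(B_j(t) = \int_0^t b_j(s)\,ds\). The function name `aalen_cumulative_coefficients` is hypothetical, and using a pseudo-inverse for rank-deficient risk sets is an assumption of this sketch, not a description of how lifelines handles them.

```python
import numpy as np

def aalen_cumulative_coefficients(durations, events, X, fit_intercept=True):
    """Unpenalized Aalen least-squares estimator (illustrative sketch only).

    Returns the sorted distinct event times and an (n_event_times, p) array
    of cumulative coefficients B_j(t), the running sum of increments dB_j.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    X = np.asarray(X, dtype=float)
    if fit_intercept:
        # the intercept b_0(t) plays the role of the baseline hazard
        X = np.column_stack([np.ones(len(X)), X])
    event_times = np.unique(durations[events])
    increments = []
    for t in event_times:
        at_risk = durations >= t                          # risk set R(t)
        dN = (events & (durations == t)).astype(float)    # failures exactly at t
        # least-squares increment; pinv gives the minimum-norm solution
        # when the at-risk design matrix is rank deficient (an assumption here)
        dB = np.linalg.pinv(X[at_risk]) @ dN[at_risk]
        increments.append(dB)
    return event_times, np.cumsum(increments, axis=0)
```

With no covariates, each intercept increment reduces to (number of events) / (number at risk), so the cumulative intercept coincides with the Nelson-Aalen estimator.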
Note
This class was rewritten in lifelines 0.17.0 to focus solely on static datasets. There is no guarantee of backwards compatibility.
Parameters: - fit_intercept (bool, optional (default: True)) – If False, do not attach an intercept (column of ones) to the covariate matrix. The intercept, \(b_0(t)\) acts as a baseline hazard.
- alpha (float, optional (default=0.05)) – the level in the confidence intervals.
- coef_penalizer (float, optional (default: 0)) – Attach a L2 penalizer to the size of the coefficients during regression. This improves stability of the estimates and controls for high correlation between covariates. For example, this shrinks the magnitude of \(c_{i,t}\).
- smoothing_penalizer (float, optional (default: 0)) – Attach a L2 penalizer to difference between adjacent (over time) coefficients. For example, this shrinks the magnitude of \(c_{i,t} - c_{i,t+1}\).
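The two penalizers can be read as a ridge-type objective for the coefficient increments at a single event time. The sketch below assumes the objective \(\|y - Xb\|^2 + c_1\|b\|^2 + c_2\|b - b_{prev}\|^2\), with \(c_1\) standing for coef_penalizer, \(c_2\) for smoothing_penalizer and \(b_{prev}\) for the previous time step's solution. `penalized_increment` is a hypothetical helper illustrating the idea, not lifelines' exact solver.

```python
import numpy as np

def penalized_increment(X, y, coef_penalizer=0.0, smoothing_penalizer=0.0, previous=None):
    """Minimize ||y - X b||^2 + c1 ||b||^2 + c2 ||b - previous||^2 over b.

    c1 is coef_penalizer and c2 is smoothing_penalizer. Setting the gradient
    to zero gives (X^T X + (c1 + c2) I) b = X^T y + c2 * previous.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    previous = np.zeros(p) if previous is None else np.asarray(previous, dtype=float)
    lhs = X.T @ X + (coef_penalizer + smoothing_penalizer) * np.eye(p)
    rhs = X.T @ y + smoothing_penalizer * previous
    return np.linalg.solve(lhs, rhs)
```

With both penalizers at zero this is ordinary least squares; a larger coef_penalizer shrinks the solution toward zero, while a larger smoothing_penalizer pulls it toward the previous step's solution.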
-
cumulative_hazards_
The estimated cumulative hazard
Type: DataFrame
-
hazards_
The estimated hazards
Type: DataFrame
-
confidence_intervals_
The lower and upper confidence intervals for the cumulative hazard
Type: DataFrame
-
durations
The durations provided
Type: array
-
event_observed
The event_observed variable provided
Type: array
-
weights
The weights provided
Type: array
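The alpha parameter and confidence_intervals_ are linked through a two-sided level of 1 - alpha. A minimal sketch, assuming a pointwise normal approximation (estimate plus or minus z times the standard error); the helper `confidence_bounds` is hypothetical and not part of lifelines.

```python
from statistics import NormalDist

def confidence_bounds(estimate, std_error, alpha=0.05):
    """Two-sided normal-approximation interval at level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return estimate - z * std_error, estimate + z * std_error
```

Lowering alpha widens the interval: alpha=0.01 uses z of roughly 2.58 instead of 1.96.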
-
compute_residuals
(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame
Compute the residuals of the model.
Parameters: - training_dataframe (DataFrame) – the same training DataFrame given in fit
- kind (string) – One of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}
Notes
'scaled_schoenfeld': lifelines does not add the coefficients to the final results, but R does when you call residuals(c, "scaledsch").
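For orientation, the 'martingale' and 'deviance' kinds have standard textbook definitions in terms of each subject's event indicator \(\delta_i\) and fitted cumulative hazard \(\hat\Lambda_i = \hat\Lambda(T_i \mid x_i)\): \(r_i = \delta_i - \hat\Lambda_i\) and \(d_i = \operatorname{sign}(r_i)\sqrt{-2\,[r_i + \delta_i \log(\delta_i - r_i)]}\). The NumPy sketch below computes both from precomputed cumulative hazards. The helper names are hypothetical, the code assumes \(\hat\Lambda_i > 0\) for subjects with an event, and it does not claim to reproduce what compute_residuals does internally.

```python
import numpy as np

def martingale_residuals(events, cumulative_hazard_at_durations):
    """r_i = delta_i - Lambda_hat(T_i | x_i)."""
    delta = np.asarray(events, dtype=float)
    return delta - np.asarray(cumulative_hazard_at_durations, dtype=float)

def deviance_residuals(events, cumulative_hazard_at_durations):
    """d_i = sign(r_i) * sqrt(-2 * (r_i + delta_i * log(delta_i - r_i)))."""
    delta = np.asarray(events, dtype=float)
    lam = np.asarray(cumulative_hazard_at_durations, dtype=float)
    r = delta - lam
    # delta_i - r_i equals Lambda_i; the log term is 0 for censored subjects,
    # so log(1.0) is substituted there to avoid evaluating log of a censored value
    log_term = delta * np.log(np.where(delta > 0, lam, 1.0))
    return np.sign(r) * np.sqrt(-2.0 * (r + log_term))
```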
-
concordance_index_
The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data that accounts for censoring.
Here, the score_ is a measure of the predictive accuracy of the fitted model on the training dataset. It's analogous to the R^2 in linear models.
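A naive \(O(n^2)\) sketch of Harrell's concordance index, assuming that higher predicted values mean longer predicted survival. A pair is comparable when the subject with the shorter observed duration had an observed event; it is concordant when that subject also has the shorter prediction, and tied predictions count one half. The helper is hypothetical, and skipping pairs with tied durations is a simplification of this sketch; library implementations may treat such ties differently.

```python
from itertools import combinations

def concordance_index(durations, predicted_times, events):
    """Naive O(n^2) Harrell's C-index (illustrative sketch)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue  # simplification: skip pairs with tied durations
        # order the pair so that `a` has the shorter observed duration
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[a]:
            continue  # shorter duration is censored, so the true order is unknown
        comparable += 1
        if predicted_times[a] < predicted_times[b]:
            concordant += 1.0
        elif predicted_times[a] == predicted_times[b]:
            concordant += 0.5
    return concordant / comparable
```

A value of 1.0 means perfect ranking, 0.5 is what random predictions achieve on average, and 0.0 means a perfectly reversed ranking.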
-
fit
(df, duration_col, event_col=None, weights_col=None, show_progress=False, formula: str = None)
Fit the Aalen Additive model to a dataset.
Parameters: - df (DataFrame) – a Pandas DataFrame with necessary columns duration_col and event_col (see below), covariates columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
- duration_col (string) – the name of the column in DataFrame that contains the subjects’ lifetimes.
- event_col (string, optional) – the name of the column in DataFrame that contains the subjects’ death observation. If left as None, assume all individuals are uncensored.
- weights_col (string, optional) – an optional column in the DataFrame, df, that denotes the weight per subject. This column is excluded from the covariates and used only as a weight in the final regression. Default weight is 1. This can be used for case-weights: for example, a weight of 2 means there were two subjects with identical observations. It can also be used for sampling weights.
- show_progress (bool, optional (default=False)) – Since the fitter is iterative, show iteration number.
- formula (str) – an R-like formula
Returns: self – self with additional new properties: cumulative_hazards_, etc.
Return type: AalenAdditiveFitter
Examples
import pandas as pd
from lifelines import AalenAdditiveFitter

# toy data: T is the duration, E is 1 if the event was observed and 0 if censored
df = pd.DataFrame({
    'T': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
    'E': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    'var': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2],
    'age': [4, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
})

aaf = AalenAdditiveFitter()
aaf.fit(df, 'T', 'E')  # the remaining columns ('var', 'age') are used as covariates
aaf.predict_median(df)
aaf.print_summary()
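The weights_col description says a weight of 2 is equivalent to two subjects with identical observations. In a least-squares regression that equivalence holds exactly: scaling a row by \(\sqrt{w}\) yields the same normal equations as repeating it \(w\) times. The sketch below checks this with NumPy. `weighted_lstsq` is a hypothetical helper, and the sketch assumes weights enter each per-time regression as ordinary least-squares weights.

```python
import numpy as np

def weighted_lstsq(X, y, weights):
    """Weighted least squares by scaling each row by sqrt(weight)."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# a weight of 2 on the second row ...
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 1.0])
weighted = weighted_lstsq(X, y, [1, 2, 1])

# ... matches an unweighted fit with that row duplicated
X_dup = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
y_dup = np.array([0.0, 1.0, 1.0, 1.0])
duplicated, *_ = np.linalg.lstsq(X_dup, y_dup, rcond=None)
```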
-
label
-
plot
(columns=None, loc=None, iloc=None, ax=None, **kwargs)
A wrapper around plotting. Matplotlib plot arguments can be passed in, plus:
Parameters: - columns (string or list-like, optional) – If not empty, plot a subset of columns from the cumulative_hazards_. Default all.
- loc, iloc (slice, optional) – specify a location-based subsection of the curves to plot; for example, .plot(iloc=slice(0,10)) will plot the first 10 time points.
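Assuming the loc and iloc arguments are applied with pandas' indexers, as their names suggest, the two select differently: iloc slices by position and excludes the stop, while loc slices by time label and, on a sorted index, includes both endpoints. The DataFrame below is hypothetical toy data laid out like a time-indexed table of cumulative hazards.

```python
import pandas as pd

# hypothetical cumulative hazards, indexed by time, one column per term
cumulative_hazards = pd.DataFrame(
    {"baseline": [0.1, 0.2, 0.4, 0.5, 0.9], "age": [0.0, 0.01, 0.03, 0.02, 0.05]},
    index=pd.Index([0.5, 1.0, 2.0, 3.5, 6.0], name="timeline"),
)

# positional: the first three time points (rows 0, 1, 2)
first_three_points = cumulative_hazards.iloc[slice(0, 3)]

# label-based: every time point t with 1.0 <= t <= 4.0
time_window = cumulative_hazards.loc[slice(1.0, 4.0)]
```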
-
plot_covariate_groups
(*args, **kwargs)
Deprecated as of v0.25.0. Use plot_partial_effects_on_outcome instead.
-
predict_cumulative_hazard
(X)
Returns the cumulative hazards for the individuals.
Parameters: X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
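Under the additive model, an individual's cumulative hazard is the covariate-weighted sum of the cumulative coefficients, \(\Lambda(t \mid x) = B_0(t) + \sum_j B_j(t)\, x_j\). The sketch below evaluates this for every time point and individual at once. `predict_cumulative_hazard_sketch` is hypothetical, and it assumes the cumulative coefficients are stored with one row per time point and the intercept in the first column.

```python
import numpy as np

def predict_cumulative_hazard_sketch(X, cumulative_coefficients, fit_intercept=True):
    """Lambda(t | x) = B_0(t) + sum_j B_j(t) x_j for all times and individuals.

    X: (n, d) covariates. cumulative_coefficients: (n_times, d + 1) with the
    intercept column first when fit_intercept is True.
    Returns an (n_times, n) array with one column per individual.
    """
    X = np.asarray(X, dtype=float)
    if fit_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    return np.asarray(cumulative_coefficients, dtype=float) @ X.T

# two time points; columns are (intercept, one covariate)
B = [[0.1, 0.02],
     [0.3, 0.05]]
H = predict_cumulative_hazard_sketch([[0.0], [10.0]], B)
```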
-
predict_expectation
(X) → pandas.core.series.Series
Compute the expected lifetime, E[T], using covariates X.
Parameters: - X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: the expected lifetimes for the individuals
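The expected lifetime is the area under the survival curve, \(E[T] = \int_0^\infty S(t)\,dt\). A minimal sketch for a step-shaped survival curve follows. The helper is hypothetical, it assumes \(S(t) = 1\) before the first time point, and it ignores any area past the last time point, so it underestimates E[T] when the curve has not yet reached zero. lifelines may use a different quadrature.

```python
import numpy as np

def expected_lifetime(times, survival):
    """Approximate E[T] as the area under a right-continuous step S(t)."""
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    # area from 0 up to the first time point, where S(t) = 1
    area = times[0]
    # S(t_k) then holds on each interval [t_k, t_{k+1})
    area += np.sum(survival[:-1] * np.diff(times))
    return float(area)
```

For example, a curve that drops to 0.5 at t=1, 0.25 at t=2 and 0 at t=3 gives 1 + 0.5 + 0.25 = 1.75.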
-
predict_median
(X) → pandas.core.series.Series
Parameters: - X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
Returns: the median lifetimes for the individuals
-
predict_percentile
(X, p=0.5) → pandas.core.series.Series
Returns, for each individual, the time at which the survival function first drops to p or below; with the default p=0.5 this is the median lifetime. http://stats.stackexchange.com/questions/102986/percentile-loss-functions
Parameters: - X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- p (float, optional (default=0.5)) – the survival probability level, between 0 and 1
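Reading p as a survival-probability level, the percentile of an individual's lifetime is the first time at which the survival curve falls to p or below, and predict_median is the case p = 0.5. If the curve never gets that low, the percentile is infinite. `survival_percentile` is a hypothetical helper operating on a single survival curve.

```python
import numpy as np

def survival_percentile(times, survival, p=0.5):
    """First time at which S(t) <= p; infinity if the curve never gets there."""
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    below = np.nonzero(survival <= p)[0]
    return float(times[below[0]]) if below.size else float("inf")
```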
-
predict_survival_function
(X, times=None)
Returns the survival functions for the individuals
Parameters: - X (a (n,d) covariate numpy array or DataFrame) – If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
- times – Not implemented yet
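Survival functions follow from cumulative hazards through \(S(t \mid x) = \exp(-\Lambda(t \mid x))\). Because the additive model places no sign constraint on the coefficients, an estimated cumulative hazard can decrease over some interval, in which case the corresponding survival curve increases there. The sketch below applies the transformation and flags such curves; both helper names are hypothetical.

```python
import numpy as np

def survival_from_cumulative_hazard(cumulative_hazard):
    """S(t | x) = exp(-Lambda(t | x)), elementwise on a (n_times, n) array."""
    return np.exp(-np.asarray(cumulative_hazard, dtype=float))

def non_monotone_columns(survival):
    """Indices of individuals whose survival curve increases somewhere."""
    s = np.asarray(survival, dtype=float)
    return np.nonzero(np.any(np.diff(s, axis=0) > 0, axis=0))[0]

# column 1 has a cumulative hazard that dips from 0.2 to 0.1
H = np.array([[0.0, 0.0],
              [0.5, 0.2],
              [1.0, 0.1]])
S = survival_from_cumulative_hazard(H)
```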
-
print_summary
(decimals=2, style=None, columns=None, **kwargs)
Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters: - decimals (int, optional (default=2)) – specify the number of decimal places to show
- style (string) – {html, ascii, latex}
- columns – only display a subset of summary columns. Default all.
- kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
-
score
(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float
Score the data in df on the fitted model. With the default scoring method, returns the average partial log-likelihood.
Parameters: - df (DataFrame) – the dataframe with duration col, event col, etc.
- scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood returns the average unpenalized partial log-likelihood; concordance_index returns the concordance index.
-
smoothed_hazards_
(bandwidth=1)
Smooth the hazard function using the Epanechnikov kernel, with the given bandwidth (sigma).
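Kernel smoothing turns the hazard increments \(\Delta B(t_j)\) at the event times into a smooth hazard estimate, \(\hat h(t) = \frac{1}{b}\sum_j K\!\left(\frac{t - t_j}{b}\right)\Delta B(t_j)\), where \(b\) is the bandwidth and \(K(u) = \tfrac{3}{4}(1 - u^2)\) for \(|u| \le 1\) (zero otherwise) is the Epanechnikov kernel. The sketch below implements that formula with NumPy; the helper names are hypothetical and this is not lifelines' code.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def smooth_hazard(grid, jump_times, increments, bandwidth=1.0):
    """h_hat(t) = (1/b) * sum_j K((t - t_j) / b) * dB(t_j) on each grid point."""
    grid = np.asarray(grid, dtype=float)[:, None]          # shape (g, 1)
    jump_times = np.asarray(jump_times, dtype=float)[None, :]  # shape (1, m)
    weights = epanechnikov((grid - jump_times) / bandwidth) / bandwidth
    # increments may be (m,) for one curve or (m, p) for p coefficient curves
    return weights @ np.asarray(increments, dtype=float)
```

A larger bandwidth spreads each jump over a wider window, trading detail for a smoother curve.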
-
summary
Summary statistics describing the fit.
Returns: df
Return type: DataFrame