# PiecewiseExponentialRegressionFitter

class lifelines.fitters.piecewise_exponential_regression_fitter.PiecewiseExponentialRegressionFitter(breakpoints, alpha=0.05, penalizer=0.0)

Bases: lifelines.fitters.ParametricRegressionFitter

This implements a piecewise constant-hazard model at pre-specified break points.

$$h(t) = \begin{cases} 1/\lambda_0(x) & \text{if } t \le \tau_0 \\ 1/\lambda_1(x) & \text{if } \tau_0 < t \le \tau_1 \\ 1/\lambda_2(x) & \text{if } \tau_1 < t \le \tau_2 \\ \vdots & \end{cases}$$

where $$\lambda_i(x) = \exp(\beta_i x)$$.
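A minimal NumPy sketch of the hazard defined above, for illustration only. It assumes the coefficients are stored as a `(k + 1, d)` array `betas` (one row per interval) and that the last interval runs from the final breakpoint to infinity. `piecewise_hazard` is a hypothetical helper, not part of the lifelines API.

```python
import numpy as np

def piecewise_hazard(t, x, betas, breakpoints):
    """h(t | x) = 1 / lambda_i(x), lambda_i(x) = exp(beta_i . x), for the interval i containing t.

    breakpoints: sorted [tau_0, ..., tau_{k-1}].
    betas: shape (k + 1, d), one row per interval (assumption: the last
    interval is tau_{k-1} < t < infinity).
    """
    # side="left" puts t == tau_i in interval i, matching the "t <= tau_i" convention
    i = np.searchsorted(np.asarray(breakpoints, dtype=float), t, side="left")
    lam = np.exp(np.asarray(betas, dtype=float)[i] @ np.asarray(x, dtype=float))
    return 1.0 / lam
```

Because `np.searchsorted` also accepts an array of times, passing a vector `t` returns one hazard value per time point.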

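Parameters:
• breakpoints (list) – a list of times at which a new exponential model begins, i.e. where the hazard switches to a new constant value.
• penalizer (float, optional (default=0.0)) – penalize the variance of the $$\lambda_i$$. See the blog post below.
• alpha (float, optional (default=0.05)) – the significance level used for the confidence intervals (0.05 gives 95% intervals).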
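One plausible reading of "penalize the variance of the $$\lambda_i$$", sketched here as an assumption rather than the library's exact objective: add `penalizer` times the across-interval variance of each covariate's coefficient. The term is zero when every interval shares the same coefficients and grows as they diverge, which pulls neighbouring intervals toward each other. `variance_penalty` is hypothetical.

```python
import numpy as np

def variance_penalty(betas, penalizer):
    """Hypothetical penalty: penalizer * sum over covariates of the variance,
    across intervals, of that covariate's coefficient."""
    betas = np.asarray(betas, dtype=float)  # shape (number of intervals, number of covariates)
    return penalizer * float(np.sum(np.var(betas, axis=0)))
```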

Examples

See blog post here and paper replication here

AIC_
BIC_
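AIC_ and BIC_ are listed without descriptions. The textbook definitions are sketched below (lower is better for both). This is an illustration, not lifelines' code: it assumes n is the plain number of observations, and how lifelines counts n when `weights_col` is used is not stated here.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * (number of fitted parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 * log-likelihood + (number of parameters) * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```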
compute_residuals(training_dataframe: pandas.core.frame.DataFrame, kind: str) → pandas.core.frame.DataFrame

Compute the residuals of the model.

Parameters:
• training_dataframe (DataFrame) – the same training DataFrame given in fit.
• kind (string) – one of {‘schoenfeld’, ‘score’, ‘delta_beta’, ‘deviance’, ‘martingale’, ‘scaled_schoenfeld’}.

Notes

• 'scaled_schoenfeld': lifelines does not add the coefficients to the final results, but R does when you call residuals(c, "scaledsch")
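For orientation, the textbook definitions of two of the listed kinds are sketched below. They take each subject's event indicator δ_i and the fitted cumulative hazard H(T_i | x_i) at the observed duration: martingale residuals are δ_i − H, and deviance residuals rescale them to be more symmetric around 0. The helpers are hypothetical, and lifelines' own computation may differ in details.

```python
import numpy as np

def martingale_residuals(events, cumhaz_at_duration):
    """m_i = delta_i - H(T_i | x_i): observed minus expected number of events."""
    return np.asarray(events, dtype=float) - np.asarray(cumhaz_at_duration, dtype=float)

def deviance_residuals(events, cumhaz_at_duration):
    """d_i = sign(m_i) * sqrt(-2 * (m_i + delta_i * log(delta_i - m_i)))."""
    d = np.asarray(events, dtype=float)
    m = martingale_residuals(events, cumhaz_at_duration)
    # delta_i - m_i equals H; the log term is defined as 0 for censored subjects
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(d > 0, d * np.log(d - m), 0.0)
    return np.sign(m) * np.sqrt(-2.0 * (m + log_term))
```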
concordance_index_

The concordance score (also known as the c-index) of the fit. The c-index is a generalization of the ROC AUC to survival data that accounts for censoring. Here, concordance_index_ measures the predictive accuracy of the fitted model on the training dataset.
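A naive O(n²) sketch of Harrell's c-index, to show what is being counted. It assumes higher predicted values mean longer predicted survival (for example, predicted median lifetimes) and skips pairs with tied durations, so it is a simplification that may not match lifelines exactly on tied data. For real use, lifelines provides `concordance_index` in `lifelines.utils`.

```python
def harrell_c_index(durations, predicted_times, events):
    """Fraction of comparable pairs whose predicted ordering matches the observed ordering.

    A pair (i, j) is comparable when subject i had an observed event strictly
    before subject j's duration; ties in prediction count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.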

fit(df, duration_col, event_col=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, fit_options: Optional[dict] = None) → ParametricRegressionFitter

Fit the regression model to a right-censored dataset.

Parameters:
• df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
• duration_col (string) – the name of the column in the DataFrame that contains the subjects’ lifetimes.
• event_col (string, optional) – the name of the column in the DataFrame that contains the subjects’ death observations. If left as None, all individuals are assumed to be uncensored.
• show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
• regressors (dict, optional) – a dictionary of parameter names -> {list of column names, formula} that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
• timeline (array, optional) – specify a timeline that will be used for plotting and prediction.
• weights_col (string) – the column in the DataFrame that specifies weights per observation.
• robust (bool, optional (default=False)) – compute robust standard errors using the Huber sandwich estimator.
• initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
• entry_col (string) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
• fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.

Returns: self, with additional new properties print_summary, params_, confidence_intervals_ and more.
fit_intercept = True
fit_interval_censoring(df, lower_bound_col, upper_bound_col, event_col=None, ancillary=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, fit_options: Optional[dict] = None) → ParametricRegressionFitter

Fit the regression model to an interval-censored dataset.

Parameters:
• df (DataFrame) – a Pandas DataFrame with the necessary columns lower_bound_col and upper_bound_col (and optionally event_col; see below), covariate columns, and special columns (weights). event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
• lower_bound_col (string) – the name of the column in the DataFrame that contains the lower bounds of the intervals.
• upper_bound_col (string) – the name of the column in the DataFrame that contains the upper bounds of the intervals.
• event_col (string, optional) – the name of the column in the DataFrame that contains the subjects’ death observations. If left as None, this is inferred from the upper and lower interval limits (equal limits imply an observed death).
• show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
• regressors (dict, optional) – a dictionary of parameter names -> {list of column names, formula} that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
• timeline (array, optional) – specify a timeline that will be used for plotting and prediction.
• weights_col (string) – the column in the DataFrame that specifies weights per observation.
• robust (bool, optional (default=False)) – compute robust standard errors using the Huber sandwich estimator.
• initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
• entry_col (string) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
• fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.

Returns: self, with additional new properties print_summary, params_, confidence_intervals_ and more.
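To make the interval-censored objective concrete, here is a sketch of each subject's log-likelihood contribution under a generic survival function S and density f. It uses log f(L) when the bounds are equal (an observed death, matching the default inference of event_col) and log[S(L) − S(U)] otherwise. Setting U = ∞ recovers right censoring and L = 0 recovers left censoring. The helper and the rate-1 exponential model are illustrative assumptions, not lifelines internals.

```python
import numpy as np

def interval_censored_loglik(lower, upper, survival_fn, density_fn):
    """Per-subject log-likelihood for interval-censored observations (lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    observed = lower == upper  # equal bounds are treated as an observed death
    ll = np.empty_like(lower)
    # exact event time: the contribution is the log-density at that time
    ll[observed] = np.log(density_fn(lower[observed]))
    # event somewhere in (lower, upper]: probability mass S(lower) - S(upper)
    ll[~observed] = np.log(survival_fn(lower[~observed]) - survival_fn(upper[~observed]))
    return ll

def exp_survival(t):
    """Illustrative exponential model with rate 1: S(t) = exp(-t)."""
    return np.exp(-t)

def exp_density(t):
    """Matching density f(t) = exp(-t)."""
    return np.exp(-t)
```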
fit_left_censoring(df, duration_col=None, event_col=None, regressors=None, show_progress=False, timeline=None, weights_col=None, robust=False, initial_point=None, entry_col=None, fit_options: Optional[dict] = None) → ParametricRegressionFitter

Fit the regression model to a left-censored dataset.

Parameters:
• df (DataFrame) – a Pandas DataFrame with the necessary columns duration_col and event_col (see below), covariate columns, and special columns (weights). duration_col refers to the lifetimes of the subjects. event_col refers to whether the ‘death’ event was observed: 1 if observed, 0 otherwise (censored).
• duration_col (string) – the name of the column in the DataFrame that contains the subjects’ lifetimes/measurements/etc. This column contains the (possibly) left-censored data.
• event_col (string, optional) – the name of the column in the DataFrame that contains the subjects’ death observations. If left as None, all individuals are assumed to be uncensored.
• show_progress (bool, optional (default=False)) – since the fitter is iterative, show convergence diagnostics. Useful if convergence is failing.
• regressors (dict, optional) – a dictionary of parameter names -> {list of column names, formula} that maps model parameters to a linear combination of variables. If left as None, all variables will be used for all parameters.
• timeline (array, optional) – specify a timeline that will be used for plotting and prediction.
• weights_col (string) – the column in the DataFrame that specifies weights per observation.
• robust (bool, optional (default=False)) – compute robust standard errors using the Huber sandwich estimator.
• initial_point ((d,) numpy array, optional) – initialize the starting point of the iterative algorithm. Default is the zero vector.
• entry_col (str) – specify a column in the DataFrame that denotes any late entries (left truncation) that occurred. See the docs on left truncation.
• fit_options (dict, optional) – pass kwargs into the underlying minimization algorithm, like tol, etc.

Returns: self, with additional new properties print_summary, params_, confidence_intervals_ and more.
fit_right_censoring(*args, **kwargs)

Alias for fit

force_no_intercept = False
log_likelihood_ratio_test() → StatisticalResult

This function computes the likelihood ratio test for the model. We compare the existing model (with all the covariates) to the trivial model of no covariates.
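The test compares twice the log-likelihood gap between the two models against a χ² distribution. Its degrees of freedom equal the number of extra parameters in the full model. A minimal sketch with SciPy follows. The inputs are assumed to be the two fitted log-likelihoods and parameter counts, and the function name is hypothetical.

```python
from scipy import stats

def likelihood_ratio_test(ll_full, ll_null, n_params_full, n_params_null):
    """Return (test statistic, degrees of freedom, p-value) for two nested models."""
    test_stat = 2.0 * (ll_full - ll_null)  # twice the log-likelihood gain of the full model
    dof = n_params_full - n_params_null    # number of extra parameters in the full model
    p_value = float(stats.chi2.sf(test_stat, dof))  # upper-tail chi-squared probability
    return test_stat, dof, p_value
```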

mean_survival_time_

The mean survival time of the average subject in the training dataset.

median_survival_time_

The median survival time of the average subject in the training dataset.

plot(columns=None, parameter=None, ax=None, **errorbar_kwargs)

Produces a visual representation of the coefficients, including their standard errors and magnitudes.

Parameters:
• columns (list, optional) – specify a subset of the columns to plot.
• errorbar_kwargs – pass in additional plotting commands to the matplotlib errorbar command.
• ax – the matplotlib axis to be edited.

Returns: matplotlib axis
plot_covariate_groups(*args, **kwargs)

Deprecated as of v0.25.0. Use plot_partial_effects_on_outcome instead.

plot_partial_effects_on_outcome(covariates, values, plot_baseline=True, ax=None, times=None, y='survival_function', **kwargs)

Produces a plot comparing the baseline curve of the model versus what happens when one or more covariates are varied over values in a group. This is useful for comparing subjects as we vary the covariate(s), all else being held equal. The baseline curve is the predicted y-curve at the average values of all covariates in the original dataset.

Parameters:
• covariates (string or list) – a string (or list of strings) naming the covariate(s) in the original dataset that we wish to vary.
• values (1d or 2d iterable) – an iterable of the values we wish the covariate(s) to take on.
• plot_baseline (bool) – also display the baseline survival, defined as the survival at the mean of the original dataset.
• times – the times at which to plot.
• y (str) – one of “survival_function”, “hazard”, “cumulative_hazard”. Default “survival_function”.
• kwargs – pass in additional plotting commands.
• ax – the matplotlib axis to be edited.

Returns: matplotlib axis, or list of axes

Examples

import numpy as np
from lifelines import WeibullAFTFitter
from lifelines.datasets import load_rossi

rossi = load_rossi()
wf = WeibullAFTFitter().fit(rossi, 'week', 'arrest')
wf.plot_partial_effects_on_outcome('prio', values=np.arange(0, 15, 3), cmap='coolwarm')

# multiple variables at once
wf.plot_partial_effects_on_outcome(['prio', 'paro'], values=[[0, 0], [5, 0], [10, 0], [0, 1], [5, 1], [10, 1]], cmap='coolwarm')

# if you have categorical variables, you can simplify things
# ('dummy1', 'dummy2', 'dummy3' are placeholder one-hot columns, not columns of rossi):
wf.plot_partial_effects_on_outcome(['dummy1', 'dummy2', 'dummy3'], values=np.eye(3))

predict_cumulative_hazard(df, times=None, conditional_after=None) → pandas.core.frame.DataFrame

Return the cumulative hazard of the subjects in df at the given time points.

Parameters:
• df (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order. If a numpy array, columns must be in the same order as the training data.
• times (iterable, optional) – an iterable of increasing times at which to predict the cumulative hazard. Default is the set of all durations (observed and unobserved). Uses linear interpolation if points in time are not in the index.

Returns: cumulative_hazard_ – the cumulative hazard of individuals over the timeline (DataFrame)
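For this model the cumulative hazard has a closed form: the sum, over intervals, of each interval's hazard 1/λ_i(x) times the time spent in that interval before t. The vectorized NumPy sketch below returns a times-by-subjects array, the same orientation as the returned DataFrame (rows are time points, columns are subjects). It assumes a `(k + 1, d)` coefficient array `betas` with an unbounded last interval; `cumulative_hazard_matrix` is hypothetical.

```python
import numpy as np

def cumulative_hazard_matrix(X, betas, breakpoints, times):
    """Array of shape (len(times), n): H(t | x_i) for each time point and subject."""
    X = np.atleast_2d(np.asarray(X, dtype=float))                 # (n, d)
    rates = 1.0 / np.exp(X @ np.asarray(betas, dtype=float).T)    # (n, k+1): 1 / lambda_i(x)
    edges = np.concatenate([[0.0], np.asarray(breakpoints, dtype=float), [np.inf]])
    lo, hi = edges[:-1], edges[1:]                                # interval bounds, (k+1,)
    t = np.asarray(times, dtype=float)[:, None]                   # (m, 1)
    exposure = np.minimum(np.maximum(t - lo, 0.0), hi - lo)       # (m, k+1): time spent in each interval
    return exposure @ rates.T                                     # (m, n)
```

Since the formula is exact at any t, no interpolation is needed in this sketch; the survival function is `np.exp(-H)`.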
predict_expectation(X, conditional_after=None) → pandas.core.series.Series

Compute the expected lifetime, $$E[T]$$, using covariates X. The algorithm uses the fact that $$E[T] = \int_0^\infty P(T > t) \, dt = \int_0^\infty S(t) \, dt$$ and approximates the integral with the trapezoidal rule.
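The trapezoidal approximation of $$\int_0^\infty S(t) \, dt$$ can be sketched in a few lines of NumPy. The grid and the survival values on it are assumed inputs, and the helper name is hypothetical. The check uses an exponential curve S(t) = exp(−t/2), whose true mean is 2.

```python
import numpy as np

def expectation_trapezoid(times, survival):
    """Approximate E[T] = integral of S(t) dt over the grid with the trapezoidal rule."""
    times = np.asarray(times, dtype=float)
    survival = np.asarray(survival, dtype=float)
    widths = np.diff(times)  # spacing between consecutive grid points
    return float(np.sum(widths * (survival[:-1] + survival[1:]) / 2.0))

grid = np.linspace(0.0, 50.0, 50001)
approx_mean = expectation_trapezoid(grid, np.exp(-grid / 2.0))  # close to 2.0
```

The integral is truncated at the end of the grid, so a curve that plateaus above 0 yields a value that keeps growing as the grid is extended.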

Caution

If the survival function doesn’t converge to 0, the expectation is actually infinite and the returned values are meaningless or too large. In that case, using predict_median or predict_percentile would be better.

Parameters:
• X (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.

Returns: expectations (DataFrame)

Notes

If X is a DataFrame, the order of the columns does not matter. But if X is an array, then the column ordering is assumed to be the same as in the training dataset.

predict_hazard(df, *, conditional_after=None, times=None)

Predict the hazard for individuals, given their covariates.

Parameters:
• df (DataFrame) – a (n,d) DataFrame. Columns can be in any order.
• times (iterable, optional) – an iterable (array, list, series) of increasing times at which to predict the hazard. Default is the set of all durations in the training dataset (observed and unobserved).
• conditional_after – not implemented yet.

Returns: the hazards of individuals over the timeline (DataFrame)
predict_median(df, *, conditional_after=None) → pandas.core.frame.DataFrame

Predict the median lifetimes for the individuals. If the survival curve of an individual does not cross 0.5, then the result is infinity.

Parameters:
• df (numpy array or DataFrame) – a (n,d) covariate numpy array or DataFrame. If a DataFrame, columns can be in any order.
• conditional_after (iterable, optional) – must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long each subject has already lived. Ex: if $$T$$ is the unknown event time, then this represents $$T | T > s$$. This is useful for knowing the remaining hazard/survival of censored subjects. The new timeline is the remaining duration of the subject, i.e. normalized back to starting at 0.

Returns: percentiles – the median lifetimes for the individuals (DataFrame). If the survival curve of an individual does not cross 0.5, then the result is infinity.
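With a piecewise-constant hazard, a percentile can be computed exactly rather than read off a grid. The idea is to walk through the intervals, accumulating cumulative hazard until it reaches −ln p (where S(t) = p), then solve linearly inside that interval. The median is p = 0.5, and the sketch returns infinity when the curve never drops to p. It assumes per-interval hazard rates for a single subject and an unbounded last interval; the function name is hypothetical.

```python
import numpy as np

def piecewise_percentile(rates, breakpoints, p=0.5):
    """Smallest t with S(t) = p, i.e. H(t) = -ln(p), for a piecewise-constant hazard."""
    target = -np.log(p)
    edges = np.concatenate([[0.0], np.asarray(breakpoints, dtype=float), [np.inf]])
    accumulated = 0.0
    for rate, lo, hi in zip(rates, edges[:-1], edges[1:]):
        if rate <= 0:
            continue  # no hazard accrues in this interval
        chunk = rate * (hi - lo)  # cumulative hazard gained over the whole interval
        if accumulated + chunk >= target:
            return float(lo + (target - accumulated) / rate)
        accumulated += chunk
    return float("inf")  # the survival curve never reaches p
```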
predict_percentile(df, *, p=0.5, conditional_after=None) → pandas.core.series.Series
predict_survival_function(df, times=None, conditional_after=None) → pandas.core.frame.DataFrame

Predict the survival function for individuals, given their covariates. This assumes that the individual just entered the study (that is, we do not condition on how long they have already lived for.)

Parameters:
• df (DataFrame) – a (n,d) DataFrame. Columns can be in any order.
• times (iterable, optional) – an iterable of increasing times at which to predict the survival function. Default is the set of all durations (observed and unobserved). Uses linear interpolation if points in time are not in the index.
• conditional_after (iterable, optional) – must be equal in size to df.shape[0] (denoted n above). An iterable (array, list, series) of possibly non-zero values that represent how long each subject has already lived. Ex: if $$T$$ is the unknown event time, then this represents $$T | T > s$$. This is useful for knowing the remaining hazard/survival of censored subjects.

Returns: survival_function – the survival probabilities of individuals over the timeline (DataFrame)
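Conditioning on having already survived to s uses the identity S(t | T > s) = S(s + t) / S(s), with t measured from s so the timeline restarts at 0. The sketch applies it to a hypothetical two-interval hazard (0.1 up to t = 1, then 1.0). Because the hazard changes at t = 1, conditioning changes the answer, unlike a single exponential. Both function names are hypothetical.

```python
import math

def conditional_survival(survival_fn, t, s):
    """S(t | T > s) = S(s + t) / S(s): survival for t more time units given survival to s."""
    return survival_fn(s + t) / survival_fn(s)

def piecewise_survival(u, rates=(0.1, 1.0), tau=1.0):
    """Hypothetical two-interval model: hazard rates[0] on [0, tau], rates[1] afterwards."""
    cumulative_hazard = rates[0] * min(u, tau) + rates[1] * max(u - tau, 0.0)
    return math.exp(-cumulative_hazard)
```

For the same 0.5 extra time units, a new subject (s = 0) gets exp(−0.05) ≈ 0.951, while a subject who has already lived to s = 1 gets exp(−0.5) ≈ 0.607.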
print_summary(decimals: int = 2, style: Optional[str] = None, columns: Optional[list] = None, **kwargs) → None

Print summary statistics describing the fit, the coefficients, and the error bounds.

Parameters:
• decimals (int, optional (default=2)) – specify the number of decimal places to show.
• style (string) – {html, ascii, latex}.
• columns – only display a subset of summary columns. Default all.
• kwargs – print additional metadata in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
regressors = None
score(df: pandas.core.frame.DataFrame, scoring_method: str = 'log_likelihood') → float

Score the data in df on the fitted model. With default scoring method, returns the _average log-likelihood_.

Parameters:
• df (DataFrame) – the dataframe with duration col, event col, etc.
• scoring_method (str) – one of {‘log_likelihood’, ‘concordance_index’}. log_likelihood returns the average unpenalized log-likelihood; concordance_index returns the concordance index.
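For right-censored data, a per-subject log-likelihood is δ_i log h(T_i) − H(T_i), where δ_i is the event indicator. The default score averages these contributions. The sketch assumes the fitted hazard and cumulative hazard at each subject's duration are already available, includes no penalty term, and uses a hypothetical helper name. It ignores left truncation (entry_col), which would subtract H at the entry time.

```python
import numpy as np

def average_log_likelihood(events, hazard_at_T, cumhaz_at_T):
    """Mean of delta_i * log h(T_i) - H(T_i) over subjects (right censoring only)."""
    events = np.asarray(events, dtype=float)
    h = np.asarray(hazard_at_T, dtype=float)
    H = np.asarray(cumhaz_at_T, dtype=float)
    # censored subjects (delta = 0) contribute only -H, so log h is skipped for them
    with np.errstate(divide="ignore"):
        log_h = np.where(events > 0, np.log(h), 0.0)
    return float(np.mean(events * log_h - H))
```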

Examples

from lifelines import WeibullAFTFitter

strata = None
summary