statistics
- class lifelines.statistics.StatisticalResult(p_value, test_statistic, name=None, test_name=None, **kwargs)
This class holds the results of statistical tests, with a pretty-printing wrapper to display them.
Note
This class’ API changed in version 0.16.0.
- Parameters:
p_value (iterable or float) – the p-values of a statistical test(s)
test_statistic (iterable or float) – the test statistics of a statistical test(s). Must be the same size as p-values if iterable.
test_name (string) – the test that was used. lifelines should set this.
name (iterable or string) – if this class holds multiple results (ex: from a pairwise comparison), this can hold the names. Must be the same size as p-values if iterable.
kwargs – additional information to attach to the object and display in print_summary().
- print_summary(decimals=2, style=None, **kwargs)
Print summary statistics describing the results.
- Parameters:
decimals (int, optional (default=2)) – specify the number of decimal places to show
style (string, optional) – one of {html, ascii, latex}; default ascii
kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
- property summary
Returns a DataFrame containing the test statistics and the p-values.
Return type: DataFrame
- to_ascii(decimals=2, **kwargs)
- to_html(decimals=2, **kwargs)
- to_latex(decimals=2, **kwargs)
- lifelines.statistics.logrank_test(durations_A, durations_B, event_observed_A=None, event_observed_B=None, t_0=-1, weights_A=None, weights_B=None, weightings=None, **kwargs) → StatisticalResult
Measures and reports on whether two intensity processes are different. That is, given two event series, it determines whether the data-generating processes are statistically different. Under the null hypothesis, the test statistic follows a chi-squared distribution. Let \(h_i(t)\) be the hazard of group \(i\) at time \(t\); then:
\[
\begin{aligned}
& H_0: h_1(t) = h_2(t) \\
& H_A: h_1(t) = c \, h_2(t), \;\; c \ne 1
\end{aligned}
\]
This implicitly uses the log-rank weights.
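The hypotheses above can be made concrete with a minimal pure-Python sketch of the unweighted two-sample logrank statistic. The function name `logrank_two_groups` is hypothetical and not part of lifelines; the sketch assumes right-censored data and ignores case weights, `t_0`, and `weightings`. At each distinct event time it adds the observed-minus-expected number of events in the first group and the hypergeometric variance, then refers the squared, standardized sum to a chi-squared distribution with 1 degree of freedom.

```python
import math


def logrank_two_groups(durations_a, durations_b, events_a=None, events_b=None):
    """Unweighted two-sample logrank test for right-censored data.

    Returns (test_statistic, p_value); the statistic is compared with a
    chi-squared distribution with 1 degree of freedom.
    """
    # Default: every event was observed (no censoring)
    events_a = [1] * len(durations_a) if events_a is None else list(events_a)
    events_b = [1] * len(durations_b) if events_b is None else list(events_b)

    # Distinct times at which at least one event was observed in either group
    event_times = sorted(
        {t for t, e in zip(durations_a, events_a) if e}
        | {t for t, e in zip(durations_b, events_b) if e}
    )

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Number at risk just prior to t in each group
        n_a = sum(1 for d in durations_a if d >= t)
        n_b = sum(1 for d in durations_b if d >= t)
        # Number of observed events at exactly t in each group
        d_a = sum(1 for d, e in zip(durations_a, events_a) if d == t and e)
        d_b = sum(1 for d, e in zip(durations_b, events_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected events in group A under H0
        observed_minus_expected += d_a - d * n_a / n
        # Hypergeometric variance of d_a (zero when only one subject is at risk)
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    statistic = observed_minus_expected ** 2 / variance
    # P(X > statistic) for X ~ chi-squared(1) equals erfc(sqrt(statistic / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

On the data from this function's Examples section, the sketch gives a statistic of about 0.087 and a p-value of about 0.77, in line with the values reported there.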
Note
The lifelines logrank implementation only handles right-censored data.
The logrank test has maximum power when the assumption of proportional hazards is true. As a consequence, if the survival curves cross, the logrank test will give an inaccurate assessment of differences.
This implementation is a special case of the function multivariate_logrank_test, which is used internally. See Survival and Event Analysis, page 108.
There are only disadvantages to using the log-rank test versus using the Cox regression. See more here for a discussion. To convert to using the Cox regression:
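
    import pandas as pd
    from lifelines import CoxPHFitter

    # Stack both populations; 'groupA' flags membership in the first population.
    # Assumes event_observed_A and event_observed_B were provided (not None).
    dfA = pd.DataFrame({'E': event_observed_A, 'T': durations_A, 'groupA': 1})
    dfB = pd.DataFrame({'E': event_observed_B, 'T': durations_B, 'groupA': 0})
    df = pd.concat([dfA, dfB])

    # The groupA coefficient carries the comparison between the two populations
    cph = CoxPHFitter().fit(df, 'T', 'E')
    cph.print_summary()
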
- Parameters:
durations_A (iterable) – a (n,) list-like of event durations (birth to death,…) for the first population.
durations_B (iterable) – a (n,) list-like of event durations (birth to death,…) for the second population.
event_observed_A (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
event_observed_B (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
weights_A (iterable, optional) – case weights
weights_B (iterable, optional) – case weights
t_0 (float, optional (default=-1)) – The final time period under observation, and subjects who experience the event after this time are set to be censored. Specify -1 to use all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
- Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
Wilcoxon: \(n_i\)
Tarone-Ware: \(\sqrt{n_i}\)
Peto: \(\bar{S}(t_i)\)
Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
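To make the weighting formulas above concrete, here is a hedged sketch that computes the per-time weights from the pooled at-risk counts \(n_i\) and event counts \(d_i\). The helper name `logrank_weights` is hypothetical. It takes \(\bar{S}(t_i) = \prod_{t_j \le t_i} (1 - d_j / (n_j + 1))\) as Peto-Peto's modified estimate, a standard textbook definition that we assume, but have not verified, matches lifelines. In a weighted logrank test, each time's observed-minus-expected term is multiplied by \(w_i\) and each variance term by \(w_i^2\).

```python
import math


def logrank_weights(at_risk, deaths, weightings, p=None, q=None):
    """Weights w_i at each ordered distinct event time t_i.

    at_risk[i] is the pooled number at risk just prior to t_i and deaths[i]
    the pooled number of events at t_i (both groups combined).
    """
    weights = []
    km_left = 1.0  # left-continuous Kaplan-Meier estimate, S_hat(t_i-)
    peto = 1.0     # Peto-Peto modified survival estimate, S_bar(t_i)
    for n_i, d_i in zip(at_risk, deaths):
        # S_bar(t_i) includes the events at t_i, with n_i + 1 in the denominator
        peto *= 1 - d_i / (n_i + 1)
        if weightings == "wilcoxon":
            w = n_i
        elif weightings == "tarone-ware":
            w = math.sqrt(n_i)
        elif weightings == "peto":
            w = peto
        elif weightings == "fleming-harrington":
            if p is None or q is None or p < 0 or q < 0:
                raise ValueError("fleming-harrington needs non-negative p and q")
            w = km_left ** p * (1 - km_left) ** q
        else:
            raise ValueError(f"unknown weightings: {weightings!r}")
        weights.append(w)
        # Update the KM estimate only after use, so it stays left-continuous
        km_left *= 1 - d_i / n_i
    return weights
```

With p = q = 0 the Fleming-Harrington weights are all 1, which recovers the unweighted logrank test.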
- Returns:
a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
- Return type:
StatisticalResult
Examples

    from lifelines.statistics import logrank_test

    T1 = [1, 4, 10, 12, 12, 3, 5.4]
    E1 = [1, 0, 1, 0, 1, 1, 1]
    T2 = [4, 5, 7, 11, 14, 20, 8, 8]
    E2 = [1, 1, 1, 1, 1, 1, 1, 1]

    results = logrank_test(T1, T2, event_observed_A=E1, event_observed_B=E2)
    results.print_summary()
    print(results.p_value)        # 0.7676
    print(results.test_statistic) # 0.0872

- lifelines.statistics.multivariate_logrank_test(event_durations, groups, event_observed=None, weights=None, t_0=-1, weightings=None, **kwargs) → StatisticalResult
This test is a generalization of logrank_test: it can handle n > 2 populations (and should give the same result as logrank_test when n = 2):
\[
\begin{aligned}
& H_0: h_1(t) = h_2(t) = h_3(t) = \dots = h_n(t) \\
& H_A: \text{there exists at least one group that differs from the others.}
\end{aligned}
\]
- Parameters:
event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
groups (iterable) – a (n,) list-like of group labels, one for each individual.
event_observed (iterable, optional) – a (n,) list-like of event indicators: 1 if death observed, 0 if censored. Defaults to all observed.
weights (iterable, optional) – case-weights
t_0 (float, optional (default=-1)) – The final time period under observation, and subjects who experience the event after this time are set to be censored. Specify -1 to use all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
- Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
Wilcoxon: \(n_i\)
Tarone-Ware: \(\sqrt{n_i}\)
Peto: \(\bar{S}(t_i)\)
Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
kwargs – add keywords and meta-data to the experiment summary.
- Returns:
a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
- Return type:
StatisticalResult
Examples

    import pandas as pd
    from lifelines.statistics import multivariate_logrank_test

    df = pd.DataFrame({
        'durations': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
        'events': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        'groups': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
    })
    result = multivariate_logrank_test(df['durations'], df['groups'], df['events'])
    result.test_statistic
    result.p_value
    result.print_summary()

    # the same test with plain lists
    G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
    T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
    E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    result = multivariate_logrank_test(T, G, E)
    result.test_statistic

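A minimal pure-Python sketch of the k-group test, assuming right-censored data and no case weights or weightings. It accumulates the vector of observed-minus-expected counts and its covariance matrix over the distinct event times. Because that k-by-k matrix is singular, the sketch drops the last group and solves the remaining (k - 1)-dimensional system, a common textbook choice that we have not checked against lifelines' internals. The helper names `chi2_sf`, `solve`, and `multivariate_logrank` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with a positive integer df (closed forms)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def multivariate_logrank(durations, groups, events=None):
    """k-group logrank test; returns (statistic, degrees_of_freedom, p_value)."""
    events = [1] * len(durations) if events is None else list(events)
    labels = sorted(set(groups))
    k = len(labels)
    o_minus_e = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    for t in sorted({s for s, e in zip(durations, events) if e}):
        # Per-group number at risk just prior to t, and events at t
        n_j = [sum(1 for d, g in zip(durations, groups) if g == lab and d >= t)
               for lab in labels]
        d_j = [sum(1 for d, g, e in zip(durations, groups, events)
                   if g == lab and d == t and e) for lab in labels]
        n, d = sum(n_j), sum(d_j)
        for a in range(k):
            o_minus_e[a] += d_j[a] - d * n_j[a] / n
            if n > 1:
                scale = d * (n - d) / (n - 1)
                for b in range(k):
                    same = 1.0 if a == b else 0.0
                    cov[a][b] += scale * (n_j[a] / n) * (same - n_j[b] / n)
    # The k-by-k covariance is singular; drop the last group and solve.
    m = k - 1
    x = solve([row[:m] for row in cov[:m]], o_minus_e[:m])
    statistic = sum(u * v for u, v in zip(o_minus_e[:m], x))
    return statistic, m, chi2_sf(statistic, m)
```

With two groups the covariance reduces to the single logrank variance, so the statistic matches the two-group logrank statistic, consistent with the remark above that the tests agree when n = 2.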
- lifelines.statistics.pairwise_logrank_test(event_durations, groups, event_observed=None, t_0=-1, weightings=None, **kwargs) → StatisticalResult
Perform the logrank test pairwise for all \(n \ge 2\) unique groups.
- Parameters:
event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
groups (iterable) – a (n,) list-like of group labels, one for each individual.
event_observed (iterable, optional) – a (n,) list-like of event indicators: 1 if death observed, 0 if censored. Defaults to all observed.
t_0 (float, optional (default=-1)) – The final time period under observation, and subjects who experience the event after this time are set to be censored. Specify -1 to use all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
- Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
Wilcoxon: \(n_i\)
Tarone-Ware: \(\sqrt{n_i}\)
Peto: \(\bar{S}(t_i)\)
Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
kwargs – add keywords and meta-data to the experiment summary.
- Returns:
a StatisticalResult object that contains all the pairwise comparisons (try StatisticalResult.summary or StatisticalResult.print_summary)
- Return type:
StatisticalResult
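The pairwise procedure amounts to subsetting the data to each pair of group labels and running a two-sample test on it. The sketch below shows only that looping structure; `pairwise_tests` and its `two_sample_test` argument are hypothetical names, and any two-group test with the signature `(durations_a, durations_b, events_a, events_b)` can be plugged in.

```python
from itertools import combinations


def pairwise_tests(durations, groups, events, two_sample_test):
    """Apply a two-sample test to every pair of distinct group labels.

    two_sample_test(durations_a, durations_b, events_a, events_b) is any
    user-supplied callable, e.g. a two-group logrank test.
    """
    results = {}
    for a, b in combinations(sorted(set(groups)), 2):
        # Keep only the subjects belonging to the two groups being compared
        dur_a = [t for t, g in zip(durations, groups) if g == a]
        dur_b = [t for t, g in zip(durations, groups) if g == b]
        ev_a = [e for e, g in zip(events, groups) if g == a]
        ev_b = [e for e, g in zip(events, groups) if g == b]
        results[(a, b)] = two_sample_test(dur_a, dur_b, ev_a, ev_b)
    return results
```

With k groups this runs k(k - 1)/2 tests; the sketch itself applies no multiple-comparison correction.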
- lifelines.statistics.power_under_cph(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05) → float
This computes the power of the hypothesis test that the two groups, experiment and control, have different hazards (that is, that the hazard ratio differs from 1).
- Parameters:
n_exp (integer) – size of the experiment group.
n_con (integer) – size of the control group.
p_exp (float) – probability of failure in the experimental group over the period of study.
p_con (float) – probability of failure in the control group over the period of study.
postulated_hazard_ratio (float) – the postulated hazard ratio.
alpha (float, optional (default=0.05)) – type I error rate
- Returns:
power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
- Return type:
float
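The notes and references for this function did not survive extraction. As an assumption, the sketch below implements Freedman's approximation for the power of a two-sided test that the hazard ratio equals 1; we believe lifelines uses a formula of this form but have not verified it against its source. The name `power_under_cph_sketch` is hypothetical.

```python
import math
from statistics import NormalDist


def power_under_cph_sketch(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided test that the hazard ratio equals 1.

    Assumes Freedman's approximation; not taken from the lifelines source.
    """
    std_normal = NormalDist()
    k = n_exp / n_con                   # allocation ratio, experiment : control
    m = n_exp * p_exp + n_con * p_con   # expected total number of events
    hr = postulated_hazard_ratio
    drift = math.sqrt(k * m) * abs(hr - 1) / (k * hr + 1)
    return std_normal.cdf(drift - std_normal.inv_cdf(1 - alpha / 2))
```

With 421 subjects per group and the inputs used in the sample_size_necessary_under_cph example (p_exp = 0.25, p_con = 0.35, hazard ratio 0.7), this returns a power of about 0.80.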
- lifelines.statistics.proportional_hazard_test(fitted_cox_model, training_df, time_transform='rank', precomputed_residuals=None, **kwargs) → StatisticalResult
Test whether any variable in a Cox model breaks the proportional hazard assumption. This method uses an approximation that R’s survival package used to use; R changed its approach in late 2019, so results here will differ between lifelines and R.
- Parameters:
fitted_cox_model (CoxPHFitter) – the fitted Cox model (fitted with training_df) that you wish to test. Currently only CoxPHFitter is supported; support for CoxTimeVaryingFitter is planned.
training_df (DataFrame) – the DataFrame used in the call to the Cox model’s fit. Optional if providing precomputed_residuals.
time_transform (vectorized function, list, or string, optional (default=’rank’)) – {‘all’, ‘km’, ‘rank’, ‘identity’, ‘log’}. One of the strings above, a list of strings, or a function to transform the time (the function must accept (time, durations, weights)). ‘all’ will present all the transforms.
precomputed_residuals (DataFrame, optional) – specify the scaled Schoenfeld residuals, if already computed.
kwargs – additional parameters to add to the StatisticalResult
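Since time_transform also accepts a function with the (time, durations, weights) signature, here is a hedged pure-Python sketch of two such transforms. The names `rank_transform` and `log_transform` are hypothetical; lifelines' built-in 'rank' option may break ties differently, and these return plain lists.

```python
import math


def rank_transform(times, durations, weights):
    """Replace each time by its 1-based rank, averaging the ranks of ties.

    durations and weights are accepted to match the documented signature
    but are unused by this particular transform.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    ranks = [0.0] * len(times)
    start = 0
    while start < len(order):
        end = start
        # Extend over a run of tied times
        while end + 1 < len(order) and times[order[end + 1]] == times[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    return ranks


def log_transform(times, durations, weights):
    """Natural log of each (positive) time; durations and weights are unused."""
    return [math.log(t) for t in times]
```

Such a function would be passed as, for example, proportional_hazard_test(cph, df, time_transform=rank_transform). We have not tested whether lifelines accepts a plain list as the return value; wrapping the result with numpy.asarray may be needed.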
Notes
R uses km by default; lifelines uses rank, as it performs well versus other transforms. See http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015-ReassessingSchoenfeldTests_Final.pdf
- lifelines.statistics.sample_size_necessary_under_cph(power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)
This computes the sample sizes needed to achieve a desired power when comparing two groups under a Cox proportional hazards model.
- Parameters:
power (float) – power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
ratio_of_participants (float) – ratio of participants in the experimental group over the control group.
p_exp (float) – probability of failure in the experimental group over the period of study.
p_con (float) – probability of failure in the control group over the period of study.
postulated_hazard_ratio (float) – the postulated hazard ratio
alpha (float, optional (default=0.05)) – type I error rate
- Returns:
n_exp (integer) – the sample size needed for the experimental group to achieve the desired power
n_con (integer) – the sample size needed for the control group to achieve the desired power
Examples

    from lifelines.statistics import sample_size_necessary_under_cph

    desired_power = 0.8
    ratio_of_participants = 1.
    p_exp = 0.25
    p_con = 0.35
    postulated_hazard_ratio = 0.7
    n_exp, n_con = sample_size_necessary_under_cph(desired_power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio)
    # (421, 421)

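A hedged pure-Python sketch of one way to compute these numbers, assuming Freedman's approximation (we believe lifelines uses a formula of this form but have not verified it). It first computes the total number of events needed for the requested power, then converts events to subjects using each group's failure probability. The name `sample_size_under_cph_sketch` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_under_cph_sketch(power, ratio_of_participants, p_exp, p_con,
                                 postulated_hazard_ratio, alpha=0.05):
    """Sample sizes (n_exp, n_con) under Freedman's approximation (assumed)."""
    z = NormalDist().inv_cdf
    k, hr = ratio_of_participants, postulated_hazard_ratio
    # Total number of events needed to reach the requested power
    m = (1 / k) * ((k * hr + 1) / (hr - 1)) ** 2 * (z(1 - alpha / 2) + z(power)) ** 2
    # Convert events to subjects using each group's failure probability
    n_con = m / (k * p_exp + p_con)
    n_exp = k * n_con
    return math.ceil(n_exp), math.ceil(n_con)
```

With the inputs from the example above it returns (421, 421): about 252 events are required, and with an average failure probability of 0.3 that is roughly 420.06 subjects per group, rounded up.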
References
https://cran.r-project.org/web/packages/powerSurvEpi/powerSurvEpi.pdf
- lifelines.statistics.survival_difference_at_fixed_point_in_time_test(point_in_time, fitterA, fitterB, **result_kwargs) → StatisticalResult
Often analysts want to compare the survival of groups at specific times, rather than comparing the entire survival curves against each other. For example, analysts may be interested in 5-year survival. Statistically comparing the naive Kaplan-Meier points at a specific time actually has reduced power (see [1]). By transforming the survival function, we can recover more power. This function uses the log(-log(·)) transformation.
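A minimal pure-Python sketch of the idea, assuming Kaplan-Meier curves with Greenwood's variance. Unlike the lifelines function, it takes raw durations and events rather than fitted models, its names are hypothetical, and we have not checked that it reproduces lifelines' numbers exactly. By the delta method, \(\operatorname{Var}[\log(-\log \hat S)] \approx \sigma^2 / (\log \hat S)^2\), where \(\sigma^2 = \sum d_j / (n_j (n_j - d_j))\) is Greenwood's sum; the squared difference of the transformed estimates divided by the summed variances is compared with a chi-squared distribution with 1 degree of freedom.

```python
import math


def km_with_greenwood(durations, events, t):
    """Kaplan-Meier S(t) and Greenwood's sum of d / (n (n - d)) up to time t."""
    surv, greenwood = 1.0, 0.0
    for s in sorted({d for d, e in zip(durations, events) if e and d <= t}):
        n = sum(1 for d in durations if d >= s)                       # at risk
        dead = sum(1 for d, e in zip(durations, events) if d == s and e)
        surv *= 1 - dead / n
        if n > dead:
            greenwood += dead / (n * (n - dead))
    return surv, greenwood


def fixed_time_loglog_test(t, durations_a, events_a, durations_b, events_b):
    """Compare S_A(t) with S_B(t) on the log(-log) scale; requires 0 < S(t) < 1."""
    def loglog(s):
        return math.log(-math.log(s))

    s_a, gw_a = km_with_greenwood(durations_a, events_a, t)
    s_b, gw_b = km_with_greenwood(durations_b, events_b, t)
    # Delta method: Var[log(-log S)] is approximately Greenwood's sum / (log S)^2
    variance = gw_a / math.log(s_a) ** 2 + gw_b / math.log(s_b) ** 2
    statistic = (loglog(s_a) - loglog(s_b)) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-squared, 1 degree of freedom
    return statistic, p_value
```

The transform is undefined when the estimated survival at t is exactly 0 or 1, so the sketch assumes both curves are strictly between those values at the chosen time.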
- Parameters:
point_in_time (float) – the point in time at which to compare the survival curves.
fitterA – a lifelines univariate model fitted to the data. This can be a KaplanMeierFitter, WeibullFitter, etc.
fitterB – the second lifelines model to compare against.
result_kwargs – add keywords and meta-data to the experiment summary
- Returns:
a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
- Return type:
StatisticalResult
Examples

    from lifelines import KaplanMeierFitter
    from lifelines.statistics import survival_difference_at_fixed_point_in_time_test

    T1 = [1, 4, 10, 12, 12, 3, 5.4]
    E1 = [1, 0, 1, 0, 1, 1, 1]
    kmf1 = KaplanMeierFitter().fit(T1, E1)

    T2 = [4, 5, 7, 11, 14, 20, 8, 8]
    E2 = [1, 1, 1, 1, 1, 1, 1, 1]
    kmf2 = KaplanMeierFitter().fit(T2, E2)

    results = survival_difference_at_fixed_point_in_time_test(12.0, kmf1, kmf2)
    results.print_summary()
    print(results.p_value)        # 0.77
    print(results.test_statistic) # 0.09

Notes
1. Other transformations are possible, but Klein et al. [1] showed that the log(-log(·)) transform has the most desirable statistical properties.
2. The API of this function changed in v0.25.3. The new API allows models fitted to right-, left-, or interval-censored data to be tested.
References
[1] Klein, J. P., Logan, B., Harhoff, M. and Andersen, P. K. (2007), Analyzing survival curves at a fixed point in time. Statist. Med., 26: 4505-4519. doi:10.1002/sim.2864