statistics

class lifelines.statistics.StatisticalResult(p_value, test_statistic, name=None, test_name=None, **kwargs)
Bases: object
This class holds the result of statistical tests with a nice printer wrapper to display the results.
Note
This class’ API changed in version 0.16.0.
Parameters:  p_value (iterable or float) – the p-values of a statistical test(s)
 test_statistic (iterable or float) – the test statistics of a statistical test(s). Must be the same size as p_value if iterable.
 test_name (string) – the test that was used. Lifelines should set this.
 name (iterable or string) – if this class holds multiple results (ex: from a pairwise comparison), this can hold the names. Must be the same size as p_value if iterable.
 kwargs – additional information to attach to the object and display in print_summary().

ascii_print(decimals=2, **kwargs)

html_print(decimals=2, **kwargs)

html_print_inside_jupyter(decimals=2, **kwargs)

latex_print(decimals=2, **kwargs)

print_specific_style(style, decimals=2, **kwargs)
Parameters: style (str) – one of {‘ascii’, ‘html’, ‘latex’}

print_summary(decimals=2, style=None, **kwargs)
Print summary statistics describing the fit, the coefficients, and the error bounds.
Parameters:  decimals (int, optional (default=2)) – specify the number of decimal places to show
 kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.

summary
Returns: a DataFrame containing the test statistics and the p-value
Return type: DataFrame

to_ascii(decimals=2, **kwargs)

to_html(decimals=2, **kwargs)

to_latex(decimals=2, **kwargs)

lifelines.statistics.logrank_test(durations_A, durations_B, event_observed_A=None, event_observed_B=None, t_0=-1, weightings=None, **kwargs) → lifelines.statistics.StatisticalResult
Measures and reports on whether two intensity processes are different. That is, given two event series, determines whether the data generating processes are statistically different. The test statistic is chi-squared under the null hypothesis. Let \(h_i(t)\) be the hazard rate of group \(i\) at time \(t\), then:
\[\begin{split}\begin{align} & H_0: h_1(t) = h_2(t) \\ & H_A: h_1(t) = c h_2(t), \;\; c \ne 1 \end{align}\end{split}\]
This implicitly uses the logrank weights.
Note
 The logrank test has maximum power when the assumption of proportional hazards is true. As a consequence, if the survival curves cross, the logrank test will give an inaccurate assessment of differences.
 This implementation is a special case of the function multivariate_logrank_test, which is used internally. See Survival and Event Analysis, page 108.
 There are only disadvantages to using the logrank test versus using the Cox regression. To convert to using the Cox regression:
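    import pandas as pd
    from lifelines import CoxPHFitter

    # Stack both groups into one frame, with an indicator column for group A.
    dfA = pd.DataFrame({'E': event_observed_A, 'T': durations_A, 'groupA': 1})
    dfB = pd.DataFrame({'E': event_observed_B, 'T': durations_B, 'groupA': 0})
    df = pd.concat([dfA, dfB])

    # The coefficient on groupA plays the role of the two-group comparison.
    cph = CoxPHFitter().fit(df, 'T', 'E')
    cph.print_summary()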
Parameters: durations_A (iterable) – an (n,) list-like of event durations (birth to death,…) for the first population.
durations_B (iterable) – an (n,) list-like of event durations (birth to death,…) for the second population.
event_observed_A (iterable, optional) – an (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
event_observed_B (iterable, optional) – an (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
t_0 (float, optional (default=-1)) – the final time period under observation, -1 for all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
 Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
 Wilcoxon: \(n_i\)
 Tarone-Ware: \(\sqrt{n_i}\)
 Peto: \(\bar{S}(t_i)\)
 Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
 where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
Returns: a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
Return type: StatisticalResult
Examples
    from lifelines.statistics import logrank_test

    T1 = [1, 4, 10, 12, 12, 3, 5.4]
    E1 = [1, 0, 1, 0, 1, 1, 1]

    T2 = [4, 5, 7, 11, 14, 20, 8, 8]
    E2 = [1, 1, 1, 1, 1, 1, 1, 1]

    results = logrank_test(T1, T2, event_observed_A=E1, event_observed_B=E2)
    results.print_summary()
    print(results.p_value)         # 0.7676
    print(results.test_statistic)  # 0.0872
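To make the chi-squared statistic concrete, here is a minimal standard-library sketch of the unweighted two-sample logrank computation. It is not lifelines' implementation (which delegates to multivariate_logrank_test); the helper name logrank_sketch is hypothetical, and the sketch assumes no truncation (t_0=-1) and no weightings. On the example data above it should agree with the reported values to about three decimal places.

```python
import math


def logrank_sketch(durations_a, durations_b, events_a, events_b):
    """Unweighted two-sample logrank statistic and its chi-squared (1 d.o.f.) p-value."""
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted(
        {t for t, e in zip(durations_a, events_a) if e}
        | {t for t, e in zip(durations_b, events_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk just prior to t, and events exactly at t, per group.
        n_a = sum(1 for x in durations_a if x >= t)
        n_b = sum(1 for x in durations_b if x >= t)
        d_a = sum(1 for x, e in zip(durations_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(durations_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        # Under H0 the events at t are split in proportion to the numbers at risk.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance; contributes nothing when n == 1
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    statistic = observed_minus_expected ** 2 / variance
    # Survival function of a chi-squared variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


T1 = [1, 4, 10, 12, 12, 3, 5.4]
E1 = [1, 0, 1, 0, 1, 1, 1]
T2 = [4, 5, 7, 11, 14, 20, 8, 8]
E2 = [1, 1, 1, 1, 1, 1, 1, 1]

statistic, p_value = logrank_sketch(T1, T2, E1, E2)
print(statistic, p_value)
```

The sketch recomputes the risk sets at every event time, which is quadratic in the sample size; that keeps the correspondence with the formula obvious but is not how one would write it for large data.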

lifelines.statistics.multivariate_logrank_test(event_durations, groups, event_observed=None, t_0=-1, weightings=None, **kwargs) → lifelines.statistics.StatisticalResult
This test is a generalization of the logrank_test: it can deal with n>2 populations (and should be equal when n=2):
\[\begin{split}\begin{align} & H_0: h_1(t) = h_2(t) = h_3(t) = ... = h_n(t) \\ & H_A: \text{there exists at least one group that differs from the others.} \end{align}\end{split}\]
Parameters: event_durations (iterable) – an (n,) list-like representing the (possibly partial) durations of all individuals
groups (iterable) – an (n,) list-like of group labels, one for each individual.
event_observed (iterable, optional) – an (n,) list-like of event_observed events: 1 if observed death, 0 if censored. Defaults to all observed.
t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
 Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
 Wilcoxon: \(n_i\)
 Tarone-Ware: \(\sqrt{n_i}\)
 Peto: \(\bar{S}(t_i)\)
 Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
 where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
kwargs – add keywords and metadata to the experiment summary.
Returns: a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
Return type: StatisticalResult
Examples
    import pandas as pd
    from lifelines.statistics import multivariate_logrank_test

    df = pd.DataFrame({
        'durations': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
        'events': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
        'groups': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
    })
    result = multivariate_logrank_test(df['durations'], df['groups'], df['events'])
    result.test_statistic
    result.p_value
    result.print_summary()

    # the same data passed as plain lists
    G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
    T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
    E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    result = multivariate_logrank_test(T, G, E)
    result.test_statistic
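The weightings listed in the parameters above can be illustrated with a small standard-library sketch. It takes the pooled number at risk and the pooled number of events at each ordered failure time and returns one weight per failure time. The function name logrank_weights and its inputs are hypothetical, and the Peto branch assumes the usual Peto-Peto product with n_i + 1 in the denominator; lifelines' internals may differ in detail.

```python
import math


def logrank_weights(at_risk, events, weightings, p=0.0, q=0.0):
    """Return one weight per ordered failure time.

    at_risk[i] -- pooled number at risk just prior to the i-th failure time
    events[i]  -- pooled number of events at the i-th failure time
    """
    weights = []
    km_left = 1.0  # left-continuous Kaplan-Meier estimate, i.e. S just before t_i
    peto = 1.0     # Peto-Peto modified survival estimate at t_i (assumed definition)
    for n_i, d_i in zip(at_risk, events):
        peto *= 1.0 - d_i / (n_i + 1.0)
        if weightings == "wilcoxon":
            w = float(n_i)
        elif weightings == "tarone-ware":
            w = math.sqrt(n_i)
        elif weightings == "peto":
            w = peto
        elif weightings == "fleming-harrington":
            w = km_left ** p * (1.0 - km_left) ** q
        else:
            raise ValueError("unknown weighting: %r" % (weightings,))
        weights.append(w)
        # Update the Kaplan-Meier estimate only after it has been used,
        # so the value applied at t_i excludes the events at t_i itself.
        km_left *= 1.0 - d_i / n_i
    return weights


# Illustrative counts only: 10, 8 and 5 at risk with 1, 2 and 1 events.
at_risk = [10, 8, 5]
events = [1, 2, 1]
wilcoxon = logrank_weights(at_risk, events, "wilcoxon")
peto = logrank_weights(at_risk, events, "peto")
early = logrank_weights(at_risk, events, "fleming-harrington", p=1, q=0)
late = logrank_weights(at_risk, events, "fleming-harrington", p=0, q=1)
print(wilcoxon, peto, early, late)
```

With p=1, q=0 the Fleming-Harrington weights shrink over time and so emphasize early differences; with p=0, q=1 they grow and emphasize late differences.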

lifelines.statistics.pairwise_logrank_test(event_durations, groups, event_observed=None, t_0=-1, weightings=None, **kwargs) → lifelines.statistics.StatisticalResult
Perform the logrank test pairwise for all \(n \ge 2\) unique groups.
Parameters: event_durations (iterable) – an (n,) list-like representing the (possibly partial) durations of all individuals
groups (iterable) – an (n,) list-like of group labels, one for each individual.
event_observed (iterable, optional) – an (n,) list-like of event_observed events: 1 if observed death, 0 if censored. Defaults to all observed.
t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
weightings (str, optional) – apply a weighted logrank test: options are “wilcoxon” for Wilcoxon (also known as Breslow), “tarone-ware” for Tarone-Ware, “peto” for Peto test and “fleming-harrington” for Fleming-Harrington test. These are useful for testing for early or late differences in the survival curve. For the Fleming-Harrington test, keyword arguments p and q must also be provided with non-negative values.
 Weightings are applied at the ith ordered failure time, \(t_{i}\), according to:
 Wilcoxon: \(n_i\)
 Tarone-Ware: \(\sqrt{n_i}\)
 Peto: \(\bar{S}(t_i)\)
 Fleming-Harrington: \(\hat{S}(t_i)^p \times (1 - \hat{S}(t_i))^q\)
 where \(n_i\) is the number at risk just prior to time \(t_{i}\), \(\bar{S}(t_i)\) is Peto-Peto’s modified survival estimate and \(\hat{S}(t_i)\) is the left-continuous Kaplan-Meier survival estimate at time \(t_{i}\).
kwargs – add keywords and metadata to the experiment summary.
Returns: a StatisticalResult object that contains all the pairwise comparisons (try StatisticalResult.summary or StatisticalResult.print_summary)
Return type: StatisticalResult
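As a rough illustration of what "pairwise" means here, the sketch below splits the data by group label and applies a two-sample test to every unordered pair of unique labels, giving n(n-1)/2 comparisons. The helper pairwise_apply is hypothetical and is not how lifelines is implemented; any callable taking (durations_A, durations_B, event_observed_A, event_observed_B) positionally, such as logrank_test, could be passed as two_sample_test. A trivial stand-in is used below so the example runs without lifelines installed.

```python
from itertools import combinations


def pairwise_apply(event_durations, groups, event_observed, two_sample_test):
    """Run `two_sample_test` on every unordered pair of unique group labels."""
    results = {}
    for g1, g2 in combinations(sorted(set(groups)), 2):
        # Split durations and event flags by group label.
        t1 = [t for t, g in zip(event_durations, groups) if g == g1]
        t2 = [t for t, g in zip(event_durations, groups) if g == g2]
        e1 = [e for e, g in zip(event_observed, groups) if g == g1]
        e2 = [e for e, g in zip(event_observed, groups) if g == g2]
        results[(g1, g2)] = two_sample_test(t1, t2, e1, e2)
    return results


T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]

# Stand-in "test" that only reports the two group sizes.
group_sizes = pairwise_apply(T, G, E, lambda t1, t2, e1, e2: (len(t1), len(t2)))
print(group_sizes)
```

Because each pair is tested separately, the resulting p-values are subject to the usual multiple-comparison caveats.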

lifelines.statistics.survival_difference_at_fixed_point_in_time_test(point_in_time, durations_A, durations_B, event_observed_A=None, event_observed_B=None, **kwargs) → lifelines.statistics.StatisticalResult
Often analysts want to compare the survival of groups at specific times, rather than comparing the entire survival curves against each other. For example, analysts may be interested in 5-year survival. Statistically comparing the naive Kaplan-Meier points at a specific time actually has reduced power (see [1]). By transforming the Kaplan-Meier curve, we can recover more power. This function uses the log(-log) transformation.
Parameters:  point_in_time (float) – the point in time to analyze the survival curves at.
 durations_A (iterable) – an (n,) list-like of event durations (birth to death,…) for the first population.
 durations_B (iterable) – an (n,) list-like of event durations (birth to death,…) for the second population.
 event_observed_A (iterable, optional) – an (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
 event_observed_B (iterable, optional) – an (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
 kwargs – add keywords and metadata to the experiment summary
Returns: a StatisticalResult object with properties p_value, summary, test_statistic, print_summary
Return type: StatisticalResult
Examples
    from lifelines.statistics import survival_difference_at_fixed_point_in_time_test

    T1 = [1, 4, 10, 12, 12, 3, 5.4]
    E1 = [1, 0, 1, 0, 1, 1, 1]

    T2 = [4, 5, 7, 11, 14, 20, 8, 8]
    E2 = [1, 1, 1, 1, 1, 1, 1, 1]

    results = survival_difference_at_fixed_point_in_time_test(12, T1, T2, event_observed_A=E1, event_observed_B=E2)
    results.print_summary()
    print(results.p_value)         # 0.893
    print(results.test_statistic)  # 0.017
Notes
Other transformations are possible, but Klein et al. [1] showed that the log(-log(c)) transform has the most desirable statistical properties.
References
[1] Klein, J. P., Logan, B., Harhoff, M. and Andersen, P. K. (2007), Analyzing survival curves at a fixed point in time. Statist. Med., 26: 4505-4519. doi:10.1002/sim.2864
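The log(-log) comparison described above can be sketched with the standard library alone. The sketch computes each group's Kaplan-Meier estimate and Greenwood cumulative sum at the chosen time, then forms a chi-squared statistic with one degree of freedom using the delta-method variance of log(-log S). The function names and the toy data are hypothetical, and this is an illustration of the transformation under those assumptions rather than lifelines' own code, so its output need not match the library's.

```python
import math


def km_and_greenwood(durations, events, t):
    """Kaplan-Meier estimate S(t) and Greenwood's cumulative sum at time t."""
    survival, greenwood = 1.0, 0.0
    event_times = sorted({x for x, e in zip(durations, events) if e and x <= t})
    for u in event_times:
        n = sum(1 for x in durations if x >= u)  # at risk just prior to u
        d = sum(1 for x, e in zip(durations, events) if e and x == u)
        survival *= 1.0 - d / n
        # Not defined when n == d, i.e. when the estimate drops to zero.
        greenwood += d / (n * (n - d))
    return survival, greenwood


def fixed_point_sketch(t, durations_a, durations_b, events_a, events_b):
    """Chi-squared statistic (1 d.o.f.) on the log(-log) scale, with p-value."""
    s_a, g_a = km_and_greenwood(durations_a, events_a, t)
    s_b, g_b = km_and_greenwood(durations_b, events_b, t)

    def cloglog(s):
        return math.log(-math.log(s))

    # Delta method: Var[log(-log S)] is approximately greenwood / (log S)^2.
    statistic = (cloglog(s_a) - cloglog(s_b)) ** 2 / (
        g_a / math.log(s_a) ** 2 + g_b / math.log(s_b) ** 2
    )
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy data, invented for illustration only.
TA = [2, 4, 6, 8, 10, 12, 14, 16]
EA = [1, 1, 1, 1, 1, 0, 1, 1]
TB = [1, 2, 3, 4, 5, 6, 7, 20]
EB = [1, 1, 1, 1, 1, 1, 1, 1]

statistic, p_value = fixed_point_sketch(6, TA, TB, EA, EB)
print(statistic, p_value)
```

The transformation is only defined when both survival estimates at the chosen time lie strictly between 0 and 1, so the sketch would fail if either curve has not yet dropped or has already reached zero.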

lifelines.statistics.proportional_hazard_test(fitted_cox_model, training_df, time_transform='rank', precomputed_residuals=None, **kwargs) → lifelines.statistics.StatisticalResult
Test whether any variable in a Cox model breaks the proportional hazard assumption. This method uses an approximation that R’s survival package used to use; R changed its method in late 2019, so there will be differences between lifelines and R.
Parameters:  fitted_cox_model (CoxPHFitter) – the fitted Cox model, fitted with training_df, you wish to test. Currently only the CoxPHFitter is supported, but later CoxTimeVaryingFitter, too.
 training_df (DataFrame) – the DataFrame used in the call to the Cox model’s fit.
 time_transform (vectorized function, list, or string, optional (default=’rank’)) – {‘all’, ‘km’, ‘rank’, ‘identity’, ‘log’} One of the strings above, a list of strings, or a function to transform the time (the function must accept (time, durations, weights)). ‘all’ will present all the transforms.
 precomputed_residuals (DataFrame, optional) – specify the scaled Schoenfeld residuals, if already computed.
 kwargs – additional parameters to add to the StatisticalResult
Notes
R uses km as its default transform; lifelines uses rank, as this performs well versus other transforms. See http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015ReassessingSchoenfeldTests_Final.pdf

lifelines.statistics.power_under_cph(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05) → float
This computes the power of the hypothesis test that the two groups, experiment and control, have different hazards (that is, the hazard ratio is different from 1).
Parameters:  n_exp (integer) – size of the experiment group.
 n_con (integer) – size of the control group.
 p_exp (float) – probability of failure in experimental group over period of study.
 p_con (float) – probability of failure in control group over period of study.
 postulated_hazard_ratio (float) – the postulated hazard ratio
 alpha (float, optional (default=0.05)) – type I error rate
Returns: power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
Return type: float
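A power calculation of this kind can be sketched with the standard library. The formula below is the standard two-group result for the Cox model, based on the expected number of events and the normal approximation, as found in the powerSurvEpi package. It is an assumption that lifelines evaluates the same expression, and the function name power_under_cph_sketch is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_under_cph_sketch(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05):
    """Approximate power of the two-sided test that the hazard ratio differs from 1."""
    norm = NormalDist()
    # Expected total number of events over the study period.
    expected_events = n_exp * p_exp + n_con * p_con
    k = n_exp / n_con  # allocation ratio, experimental over control
    effect = abs(postulated_hazard_ratio - 1.0) / (k * postulated_hazard_ratio + 1.0)
    return norm.cdf(sqrt(k * expected_events) * effect - norm.inv_cdf(1.0 - alpha / 2.0))


# Group sizes of 421 each, with the failure probabilities 0.25 and 0.35
# and a postulated hazard ratio of 0.7.
power = power_under_cph_sketch(421, 421, 0.25, 0.35, 0.7)
print(round(power, 3))
```

Power rises with the expected number of events and with the distance of the postulated hazard ratio from 1, which is why larger or longer studies can detect smaller effects.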

lifelines.statistics.sample_size_necessary_under_cph(power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)
This computes the sample size needed to reach a given power when comparing two groups under a Cox Proportional Hazard model.
Parameters:  power (float) – power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
 ratio_of_participants (float) – ratio of participants in experimental group over control group.
 p_exp (float) – probability of failure in experimental group over period of study.
 p_con (float) – probability of failure in control group over period of study.
 postulated_hazard_ratio (float) – the postulated hazard ratio
 alpha (float, optional (default=0.05)) – type I error rate
Returns:  n_exp (integer) – the sample size needed for the experiment group to achieve desired power
 n_con (integer) – the sample size needed for the control group to achieve desired power
Examples
    from lifelines.statistics import sample_size_necessary_under_cph

    desired_power = 0.8
    ratio_of_participants = 1.
    p_exp = 0.25
    p_con = 0.35
    postulated_hazard_ratio = 0.7
    n_exp, n_con = sample_size_necessary_under_cph(desired_power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio)
    # (421, 421)
References
https://cran.r-project.org/web/packages/powerSurvEpi/powerSurvEpi.pdf
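The sample-size calculation can be sketched with the standard library as follows. The sketch first computes the required number of events from the postulated hazard ratio and the two normal quantiles, then divides by the overall failure probability to obtain group sizes, following the formula in the powerSurvEpi reference above. Treating this as lifelines' exact computation is an assumption, and the function name sample_size_sketch is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_sketch(power, ratio_of_participants, p_exp, p_con,
                       postulated_hazard_ratio, alpha=0.05):
    """Approximate group sizes (n_exp, n_con) needed to reach the given power."""
    z = NormalDist().inv_cdf
    k = ratio_of_participants  # experimental over control
    # Required number of events for a two-sided test at level alpha.
    m = (
        (1.0 / k)
        * ((k * postulated_hazard_ratio + 1.0) / (postulated_hazard_ratio - 1.0)) ** 2
        * (z(1.0 - alpha / 2.0) + z(power)) ** 2
    )
    # Convert events into participants using the failure probabilities.
    n_exp = m * k / (k * p_exp + p_con)
    n_con = m / (k * p_exp + p_con)
    return ceil(n_exp), ceil(n_con)


sizes = sample_size_sketch(0.8, 1.0, 0.25, 0.35, 0.7)
print(sizes)  # (421, 421), matching the example above
```

Rounding up with ceil keeps the achieved power at or above the target, at the cost of at most one extra participant per group.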