statistics

class lifelines.statistics.StatisticalResult(p_value, test_statistic, name=None, **kwargs)

Bases: object

This class holds the result of statistical tests with a nice printer wrapper to display the results.

Note

This class’ API changed in version 0.16.0.

Parameters:
  • p_value (iterable or float) – the p-values of a statistical test(s)
  • test_statistic (iterable or float) – the test statistics of a statistical test(s). Must be the same size as p-values if iterable.
  • name (iterable or string) – if this class holds multiple results (ex: from a pairwise comparison), this can hold the names. Must be the same size as p-values if iterable.
  • kwargs – additional information to attach to the object and display in print_summary().
print_summary(decimals=2, **kwargs)

Print a summary of the test results: the test statistic(s), the p-value(s), and any additional information attached to the object.

Parameters:
  • decimals (int, optional (default=2)) – specify the number of decimal places to show
  • kwargs – print additional meta data in the output (useful to provide model names, dataset names, etc.) when comparing multiple outputs.
summary

Returns:

a DataFrame containing the test statistics and the p-values

Return type:

DataFrame

lifelines.statistics.logrank_test(durations_A, durations_B, event_observed_A=None, event_observed_B=None, t_0=-1, **kwargs)

Measures and reports on whether two intensity processes are different. That is, given two event series, determines whether the data generating processes are statistically different. The test statistic is chi-squared distributed under the null hypothesis. Let \(h_i(t)\) be the hazard rate of group \(i\) at time \(t\), then:

\[\begin{split}\begin{align} & H_0: h_1(t) = h_2(t) \\ & H_A: h_1(t) = c h_2(t), \;\; c \ne 1 \end{align}\end{split}\]

This implicitly uses the log-rank weights.

Note

The logrank test has maximum power when the assumption of proportional hazards is true. As a consequence, if the survival curves cross, the logrank test will give an inaccurate assessment of differences.

Parameters:
  • durations_A (iterable) – a (n,) list-like of event durations (birth to death,…) for the first population.
  • durations_B (iterable) – a (n,) list-like of event durations (birth to death,…) for the second population.
  • event_observed_A (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
  • event_observed_B (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
  • t_0 (float, optional (default=-1)) – the final time period under observation, -1 for all time.
  • kwargs – add keywords and meta-data to the experiment summary
Returns:

a StatisticalResult object with properties p_value, summary, test_statistic, print_summary

Return type:

StatisticalResult

Examples

>>> T1 = [1, 4, 10, 12, 12, 3, 5.4]
>>> E1 = [1, 0, 1,  0,  1,  1, 1]
>>>
>>> T2 = [4, 5, 7, 11, 14, 20, 8, 8]
>>> E2 = [1, 1, 1, 1,  1,  1,  1, 1]
>>>
>>> from lifelines.statistics import logrank_test
>>> results = logrank_test(T1, T2, event_observed_A=E1, event_observed_B=E2)
>>>
>>> results.print_summary()
>>> print(results.p_value)        # 0.7676
>>> print(results.test_statistic) # 0.0872

Notes

This is a special case of the function multivariate_logrank_test, which is used internally. See Survival and Event Analysis, page 108.
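
The two-group statistic described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: logrank_sketch is a hypothetical name, the variance is the usual hypergeometric variance at each event time, and the p-value comes from a chi-squared distribution with one degree of freedom.

```python
import math


def logrank_sketch(durations_a, durations_b, events_a=None, events_b=None):
    """Two-sample log-rank statistic and p-value (illustrative only)."""
    # Default: every duration ends in an observed event.
    events_a = [1] * len(durations_a) if events_a is None else events_a
    events_b = [1] * len(durations_b) if events_b is None else events_b

    # Pool both samples as (duration, event flag, group label) triples.
    pooled = [(t, e, 0) for t, e in zip(durations_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(durations_b, events_b)]

    observed_minus_expected = 0.0
    variance = 0.0
    # Visit each distinct time at which at least one event was observed.
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)               # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)  # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)         # events, both groups
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    statistic = observed_minus_expected ** 2 / variance
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

Called on the T1, E1, T2, E2 data from the example above, this sketch gives a statistic of about 0.087 and a p-value of about 0.77, in line with the values shown there. It raises ZeroDivisionError when the variance is zero, for instance when one group is empty.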

lifelines.statistics.multivariate_logrank_test(event_durations, groups, event_observed=None, t_0=-1, **kwargs)

This test is a generalization of the logrank_test: it can deal with n>2 populations (and should be equal when n=2):

\[\begin{split}\begin{align} & H_0: h_1(t) = h_2(t) = h_3(t) = ... = h_n(t) \\ & H_A: \text{there exists at least one group that differs from the others.} \end{align}\end{split}\]
Parameters:
  • event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
  • groups (iterable) – a (n,) list-like of group labels, one per individual.
  • event_observed (iterable, optional) – a (n,) list-like of event indicators: 1 if the death was observed, 0 if censored. Defaults to all observed.
  • t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
  • kwargs – add keywords and meta-data to the experiment summary.
Returns:

a StatisticalResult object with properties p_value, summary, test_statistic, print_summary

Return type:

StatisticalResult

Examples

>>> import pandas as pd
>>> from lifelines.statistics import multivariate_logrank_test
>>>
>>> df = pd.DataFrame({
...    'durations': [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7],
...    'events': [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
...    'groups': [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
... })
>>> result = multivariate_logrank_test(df['durations'], df['groups'], df['events'])
>>> result.test_statistic
>>> result.p_value
>>> result.print_summary()
>>>
>>> # the same test with plain lists instead of DataFrame columns
>>> G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
>>> E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
>>> result = multivariate_logrank_test(T, G, E)
>>> result.test_statistic

lifelines.statistics.pairwise_logrank_test(event_durations, groups, event_observed=None, t_0=-1, **kwargs)

Perform the logrank test pairwise for all \(n \ge 2\) unique groups.

Parameters:
  • event_durations (iterable) – a (n,) list-like representing the (possibly partial) durations of all individuals
  • groups (iterable) – a (n,) list-like of group labels, one per individual.
  • event_observed (iterable, optional) – a (n,) list-like of event indicators: 1 if the death was observed, 0 if censored. Defaults to all observed.
  • t_0 (float, optional (default=-1)) – the period under observation, -1 for all time.
  • kwargs – add keywords and meta-data to the experiment summary.
Returns:

a StatisticalResult object that contains all the pairwise comparisons (try StatisticalResult.summary or StatisticalResult.print_summary)

Return type:

StatisticalResult
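
To make "pairwise for all unique groups" concrete, the sketch below shows one way to enumerate the pairs and split the data for each comparison. It is an illustration only: pairwise_splits is a hypothetical helper, not part of lifelines, and the library may order or label the pairs differently.

```python
from itertools import combinations


def pairwise_splits(event_durations, groups, event_observed=None):
    """Yield ((label_a, label_b), (T_a, E_a), (T_b, E_b)) for every pair of groups."""
    if event_observed is None:
        # Default: every duration ends in an observed event.
        event_observed = [1] * len(event_durations)

    def subset(label):
        # Durations and event flags of the individuals carrying this label.
        rows = [(t, e) for t, e, g in zip(event_durations, event_observed, groups) if g == label]
        return [t for t, _ in rows], [e for _, e in rows]

    # n unique labels give n * (n - 1) / 2 unordered pairs.
    for label_a, label_b in combinations(sorted(set(groups)), 2):
        yield (label_a, label_b), subset(label_a), subset(label_b)


G = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
T = [5, 3, 9, 8, 7, 4, 4, 3, 2, 5, 6, 7]
E = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
for pair, (t_a, e_a), (t_b, e_b) in pairwise_splits(T, G, E):
    print(pair, len(t_a), len(t_b))
```

Each yielded pair of subsets could then be passed to logrank_test as durations_A, durations_B, event_observed_A, and event_observed_B.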

lifelines.statistics.survival_difference_at_fixed_point_in_time_test(point_in_time, durations_A, durations_B, event_observed_A=None, event_observed_B=None, **kwargs)

Often analysts want to compare the survival of groups at specific times, rather than comparing the entire survival curves against each other. For example, analysts may be interested in 5-year survival. Statistically comparing the naive Kaplan-Meier estimates at a specific time has reduced power (see [1]). Transforming the Kaplan-Meier curve first recovers some of that power. This function uses the log(-log) transformation.

Parameters:
  • point_in_time (float) – the point in time to analyze the survival curves at.
  • durations_A (iterable) – a (n,) list-like of event durations (birth to death,…) for the first population.
  • durations_B (iterable) – a (n,) list-like of event durations (birth to death,…) for the second population.
  • event_observed_A (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the first population. Default assumes all observed.
  • event_observed_B (iterable, optional) – a (n,) list-like of censorship flags, (1 if observed, 0 if not), for the second population. Default assumes all observed.
  • kwargs – add keywords and meta-data to the experiment summary
Returns:

a StatisticalResult object with properties p_value, summary, test_statistic, print_summary

Return type:

StatisticalResult

Examples

>>> T1 = [1, 4, 10, 12, 12, 3, 5.4]
>>> E1 = [1, 0, 1,  0,  1,  1, 1]
>>>
>>> T2 = [4, 5, 7, 11, 14, 20, 8, 8]
>>> E2 = [1, 1, 1, 1,  1,  1,  1, 1]
>>>
>>> from lifelines.statistics import survival_difference_at_fixed_point_in_time_test
>>> results = survival_difference_at_fixed_point_in_time_test(12, T1, T2, event_observed_A=E1, event_observed_B=E2)
>>>
>>> results.print_summary()
>>> print(results.p_value)        # 0.893
>>> print(results.test_statistic) # 0.017

Notes

Other transformations are possible, but Klein et al. [1] showed that the log(-log(c)) transform has the most desirable statistical properties.

References

[1] Klein, J. P., Logan, B., Harhoff, M. and Andersen, P. K. (2007), Analyzing survival curves at a fixed point in time. Statist. Med., 26: 4505-4519. doi:10.1002/sim.2864
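
The log(-log) comparison can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's code: km_at and fixed_point_sketch are hypothetical names, the variance term is Greenwood's sum, the delta method gives Var[log(-log S)] ≈ sigma² / (log S)², and the statistic is referred to a chi-squared distribution with one degree of freedom. Numerical output may differ from what lifelines reports.

```python
import math


def km_at(durations, events, t):
    """Kaplan-Meier estimate S(t) and Greenwood sum at time t (illustrative)."""
    survival, greenwood = 1.0, 0.0
    # Walk over the distinct observed event times up to and including t.
    for u in sorted({d for d, e in zip(durations, events) if e and d <= t}):
        at_risk = sum(1 for d in durations if d >= u)
        deaths = sum(1 for d, e in zip(durations, events) if d == u and e)
        survival *= 1 - deaths / at_risk
        if at_risk > deaths:
            greenwood += deaths / (at_risk * (at_risk - deaths))
    return survival, greenwood


def fixed_point_sketch(t, durations_a, durations_b, events_a, events_b):
    """Chi-squared statistic and p-value comparing S_A(t) and S_B(t) on the log(-log) scale."""
    s_a, var_a = km_at(durations_a, events_a, t)
    s_b, var_b = km_at(durations_b, events_b, t)
    if not (0 < s_a < 1 and 0 < s_b < 1):
        raise ValueError("log(-log(S)) needs both survival estimates strictly between 0 and 1")

    def cloglog(s):
        return math.log(-math.log(s))

    statistic = (cloglog(s_a) - cloglog(s_b)) ** 2 / (
        var_a / math.log(s_a) ** 2 + var_b / math.log(s_b) ** 2
    )
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

The guard on the survival estimates reflects a limit of the transformation itself: log(-log(S)) is undefined when S is 0 or 1, so the chosen time must fall where both curves have dropped below 1 but not reached 0.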

lifelines.statistics.proportional_hazard_test(fitted_cox_model, training_df, time_transform='rank', precomputed_residuals=None, **kwargs)

Test whether any variable in a Cox model breaks the proportional hazard assumption.

Parameters:
  • fitted_cox_model (CoxPHFitter) – the fitted Cox model you wish to test, fitted with training_df. Currently only CoxPHFitter is supported; CoxTimeVaryingFitter is to follow later.
  • training_df (DataFrame) – the DataFrame used in the call to the Cox model’s fit.
  • time_transform (vectorized function, list, or string, optional (default=’rank’)) – {‘all’, ‘km’, ‘rank’, ‘identity’, ‘log’} One of the strings above, a list of strings, or a function to transform the time (the function must accept (time, durations, weights)). ‘all’ will present all the transforms.
  • precomputed_residuals (DataFrame, optional) – specify the residuals, if already computed.
  • kwargs – additional parameters to add to the StatisticalResult
Returns:

Return type:

StatisticalResult

Notes

R uses km as its default; we use rank, as this performs well versus other transforms. See http://eprints.lse.ac.uk/84988/1/06_ParkHendry2015-ReassessingSchoenfeldTests_Final.pdf

lifelines.statistics.power_under_cph(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)

This computes the power of the hypothesis test that the two groups, experiment and control, have different hazards (that is, the relative hazard ratio is different from 1.)

Parameters:
  • n_exp (integer) – size of the experiment group.
  • n_con (integer) – size of the control group.
  • p_exp (float) – probability of failure in experimental group over period of study.
  • p_con (float) – probability of failure in control group over period of study.
  • postulated_hazard_ratio (float) – the postulated hazard ratio
  • alpha (float, optional (default=0.05)) – type I error rate
Returns:

power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.

Return type:

float

Notes

Reference.
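
As a rough sketch of how such a power figure can be computed, the function below uses a normal approximation based on the expected number of events. Whether lifelines uses exactly this expression is an assumption; power_under_cph_sketch is a hypothetical name, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def power_under_cph_sketch(n_exp, n_con, p_exp, p_con, postulated_hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided test that the hazard ratio differs from 1."""
    norm = NormalDist()
    # Expected number of failures across both groups over the study period.
    expected_events = n_exp * p_exp + n_con * p_con
    # Allocation ratio: experimental group size over control group size.
    k = n_exp / n_con
    hr = postulated_hazard_ratio
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(math.sqrt(k * expected_events) * abs(hr - 1) / (k * hr + 1) - z_alpha)


# Two groups of 421, failure probabilities 0.25 and 0.35, hazard ratio 0.7.
print(power_under_cph_sketch(421, 421, 0.25, 0.35, 0.7))
```

Under this approximation, power grows with the expected number of events rather than with the raw group sizes, which is why p_exp and p_con enter alongside n_exp and n_con.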

lifelines.statistics.sample_size_necessary_under_cph(power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio, alpha=0.05)

This computes the sample size needed to achieve a given power when comparing two groups under a Cox proportional hazards model.

Parameters:
  • power (float) – power to detect the magnitude of the hazard ratio as small as that specified by postulated_hazard_ratio.
  • ratio_of_participants (float) – ratio of participants in experimental group over control group.
  • p_exp (float) – probability of failure in experimental group over period of study.
  • p_con (float) – probability of failure in control group over period of study
  • postulated_hazard_ratio (float) – the postulated hazard ratio
  • alpha (float, optional (default=0.05)) – type I error rate
Returns:

  • n_exp (integer) – the sample size needed for the experimental group to achieve the desired power
  • n_con (integer) – the sample size needed for the control group to achieve the desired power

Examples

>>> from lifelines.statistics import sample_size_necessary_under_cph
>>>
>>> desired_power = 0.8
>>> ratio_of_participants = 1.
>>> p_exp = 0.25
>>> p_con = 0.35
>>> postulated_hazard_ratio = 0.7
>>> n_exp, n_con = sample_size_necessary_under_cph(desired_power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio)
>>> # (421, 421)

References

https://cran.r-project.org/web/packages/powerSurvEpi/powerSurvEpi.pdf
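
For intuition, the calculation can be sketched with a normal approximation: first find the number of events required, then divide by the failure probabilities to get group sizes. That lifelines uses exactly this formula is an assumption, and sample_size_sketch is a hypothetical name, but the sketch does reproduce the (421, 421) result of the example above.

```python
import math
from statistics import NormalDist


def sample_size_sketch(power, ratio_of_participants, p_exp, p_con, postulated_hazard_ratio, alpha=0.05):
    """Approximate group sizes needed to reach the desired power (illustrative only)."""
    norm = NormalDist()
    k = ratio_of_participants
    hr = postulated_hazard_ratio
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    # Scaled number of events required to detect the postulated hazard ratio.
    m = (1.0 / k) * ((k * hr + 1.0) / (hr - 1.0) * z_sum) ** 2
    # Convert required events into participants via the failure probabilities.
    n_exp = m * k / (k * p_exp + p_con)
    n_con = m / (k * p_exp + p_con)
    return math.ceil(n_exp), math.ceil(n_con)


print(sample_size_sketch(0.8, 1.0, 0.25, 0.35, 0.7))  # (421, 421)
```

Rounding up with math.ceil keeps the achieved power at or above the requested level.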