# datasets

lifelines.datasets.load_c_botulinum_lag_phase(**kwargs)

A dataset from [1] that represents the duration of the lag phase for C. botulinum, measured in days, at 30 °C. The data is left and right censored. Note that the table does not include 6% NaCl, but the authors mention that no growth occurred, so we can infer a lag time greater than 85 days.

References

[1] Montville, Thomas J. “Interaction of pH and NaCl on culture density of Clostridium botulinum 62A.” Appl. Environ. Microbiol. 46.4 (1983): 961-963.

lifelines.datasets.load_canadian_senators(**kwargs)

A history of Canadian senators in office.

Size: (933,10)
Example:
Name                                        Abbott, John Joseph Caldwell
Political Affiliation at Appointment                Liberal-Conservative
Province / Territory                                              Quebec
Appointed on the advice of                     Macdonald, John Alexander
Term (yyyy.mm.dd)                       1887.05.12 - 1893.10.30  (Death)
start_date                                           1887-05-12 00:00:00
end_date                                             1893-10-30 00:00:00
reason                                                             Death
diff_days                                                           2363
observed                                                            True
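The derived fields in the example row (start_date, end_date, reason, diff_days) can be recomputed from the raw Term string. The following is a minimal sketch using only the standard library. The helper name `parse_term` is hypothetical and not part of lifelines, and it assumes every term follows the "start - end  (reason)" layout shown above.

```python
import re
from datetime import datetime

def parse_term(term):
    """Split a 'yyyy.mm.dd - yyyy.mm.dd  (Reason)' string into its parts.

    Hypothetical helper: assumes the layout of the example row above.
    """
    match = re.match(
        r"(\d{4}\.\d{2}\.\d{2}) - (\d{4}\.\d{2}\.\d{2})\s+\((.+)\)", term
    )
    if match is None:
        raise ValueError(f"unrecognised term format: {term!r}")
    start = datetime.strptime(match.group(1), "%Y.%m.%d")
    end = datetime.strptime(match.group(2), "%Y.%m.%d")
    # Duration in whole days between appointment and end of term.
    return start, end, match.group(3), (end - start).days

start_date, end_date, reason, diff_days = parse_term(
    "1887.05.12 - 1893.10.30  (Death)"
)
print(start_date, end_date, reason, diff_days)
```

For the example row this reproduces the diff_days value of 2363 shown above.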

lifelines.datasets.load_dd(**kwargs)

Classification of political regimes as democracy and dictatorship. Classification of democracies as parliamentary, semi-presidential (mixed) and presidential. Classification of dictatorships as military, civilian and royal. Coverage: 202 countries, from 1946 or year of independence to 2008.

Size: (1808, 12)
Example:
ctryname                                                   Afghanistan
cowcode2                                                           700
politycode                                                         700
un_region_name                                           Southern Asia
un_continent_name                                                 Asia
democracy                                                Non-democracy
regime                                                        Monarchy
start_year                                                        1946
duration                                                             7
observed                                                             1


References

Cheibub, José Antonio, Jennifer Gandhi, and James Raymond Vreeland. 2010. “Democracy and Dictatorship Revisited.” Public Choice, vol. 143, no. 1-2, pp. 67-101.

lifelines.datasets.load_dfcv()

A toy example of a time dependent dataset.

Size: (14, 6)
Example:

start  group  z  stop  id  event
0    1.0  0   3.0   1   True
0    1.0  0   5.0   2  False
0    1.0  1   5.0   3   True
0    1.0  0   6.0   4   True



lifelines.datasets.load_diabetes(**kwargs)

An interval censored dataset.

References

Borch-Johnsens, K, Andersen, P and Decker, T (1985). “The effect of proteinuria on relative mortality in Type I (insulin-dependent) diabetes mellitus.” Diabetologia, 28, 590-596.

Size: (731, 3)
Example:

left  right  gender
24     27    male
22     22  female
37     39    male
20     20    male
1     16    male
8     20  female
14     14    male
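In interval-censored data the event time is only known to lie between left and right. Several of the example rows above have left equal to right. The sketch below separates those rows from the rest, assuming (this page does not state it) that equal bounds mean an exactly observed time.

```python
# Example rows copied from the table above: (left, right, gender).
rows = [
    (24, 27, "male"),
    (22, 22, "female"),
    (37, 39, "male"),
    (20, 20, "male"),
    (1, 16, "male"),
    (8, 20, "female"),
    (14, 14, "male"),
]

# Assumption: left == right marks an exactly observed time.
exact = [r for r in rows if r[0] == r[1]]
interval = [r for r in rows if r[0] < r[1]]

# Width of each censoring interval, in the dataset's time units.
widths = [right - left for left, right, _ in interval]

print(len(exact), len(interval), widths)
```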

lifelines.datasets.load_g3(**kwargs)
Size: (17,7)
Example:
no.               1
age              41
sex          Female
group           RIT
event          True
time             53

lifelines.datasets.load_gbsg2(**kwargs)

A data frame containing the observations from the GBSG2 study of 686 women.

Size: (686,10)
Example:
horTh           yes
age             56
menostat        Post
tsize           12
pnodes          7
progrec         61
estrec          77
time            2018
cens            1


References

W. Sauerbrei and P. Royston (1999). Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society Series A, Volume 162(1), 71–94

M. Schumacher, G. Bastert, H. Bojar, K. Huebner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker for the German Breast Cancer Study Group (1994), Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. Journal of Clinical Oncology, 12, 2086–2093

lifelines.datasets.load_holly_molly_polly(**kwargs)

A toy example for CoxPH in recurrent-event survival analysis, taken from https://stat.ethz.ch/education/semesters/ss2011/seminar/contents/presentation_10.pdf

  ID  Status  Stratum  Start(days)  Stop(days)  tx    T
0  M       1        1            0         100   1  100
1  M       1        2          100         105   1    5
2  H       1        1            0          30   0   30
3  H       1        2           30          50   0   20
4  P       1        1            0          20   0   20

lifelines.datasets.load_kidney_transplant(**kwargs)
Size: (863,6)
Example:
time             5
death            0
age             51
black_male       0
white_male       1
black_female     0

lifelines.datasets.load_larynx(**kwargs)
Size: (89,6)
Example:
time  age  death  Stage_II  Stage_III  Stage_IV
0.6   77      1         0          0         0
1.3   53      1         0          0         0
2.4   45      1         0          0         0
2.5   57      0         0          0         0
3.2   58      1         0          0         0

lifelines.datasets.load_lcd(**kwargs)

Copper concentrations (µg/L) in shallow groundwater samples from two different geological zones in the San Joaquin Valley, California. The alluvial fan data include four different detection limits and the basin trough data include five different detection limits.

Millard, S.P. and Deverel, S.J. (1988). Nonparametric statistical methods for comparing two sites based on data with multiple non-detect limits. Water Resources Research 24: doi: 10.1029/88WR03412. issn: 0043-1397.

Size: (104,3)
Example:
C  T         group
0  1  alluvial_fan
0  1  alluvial_fan
0  1  alluvial_fan
0  1  alluvial_fan
1  1  alluvial_fan

lifelines.datasets.load_leukemia(**kwargs)

Leukemia dataset.

Size: (42,5)
Example:
t  status  sex  logWBC  Rx
0  35       0    1    1.45   0
1  34       0    1    1.47   0
2  32       0    1    2.20   0
3  32       0    1    2.53   0
4  25       0    1    1.78   0



lifelines.datasets.load_lung(**kwargs)

Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.

Note

Be mindful of the status column: lifelines treats a non-zero entry as an observed event, not as a censored one.

Size: (288,10)
Example:
inst  time  status  age  sex  ph.ecog  ph.karno  pat.karno  meal.cal  wt.loss
 3.0   306       2   74    1      1.0      90.0      100.0    1175.0      NaN
 3.0   455       2   68    1      0.0      90.0       90.0    1225.0     15.0
 3.0  1010       1   56    1      0.0      90.0       90.0       NaN     15.0
 5.0   210       2   57    1      1.0      90.0       60.0    1150.0     11.0
 1.0   883       2   60    1      0.0     100.0       90.0       NaN      0.0
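The status column in the example rows takes the values 1 and 2 rather than 0 and 1, so it is worth recoding it to an explicit 0/1 indicator before fitting. The sketch below assumes the coding documented for the lung data in R's survival package (1 = censored, 2 = dead). This page does not confirm that coding, so check it against the loaded data first.

```python
# status values from the five example rows above.
status = [2, 2, 1, 2, 2]

# Assumption (not stated on this page): 1 = censored, 2 = dead,
# as documented for the lung data in R's survival package.
event_observed = [int(s == 2) for s in status]

print(event_observed)
```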

References

Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ. Kugler JW. Bartel J. Law M. Bateman M. Klatt NE. et al. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology. 12(3):601-7, 1994.

lifelines.datasets.load_lupus(**kwargs)

Note

I transcribed this from the original paper, and highly suspect there are differences. See Notes below.

References

Merrell, M., & Shulman, L. E. (1955). Determination of prognosis in chronic disease, illustrated by systemic lupus erythematosus. Journal of Chronic Diseases, 1(1), 12–32. doi:10.1016/0021-9681(55)90018-7

Notes

In lifelines v0.23.7, two rows were corrected to fix errors in the original transcription.

lifelines.datasets.load_lymph_node(**kwargs)

References

Schmoor, C., Sauerbrei, W. Bastert, G., Schumacher, M. (2000). Role of Isolated Locoregional Recurrence of Breast Cancer: Results of Four Prospective Studies. Journal of Clinical Oncology, 18(8), 1696-1708.

Schumacher, M., Bastert, G., Bojar, H., Hübner, K., Olschewski, M., Sauerbrei, W., Schmoor, C., Beyerle, C., Neumann, R.L.A. and Rauschecker, H.F. for the German Breast Cancer Study Group (GBSG) (1994). A randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. Journal of Clinical Oncology, 12, 2086-2093.

Hosmer, D.W. and Lemeshow, S. and May, S. (2008). Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY

lifelines.datasets.load_lymphoma(**kwargs)
Size: (80, 3)
Example:

Stage_group  Time  Censor
1     6       1
1    19       1
1    32       1
1    42       1
1    42       1



lifelines.datasets.load_mice(**kwargs)

A dataset of interval-censored observations of mice tumors in two different environments.

References

Hoel, D. and Walburg, H. (1972). Statistical analysis of survival experiments. The Annals of Statistics, 18, 1259-1294.

lifelines.datasets.load_multicenter_aids_cohort_study(**kwargs)

Originally in [1]:

Size: (78, 4)

AIDSY: date of AIDS diagnosis
W: years from AIDS diagnosis to study entry
T: years from AIDS diagnosis to minimum of death or censoring
D: indicator of death during follow up

i   AIDSY       W      T        D
1   1990.425    4.575   7.575   0
2   1991.250    3.750   6.750   0
3   1992.014    2.986   5.986   0
4   1992.030    2.970   5.970   0
5   1992.072    2.928   5.928   0
6   1992.220    2.780   4.688   1
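Because W and T are both measured from AIDS diagnosis, AIDSY + W gives the calendar time of study entry and T - W the time actually spent under observation. The sketch below computes both for the example rows. Reading W as a delayed-entry (left-truncation) time is an interpretation of the column definitions above, not something this page states.

```python
# Example rows copied from the table above: (AIDSY, W, T, D).
rows = [
    (1990.425, 4.575, 7.575, 0),
    (1991.250, 3.750, 6.750, 0),
    (1992.014, 2.986, 5.986, 0),
    (1992.030, 2.970, 5.970, 0),
    (1992.072, 2.928, 5.928, 0),
    (1992.220, 2.780, 4.688, 1),
]

# Calendar time of study entry: diagnosis date plus years until entry.
entry_years = [round(aidsy + w, 3) for aidsy, w, _, _ in rows]

# Years observed after entry, until death or censoring.
follow_up = [round(t - w, 3) for _, w, t, _ in rows]

print(entry_years)
print(follow_up)
```

In these six rows every subject enters at 1995.0, and the five censored subjects each contribute three years of follow-up.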


References

[1] Cole SR, Hudgens MG. Survival analysis in infectious disease research: describing events in time. AIDS. 2010;24(16):2423-31.

lifelines.datasets.load_nh4(**kwargs)

Ammonium (NH4) concentration (mg/L) in precipitation measured at Olympic National Park, Hoh Ranger Station (WA14), weekly or every other week from January 6, 2009 through December 20, 2011.

Size: (104,3)

lifelines.datasets.load_panel_test(**kwargs)
Size: (28,5)
Example:
id  t  E  var1  var2
1  1  0   0.0     1
1  2  0   0.0     1
1  3  0   4.0     3
1  4  1   8.0     4
2  1  0   1.2     1

lifelines.datasets.load_psychiatric_patients(**kwargs)
Size: (26,4)
Example:
Age   T  C  sex
51   1  1    2
58   1  1    2
55   2  1    2
28  22  1    2
21  30  0    1

lifelines.datasets.load_recur(**kwargs)

From ftp://ftp.wiley.com/public/sci_tech_med/survival/, first published in “Applied Survival Analysis: Regression Modeling of Time to Event Data, Second Edition”:

ID          Subject Identification        1 - 400
AGE         Age                           years
TREAT       Treatment Assignment          0 = New
                                          1 = Old
TIME0       Day of Previous Episode       Days
TIME1       Day of New Episode            Days
            or censoring
CENSOR      Indicator for Soreness        1 = Episode Occurred
            Episode or Censoring              at TIME1
                                          0 = Censored
EVENT       Soreness Episode Number       0 to at most 4

Size: (1296, 7)
Example:
ID,AGE,TREAT,TIME0,TIME1,CENSOR,EVENT
1,43,0,9,56,1,3
1,43,0,56,88,1,4
1,43,0,0,6,1,1
1,43,0,6,9,1,2
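The example rows are not in episode order, so recurrent-event work usually starts by sorting each subject's rows by TIME0. The sketch below uses only the standard library and the rows printed above, and also derives the gap time TIME1 - TIME0 for each episode.

```python
import csv
import io

# Example rows copied verbatim from above.
raw = """ID,AGE,TREAT,TIME0,TIME1,CENSOR,EVENT
1,43,0,9,56,1,3
1,43,0,56,88,1,4
1,43,0,0,6,1,1
1,43,0,6,9,1,2
"""

rows = [
    {key: int(value) for key, value in record.items()}
    for record in csv.DictReader(io.StringIO(raw))
]

# Order episodes within each subject by the start of the interval.
rows.sort(key=lambda r: (r["ID"], r["TIME0"]))

# Gap time: days from the previous episode to the new one (or censoring).
gaps = [(r["EVENT"], r["TIME1"] - r["TIME0"]) for r in rows]
print(gaps)
```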

lifelines.datasets.load_regression_dataset(**kwargs)

Artificial regression dataset. Useful since there are no ties in this dataset. It was slightly edited in v0.15.0 to achieve this.

Size: (200,5)
Example:
var1      var2      var3          T  E
0.595170  1.143472  1.571079  14.785479  1
0.209325  0.184677  0.356980   7.336734  1
0.693919  0.071893  0.557960   5.271527  1
0.443804  1.364646  0.374221  11.684168  1
1.613324  0.125566  1.921325   7.637764  1

lifelines.datasets.load_rossi(**kwargs)

This data set is originally from Rossi et al. (1980), and is used as an example in Allison (1995). The data pertain to 432 convicts who were released from Maryland state prisons in the 1970s and who were followed up for one year after release. Half the released convicts were assigned at random to an experimental treatment in which they were given financial aid; half did not receive aid.

Size: (432,9)
Example:
week      20
arrest     1
fin        0
age       27
race       1
wexp       0
mar        0
paro       1
prio       3


References

Rossi, P.H., R.A. Berk, and K.J. Lenihan (1980). Money, Work, and Crime: Some Experimental Results. New York: Academic Press.

John Fox, Marilia Sa Carvalho (2012). The RcmdrPlugin.survival Package: Extending the R Commander Interface to Survival Analysis. Journal of Statistical Software, 49(7), 1-32.

lifelines.datasets.load_stanford_heart_transplants(**kwargs)

This is a classic dataset for survival regression with time varying covariates. The original dataset is from [1], and this dataset is from R’s survival library.

Size: (172, 8)
Example:
start  stop  event        age      year  surgery  transplant  id
0.0  50.0      1 -17.155373  0.123203        0           0   1
0.0   6.0      1   3.835729  0.254620        0           0   2
0.0   1.0      0   6.297057  0.265572        0           0   3
1.0  16.0      1   6.297057  0.265572        0           1   3
0.0  36.0      0  -7.737166  0.490075        0           0   4
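In this long format a subject whose covariates change appears on several rows, each covering one (start, stop] interval. In the example rows above, id 3 switches transplant from 0 to 1 at time 1.0. The sketch below groups the printed rows by id and checks that each subject's intervals join end to end. It covers only the five example rows, not the full dataset.

```python
from collections import defaultdict

# (start, stop, event, transplant, id) taken from the example rows above.
rows = [
    (0.0, 50.0, 1, 0, 1),
    (0.0, 6.0, 1, 0, 2),
    (0.0, 1.0, 0, 0, 3),
    (1.0, 16.0, 1, 1, 3),
    (0.0, 36.0, 0, 0, 4),
]

by_id = defaultdict(list)
for start, stop, event, transplant, subject in rows:
    by_id[subject].append((start, stop, event, transplant))

summary = {}
for subject, intervals in by_id.items():
    intervals.sort()
    # Each interval should begin where the previous one ended.
    contiguous = all(a[1] == b[0] for a, b in zip(intervals, intervals[1:]))
    summary[subject] = (len(intervals), contiguous)

print(summary)
```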


References

[1] J Crowley and M Hu. Covariance analysis of heart transplant survival data. J American Statistical Assoc, 72:27–36, 1977.

lifelines.datasets.load_static_test(**kwargs)
Size: (7,5)
Example:
id  t  E  var1  var2
1  4  1    -1    -1
2  3  1    -2    -2
3  3  0    -3    -3
4  4  1    -4    -4
5  2  1    -5    -5
6  0  1    -6    -6
7  2  1    -7    -7

lifelines.datasets.load_waltons(**kwargs)

Genotypes and number of days survived in Drosophila. Since we work with flies, we don’t need to worry about left-censoring. We know the birth date of all flies. We do have issues with accidentally killing some or if some escape. These would be right-censored as we do not actually observe their death due to “natural” causes.

Size: (163,3)
Example:
T  E    group
6  1  miR-137
13  1  miR-137
13  1  miR-137
13  1  miR-137
19  1  miR-137