datasets
-
lifelines.datasets.load_c_botulinum_lag_phase(**kwargs)
A dataset from [1] that represents the duration of the lag phase of C. botulinum, measured in days, at 30°C. The data are left- and right-censored. Note that the table does not include a 6% NaCl condition, but the authors mention that no growth occurred at that concentration (so we can infer a lag time > 85 days).
References
[1] Montville, Thomas J. “Interaction of pH and NaCl on culture density of Clostridium botulinum 62A.” Appl. Environ. Microbiol. 46.4 (1983): 961-963.
-
lifelines.datasets.load_canadian_senators(**kwargs)
A history of Canadian senators in office.
Size: (933, 10)
Example:
    Name                                  Abbott, John Joseph Caldwell
    Political Affiliation at Appointment  Liberal-Conservative
    Province / Territory                  Quebec
    Appointed on the advice of            Macdonald, John Alexander
    Term (yyyy.mm.dd)                     1887.05.12 - 1893.10.30 (Death)
    start_date                            1887-05-12 00:00:00
    end_date                              1893-10-30 00:00:00
    reason                                Death
    diff_days                             2363
    observed                              True
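In the example row, `diff_days` matches the number of days between `start_date` and `end_date`. The sketch below checks this with the standard library, using only the values shown above; that the column is derived this way for every row is an assumption, not something the docstring states.

```python
from datetime import date

# start_date and end_date copied from the example row above.
start_date = date(1887, 5, 12)
end_date = date(1893, 10, 30)

# Subtracting two dates gives a timedelta; .days is the whole-day difference.
diff_days = (end_date - start_date).days
print(diff_days)  # 2363
```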
-
lifelines.datasets.load_dd(**kwargs)
Classification of political regimes as democracies or dictatorships. Democracies are further classified as parliamentary, semi-presidential (mixed), or presidential; dictatorships as military, civilian, or royal. Coverage: 202 countries, from 1946 (or the year of independence) to 2008.
Size: (1808, 12)
Example:
    ctryname           Afghanistan
    cowcode2           700
    politycode         700
    un_region_name     Southern Asia
    un_continent_name  Asia
    ehead              Mohammad Zahir Shah
    leaderspellreg     Mohammad Zahir Shah.Afghanistan.1946.1952.Mona...
    democracy          Non-democracy
    regime             Monarchy
    start_year         1946
    duration           7
    observed           1
References
Cheibub, José Antonio, Jennifer Gandhi, and James Raymond Vreeland. 2010. “Democracy and Dictatorship Revisited.” Public Choice, vol. 143, no. 1-2, pp. 67-101.
-
lifelines.datasets.load_dfcv()
A toy example of a time-dependent dataset.
Size: (14, 6)
Example:
    start  group  z  stop  id  event
    0      1.0    0  3.0   1   True
    0      1.0    0  5.0   2   False
    0      1.0    1  5.0   3   True
    0      1.0    0  6.0   4   True
-
lifelines.datasets.load_diabetes(**kwargs)
An interval-censored dataset.
References
Borch-Johnsen, K., Andersen, P. and Deckert, T. (1985). “The effect of proteinuria on relative mortality in Type I (insulin-dependent) diabetes mellitus.” Diabetologia, 28, 590-596.
Size: (731, 3)
Example:
    left  right  gender
    24    27     male
    22    22     female
    37    39     male
    20    20     male
    1     16     male
    8     20     female
    14    14     male
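Each row gives bounds `left` and `right` on the event time. Under the usual convention for interval-censored data (assumed here), equal bounds mean the event time is known exactly, and an infinite upper bound means the row is right-censored. A minimal stdlib sketch that classifies the example rows above; `censoring_type` is a hypothetical helper, not part of lifelines.

```python
import math

# (left, right) bounds copied from the example rows above.
rows = [(24, 27), (22, 22), (37, 39), (20, 20), (1, 16), (8, 20), (14, 14)]

def censoring_type(left, right):
    """Classify one observation by its bounds (hypothetical helper)."""
    if left == right:
        return "exact"      # event time observed precisely
    if math.isinf(right):
        return "right"      # event happened some time after `left`
    return "interval"       # event happened between `left` and `right`

types = [censoring_type(lo, hi) for lo, hi in rows]
print(types.count("exact"), types.count("interval"))  # 3 4
```

lifelines' parametric fitters offer, for example, a `fit_interval_censoring` method that takes lower- and upper-bound columns shaped like these.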
-
lifelines.datasets.load_g3(**kwargs)
Size: (17, 7)
Example:
    no.        1
    age        41
    sex        Female
    histology  Grade3
    group      RIT
    event      True
    time       53
-
lifelines.datasets.load_gbsg2(**kwargs)
A data frame containing the observations from the GBSG2 study of 686 women.
Size: (686, 10)
Example:
    horTh     yes
    age       56
    menostat  Post
    tsize     12
    tgrade    II
    pnodes    7
    progrec   61
    estrec    77
    time      2018
    cens      1
References
- W. Sauerbrei and P. Royston (1999). Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society Series A, Volume 162(1), 71–94
- M. Schumacher, G. Bastert, H. Bojar, K. Huebner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R.L.A. Neumann and H.F. Rauschecker for the German Breast Cancer Study Group (1994). Randomized 2 × 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. Journal of Clinical Oncology, 12, 2086–2093
-
lifelines.datasets.load_holly_molly_polly(**kwargs)
From https://stat.ethz.ch/education/semesters/ss2011/seminar/contents/presentation_10.pdf. Used as a toy example for Cox proportional hazards regression with recurrent events.
Example:
       ID  Status  Stratum  Start(days)  Stop(days)  tx  T
    0  M   1       1        0            100         1   100
    1  M   1       2        100          105         1   5
    2  H   1       1        0            30          0   30
    3  H   1       2        30           50          0   20
    4  P   1       1        0            20          0   20
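In the example rows, `T` equals `Stop(days)` minus `Start(days)`, so it reads as a gap time: the time since the subject's previous event (or since the start of follow-up). That interpretation is inferred from the rows shown, not stated in the source. A quick stdlib check:

```python
# (ID, Start(days), Stop(days), T) copied from the example rows above.
rows = [
    ("M", 0, 100, 100),
    ("M", 100, 105, 5),
    ("H", 0, 30, 30),
    ("H", 30, 50, 20),
    ("P", 0, 20, 20),
]

# Gap time = length of each (Start, Stop] interval.
gap_times = [stop - start for _, start, stop, _ in rows]
print(gap_times)  # [100, 5, 30, 20, 20]

# The reported T column agrees with the computed gap times.
assert gap_times == [t for *_, t in rows]
```

The same rows also support a counting-process (Start, Stop] formulation; which time scale to use in a recurrent-event Cox model is a modelling choice.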
-
lifelines.datasets.load_kidney_transplant(**kwargs)
Data set D.3 from Klein and Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data (Statistics for Biology and Health), 1997.
Size: (863, 6)
Example:
    time          5
    death         0
    age           51
    black_male    0
    white_male    1
    black_female  0
-
lifelines.datasets.load_larynx(**kwargs)
Size: (89, 6)
Example:
    time  age  death  Stage_II  Stage_III  Stage_IV
    0.6   77   1      0         0          0
    1.3   53   1      0         0          0
    2.4   45   1      0         0          0
    2.5   57   0      0         0          0
    3.2   58   1      0         0          0
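The stage columns look like dummy (treatment) coding of a four-level stage variable with Stage I as the reference level, so every example row (all three indicators 0) would be a Stage I patient. That coding scheme is an assumption inferred from the column names. A small, hypothetical `decode_stage` helper turns the indicators back into a label:

```python
# Stage indicator columns, in the order they appear in the table above.
STAGE_COLUMNS = ["Stage_II", "Stage_III", "Stage_IV"]

def decode_stage(indicators):
    """Map the three 0/1 stage indicators back to a stage label.

    Assumes Stage I is the reference level, encoded as all zeros.
    """
    active = [name for name, flag in zip(STAGE_COLUMNS, indicators) if flag]
    if len(active) > 1:
        raise ValueError(f"more than one stage flagged: {active}")
    return active[0].replace("_", " ") if active else "Stage I"

print(decode_stage([0, 0, 0]))  # Stage I (every example row above)
print(decode_stage([0, 1, 0]))  # Stage III (hypothetical row)
```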
-
lifelines.datasets.load_lcd(**kwargs)
Copper concentrations (µg/L) in shallow groundwater samples from two different geological zones in the San Joaquin Valley, California. The alluvial fan data include four different detection limits and the basin trough data include five different detection limits.
Millard, S.P. and Deverel, S.J. (1988). Nonparametric statistical methods for comparing two sites based on data with multiple non-detect limits. Water Resources Research, 24. doi:10.1029/88WR03412. ISSN 0043-1397.
Size: (104, 3)
Example:
    C  T  group
    0  1  alluvial_fan
    0  1  alluvial_fan
    0  1  alluvial_fan
    0  1  alluvial_fan
    1  1  alluvial_fan
-
lifelines.datasets.load_leukemia(**kwargs)
Leukemia dataset.
Size: (42, 5)
Example:
       t   status  sex  logWBC  Rx
    0  35  0       1    1.45    0
    1  34  0       1    1.47    0
    2  32  0       1    2.20    0
    3  32  0       1    2.53    0
    4  25  0       1    1.78    0
References
From http://web1.sph.emory.edu/dkleinb/allDatasets/surv2datasets/anderson.dat
-
lifelines.datasets.load_lung(**kwargs)
Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.
Size: (288, 10)
Example:
    inst  time  status  age  sex  ph.ecog  ph.karno  pat.karno  meal.cal  wt.loss
    3.0   306   1       74   1    1.0      90.0      100.0      1175.0    NaN
    3.0   455   1       68   1    0.0      90.0      90.0       1225.0    15.0
    3.0   1010  0       56   1    0.0      90.0      90.0       NaN       15.0
    5.0   210   1       57   1    1.0      90.0      60.0       1150.0    11.0
    1.0   883   1       60   1    0.0      100.0     90.0       NaN       0.0
References
Loprinzi CL. Laurie JA. Wieand HS. Krook JE. Novotny PJ. Kugler JW. Bartel J. Law M. Bateman M. Klatt NE. et al. Prospective evaluation of prognostic variables from patient-completed questionnaires. North Central Cancer Treatment Group. Journal of Clinical Oncology. 12(3):601-7, 1994.
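Several example rows have missing values (`NaN`) in `meal.cal` or `wt.loss`. Survival regression models generally need complete covariate rows, so one simple (if wasteful) option is complete-case analysis: drop every row with a missing value before fitting. With pandas this is typically `df.dropna()`; the stdlib sketch below applies the same rule to the five example rows only.

```python
import math

nan = float("nan")

# Example rows from the table above. Columns: inst, time, status, age, sex,
# ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss.
rows = [
    [3.0, 306, 1, 74, 1, 1.0, 90.0, 100.0, 1175.0, nan],
    [3.0, 455, 1, 68, 1, 0.0, 90.0, 90.0, 1225.0, 15.0],
    [3.0, 1010, 0, 56, 1, 0.0, 90.0, 90.0, nan, 15.0],
    [5.0, 210, 1, 57, 1, 1.0, 90.0, 60.0, 1150.0, 11.0],
    [1.0, 883, 1, 60, 1, 0.0, 100.0, 90.0, nan, 0.0],
]

def is_complete(row):
    """True if no value in the row is NaN (math.isnan also accepts ints)."""
    return not any(math.isnan(v) for v in row)

complete = [row for row in rows if is_complete(row)]
print(len(complete), "of", len(rows), "rows are complete")  # 2 of 5 rows are complete
```

Losing three of five rows here shows how quickly complete-case analysis can shrink a sample; imputation is the usual alternative.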
-
lifelines.datasets.load_lupus(**kwargs)
See https://projecteuclid.org/download/pdf_1/euclid.aos/1176345693
Note
I transcribed this from the original paper and strongly suspect there are transcription differences. See Notes below.
References
Merrell, M., & Shulman, L. E. (1955). Determination of prognosis in chronic disease, illustrated by systemic lupus erythematosus. Journal of Chronic Diseases, 1(1), 12–32. doi:10.1016/0021-9681(55)90018-7
Notes
In lifelines v0.23.7, two rows were corrected to fix errors in the original transcription.
-
lifelines.datasets.load_lymph_node(**kwargs)
References
Schmoor, C., Sauerbrei, W., Bastert, G., Schumacher, M. (2000). Role of Isolated Locoregional Recurrence of Breast Cancer: Results of Four Prospective Studies. Journal of Clinical Oncology, 18(8), 1696-1708.
Schumacher, M., Bastert, G., Bojar, H., Hübner, K., Olschewski, M., Sauerbrei, W., Schmoor, C., Beyerle, C., Neumann, R.L.A. and Rauschecker, H.F. for the German Breast Cancer Study Group (GBSG) (1994). A randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. Journal of Clinical Oncology, 12, 2086-2093.
Hosmer, D.W., Lemeshow, S. and May, S. (2008). Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
-
lifelines.datasets.load_lymphoma(**kwargs)
Size: (80, 3)
Example:
    Stage_group  Time  Censor
    1            6     1
    1            19    1
    1            32    1
    1            42    1
    1            42    1
References
From https://www.statsdirect.com/help/content/survival_analysis/logrank.htm
-
lifelines.datasets.load_mice(**kwargs)
A dataset of interval-censored observations of tumors in mice kept in two different environments.
References
Hoel, D. and Walburg, H. (1972). Statistical analysis of survival experiments. The Annals of Statistics, 18, 1259-1294.
-
lifelines.datasets.load_multicenter_aids_cohort_study(**kwargs)
Originally in [1].
Size: (78, 4)
Columns:
    AIDSY: date of AIDS diagnosis
    W: years from AIDS diagnosis to study entry
    T: years from AIDS diagnosis to the earlier of death or censoring
    D: indicator of death during follow-up
Example:
    i  AIDSY     W      T      D
    1  1990.425  4.575  7.575  0
    2  1991.250  3.750  6.750  0
    3  1992.014  2.986  5.986  0
    4  1992.030  2.970  5.970  0
    5  1992.072  2.928  5.928  0
    6  1992.220  2.780  4.688  1
References
[1] Cole SR, Hudgens MG. Survival analysis in infectious disease research: describing events in time. AIDS. 2010;24(16):2423-31.
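Because subjects enter the study `W` years after diagnosis while `T` is also measured from diagnosis, the data are left-truncated (delayed entry): a subject belongs to the risk set only between `W` and `T`. A stdlib sketch of that risk-set rule on the six example rows; `at_risk` is a hypothetical helper. In lifelines, univariate fitters such as `KaplanMeierFitter.fit` accept an `entry` argument for this purpose.

```python
# (W, T, D) copied from the example rows above; times are years since AIDS diagnosis.
rows = [
    (4.575, 7.575, 0),
    (3.750, 6.750, 0),
    (2.986, 5.986, 0),
    (2.970, 5.970, 0),
    (2.928, 5.928, 0),
    (2.780, 4.688, 1),
]

def at_risk(t, rows):
    """Number of subjects at risk at time t under delayed entry.

    A subject contributes only after entering the study (W < t) and up to
    their exit time (t <= T).
    """
    return sum(1 for w, exit_time, _ in rows if w < t <= exit_time)

# Time actually observed in the study for each subject.
follow_up = [round(exit_time - w, 3) for w, exit_time, _ in rows]

print(at_risk(3.0, rows))  # 4
print(follow_up)           # [3.0, 3.0, 3.0, 3.0, 3.0, 1.908]
```

Ignoring `W` would count the first two subjects as at risk at t = 3.0 even though they had not yet entered the study, which biases survival estimates upward.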
-
lifelines.datasets.load_nh4(**kwargs)
Ammonium (NH4) concentration (mg/L) in precipitation measured at Olympic National Park, Hoh Ranger Station (WA14), weekly or every other week from January 6, 2009 through December 20, 2011.
National Atmospheric Deposition Program, National Trends Network (NADP/NTN). http://nadp.slh.wisc.edu/data/sites/siteDetails.aspx?net=NTN&id=WA14 http://nadp.isws.illinois.edu/NTN/
Size: (104,3)
-
lifelines.datasets.load_panel_test(**kwargs)
Size: (28, 5)
Example:
    id  t  E  var1  var2
    1   1  0  0.0   1
    1   2  0  0.0   1
    1   3  0  4.0   3
    1   4  1  8.0   4
    2   1  0  1.2   1
-
lifelines.datasets.load_psychiatric_patients(**kwargs)
Size: (26, 4)
Example:
    Age  T   C  sex
    51   1   1  2
    58   1   1  2
    55   2   1  2
    28   22  1  2
    21   30  0  1
-
lifelines.datasets.load_recur(**kwargs)
From ftp://ftp.wiley.com/public/sci_tech_med/survival/, first published in “Applied Survival Analysis: Regression Modeling of Time to Event Data, Second Edition”.
    Variable  Description                                           Codes / Units
    ID        Subject identification                                1 - 400
    AGE       Age                                                   Years
    TREAT     Treatment assignment                                  0 = New, 1 = Old
    TIME0     Day of previous episode                               Days
    TIME1     Day of new episode or censoring                       Days
    CENSOR    Indicator for soreness episode or censoring at TIME1  1 = Episode occurred, 0 = Censored
    EVENT     Soreness episode number                               0 to at most 4
Size: (1296, 7)
Example:
    ID,AGE,TREAT,TIME0,TIME1,CENSOR,EVENT
    1,43,0,9,56,1,3
    1,43,0,56,88,1,4
    1,43,0,0,6,1,1
    1,43,0,6,9,1,2
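Each row is one (TIME0, TIME1] interval between soreness episodes, but the example rows for subject 1 are not stored in time order. A stdlib sketch that parses the example, sorts each subject's rows by `TIME0`, and reports any gaps or overlaps between consecutive intervals. That intervals should be contiguous is an assumption; it holds for the subject shown but is not stated for the full data.

```python
import csv
import io

# Example rows copied verbatim from above (CSV with a header line).
raw = """ID,AGE,TREAT,TIME0,TIME1,CENSOR,EVENT
1,43,0,9,56,1,3
1,43,0,56,88,1,4
1,43,0,0,6,1,1
1,43,0,6,9,1,2
"""

rows = [{k: int(v) for k, v in r.items()} for r in csv.DictReader(io.StringIO(raw))]

# Put each subject's episodes in chronological order.
rows.sort(key=lambda r: (r["ID"], r["TIME0"]))

# Consecutive intervals of the same subject that do not join up.
gaps = [
    (prev["ID"], prev["TIME1"], cur["TIME0"])
    for prev, cur in zip(rows, rows[1:])
    if prev["ID"] == cur["ID"] and prev["TIME1"] != cur["TIME0"]
]

print([r["EVENT"] for r in rows])  # [1, 2, 3, 4]
print(gaps)                        # []
```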
-
lifelines.datasets.load_regression_dataset(**kwargs)
Artificial regression dataset. It is useful because there are no ties in this dataset; it was slightly edited in v0.15.0 to achieve this.
Size: (200, 5)
Example:
    var1      var2      var3      T          E
    0.595170  1.143472  1.571079  14.785479  1
    0.209325  0.184677  0.356980  7.336734   1
    0.693919  0.071893  0.557960  5.271527   1
    0.443804  1.364646  0.374221  11.684168  1
    1.613324  0.125566  1.921325  7.637764   1
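Ties matter because, with tied event times, the Cox partial likelihood is usually computed with an approximation (for example Breslow's or Efron's method), so a tie-free dataset gives a cleaner test case. A stdlib sketch that checks the example `T` values for ties; `tied_durations` is a hypothetical helper.

```python
from collections import Counter

# Durations (T) copied from the example rows above.
durations = [14.785479, 7.336734, 5.271527, 11.684168, 7.637764]

def tied_durations(values):
    """Return the durations that occur more than once, sorted."""
    return sorted(v for v, n in Counter(values).items() if n > 1)

print(tied_durations(durations))  # []
```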
-
lifelines.datasets.load_rossi(**kwargs)
This dataset is originally from Rossi et al. (1980) and is used as an example in Allison (1995). The data pertain to 432 convicts who were released from Maryland state prisons in the 1970s and who were followed up for one year after release. Half of the released convicts were assigned at random to an experimental treatment in which they were given financial aid; the other half did not receive aid.
Size: (432, 9)
Example:
    week    20
    arrest  1
    fin     0
    age     27
    race    1
    wexp    0
    mar     0
    paro    1
    prio    3
References
Rossi, P.H., R.A. Berk, and K.J. Lenihan (1980). Money, Work, and Crime: Some Experimental Results. New York: Academic Press.
John Fox, Marilia Sa Carvalho (2012). The RcmdrPlugin.survival Package: Extending the R Commander Interface to Survival Analysis. Journal of Statistical Software, 49(7), 1-32.
-
lifelines.datasets.load_stanford_heart_transplants(**kwargs)
This is a classic dataset for survival regression with time-varying covariates. The original dataset is from [1], and this version is taken from R’s survival library.
Size: (172, 8)
Example:
    start  stop  event  age         year      surgery  transplant  id
    0.0    50.0  1      -17.155373  0.123203  0        0           1
    0.0    6.0   1      3.835729    0.254620  0        0           2
    0.0    1.0   0      6.297057    0.265572  0        0           3
    1.0    16.0  1      6.297057    0.265572  0        1           3
    0.0    36.0  0      -7.737166   0.490075  0        0           4
References
[1] J. Crowley and M. Hu. Covariance analysis of heart transplant survival data. Journal of the American Statistical Association, 72:27–36, 1977.
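Patient 3 appears twice: untransplanted on (0, 1] and transplanted on (1, 16]. This long start/stop layout is what lets transplant status vary over time (lifelines' `CoxTimeVaryingFitter` works with data in this shape). The sketch below, using only the five example rows, groups intervals by `id` and recovers each patient's total follow-up, final event status, and transplant history; it assumes follow-up starts at time 0.

```python
from collections import defaultdict

# (id, start, stop, event, transplant) copied from the example rows above.
rows = [
    (1, 0.0, 50.0, 1, 0),
    (2, 0.0, 6.0, 1, 0),
    (3, 0.0, 1.0, 0, 0),
    (3, 1.0, 16.0, 1, 1),
    (4, 0.0, 36.0, 0, 0),
]

# Group each patient's (start, stop] intervals together.
by_id = defaultdict(list)
for pid, start, stop, event, transplant in rows:
    by_id[pid].append((start, stop, event, transplant))

summary = {}
for pid, intervals in sorted(by_id.items()):
    intervals.sort()                         # order intervals by start time
    _, last_stop, last_event, _ = intervals[-1]
    summary[pid] = (
        last_stop,                           # total follow-up, assuming it starts at 0
        last_event == 1,                     # the event flag sits on the final interval
        [t for *_, t in intervals],          # transplant status in each interval
    )

for pid, values in summary.items():
    print(pid, *values)
# 1 50.0 True [0]
# 2 6.0 True [0]
# 3 16.0 True [0, 1]
# 4 36.0 False [0]
```

Collapsing patient 3 to a single row flagged as transplanted would credit the pre-transplant day to the transplanted group, which is why the time-varying layout matters.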
-
lifelines.datasets.load_static_test(**kwargs)
Size: (7, 5)
Example:
    id  t  E  var1  var2
    1   4  1  -1    -1
    2   3  1  -2    -2
    3   3  0  -3    -3
    4   4  1  -4    -4
    5   2  1  -5    -5
    6   0  1  -6    -6
    7   2  1  -7    -7
-
lifelines.datasets.load_waltons(**kwargs)
Genotypes and number of days survived in Drosophila. Because we know the birth date of every fly, we do not need to worry about left-censoring. We do, however, sometimes accidentally kill flies, and some escape. These flies are right-censored, since we do not observe their death from “natural” causes.
Size: (163, 3)
Example:
    T   E  group
    6   1  miR-137
    13  1  miR-137
    13  1  miR-137
    13  1  miR-137
    19  1  miR-137
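With right-censoring handled as described, the Kaplan-Meier estimator gives a survival curve for each genotype. A minimal stdlib version applied to just the five example rows (in practice you would fit lifelines' `KaplanMeierFitter` to the full dataset, separately per `group`). Censored flies (`E` = 0) stay in the risk set up to their censoring time but never count as deaths.

```python
from collections import Counter

# (T, E) pairs copied from the example rows above (all in group miR-137).
durations = [6, 13, 13, 13, 19]
events = [1, 1, 1, 1, 1]

def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct observed death time."""
    deaths = Counter(t for t, e in zip(durations, events) if e)
    survival, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)  # still alive just before t
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

print(kaplan_meier(durations, events))  # [(6, 0.8), (13, 0.2), (19, 0.0)]
```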