Asif Rahman

Taxonomy of health data for machine learning

The health system collects a wide variety of data types that machine learning models can use. These include:

  1. patient-level information like demographics and socio-economic factors
  2. hospital encounter-level information like admission source, ICU unit type, and discharge location
  3. outcomes, including diagnoses (such as those recorded in billing codes) and patient outcomes
  4. interventions a patient received in the hospital like medications, invasive mechanical ventilation, oxygen support, pressors, fluids, blood transfusions, and ECMO
  5. findings from radiological images, pathology images, and video recordings
  6. laboratory measurements like blood gases, metabolic panels, liver panels, lipid panels, complete blood count, urinalysis, urine output, microbiology, and omics data
  7. continuous waveforms like ECG, PPG, PCG, ABP, and etCO2 signals
  8. nurse-charted or automatically collected vital signs, including temperature, heart rate, blood pressure, and oxygen saturation
  9. clinician and radiological notes

In-patient data collected in the hospital is linked to patients using a unique medical record number (MRN). Data collected in out-patient settings, including at home, in a nursing home, or in ambulatory care, may not be linked to the patient's MRN. Even inside the hospital, linking waveforms with patient data in electronic health records, especially aligning them in time, is a significant challenge.
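
To make the linking problem concrete, here is a minimal sketch in Python of pairing charted vitals with waveform segments. Everything in it is an assumption made for illustration: the `ChartedVital` and `WaveformSegment` record types, the `link_segments` helper, the sample MRNs, and the 5-minute tolerance are all hypothetical and do not come from any particular EHR or monitoring system. It matches records on MRN and then picks the waveform segment whose start time is nearest to the charting time. A real pipeline would also have to handle clock drift between devices and the EHR, which this sketch ignores.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ChartedVital:
    """Hypothetical EHR record for one charted vital sign."""
    mrn: str
    time: datetime  # charting time, on the EHR clock
    name: str
    value: float


@dataclass
class WaveformSegment:
    """Hypothetical record for one stored waveform segment."""
    mrn: Optional[str]  # may be missing, e.g. for out-patient recordings
    start: datetime     # segment start time, on the device clock
    signal: str         # e.g. "ECG" or "PPG"


def link_segments(vitals, segments, tolerance=timedelta(minutes=5)):
    """Pair each vital with the nearest same-MRN segment start within `tolerance`.

    Returns a list of (vital, segment_or_None) tuples, one per vital.
    """
    # Group segments by MRN; segments without an MRN cannot be linked.
    by_mrn = {}
    for seg in segments:
        if seg.mrn is not None:
            by_mrn.setdefault(seg.mrn, []).append(seg)
    for segs in by_mrn.values():
        segs.sort(key=lambda s: s.start)

    linked = []
    for vital in vitals:
        segs = by_mrn.get(vital.mrn, [])
        starts = [s.start for s in segs]
        i = bisect_left(starts, vital.time)
        # The nearest start is either just before or just after position i.
        candidates = segs[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda s: abs(s.start - vital.time), default=None)
        if best is not None and abs(best.start - vital.time) <= tolerance:
            linked.append((vital, best))
        else:
            linked.append((vital, None))
    return linked


# Made-up example data: two patients, one segment with no MRN.
t0 = datetime(2024, 1, 1, 12, 0, 0)
vitals = [
    ChartedVital("A1", t0, "heart_rate", 82.0),
    ChartedVital("B2", t0, "heart_rate", 75.0),
]
segments = [
    WaveformSegment("A1", t0 + timedelta(minutes=2), "ECG"),
    WaveformSegment("A1", t0 - timedelta(minutes=30), "ECG"),
    WaveformSegment(None, t0, "PPG"),                       # unlinkable: no MRN
    WaveformSegment("B2", t0 + timedelta(hours=1), "ECG"),  # outside tolerance
]
linked = link_segments(vitals, segments)
```

In this made-up example, the vital for patient `A1` is paired with the ECG segment that starts two minutes later, while the vital for `B2` stays unlinked because its only segment starts an hour away. The PPG segment with no MRN can never be matched, which mirrors the out-patient case described above.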