Ibrahim ZM, Wu H, Hamoud A, Stappen L, Dobson RJB, Agarossi A.

Journal of the American Medical Informatics Association, Pages 437–443

Objectives: Current machine learning models aiming to predict sepsis from electronic health records (EHR) do not account 20 for the heterogeneity of the condition despite its emerging importance in prognosis and treatment. This work demonstrates the added value of stratifying the types of organ dysfunction observed in patients who develop sepsis in the intensive care unit (ICU) in improving the ability to recognize patients at risk of sepsis from their EHR data.

Materials and Methods: Using an ICU dataset of 13 728 records, we identify clinically significant sepsis subpopulations with distinct organ dysfunction patterns. We perform classification experiments with random forest, gradient boost trees, and support vector machines, using the identified subpopulations to distinguish patients who develop sepsis in the ICU from those who do not.

Results: The classification results show that features selected using sepsis subpopulations as background knowledge yield a superior performance in distinguishing septic from non-septic patients regardless of the classification model used. The improved performance is especially pronounced in specificity, which is a current bottleneck in sepsis prediction machine learning models.

Conclusion: Our findings can steer machine learning efforts toward more personalised models for complex conditions including sepsis.