Telemedicine/EHR/Medical Informatics
Jeffrey Yaeger, MD, MPH (he/him/his)
Associate Professor of Pediatrics and Public Health Sciences
Golisano Children's Hospital at The University of Rochester Medical Center
Rochester, New York, United States
Background:
Prediction models to detect invasive bacterial infections (IBIs; i.e., bacteremia and bacterial meningitis) rely on serum biomarkers, which may not be available in time to guide decisions and may be difficult to obtain in low-resource settings. To address this problem, we previously derived a machine learning model that detects IBIs without serum biomarkers, using abstracted clinical variables. However, barriers to clinical adoption persist because clinicians must collect the predictor variables and enter them manually into a web-based risk calculator.
Objective:
To address these implementation barriers by deriving a natural language processing (NLP) algorithm that detects IBIs in febrile infants from free-text notes in the electronic health record.
Design/Methods:
This is a cross-sectional pilot study of infants brought to one pediatric emergency department from January 2011 to December 2018. Inclusion criteria were age 0-90 days, temperature >38°C, and documented gestational age. We abstracted all free-text emergency department and admission notes written before laboratory results were available. We cleaned, lemmatized, and vectorized these unstructured notes using term frequency-inverse document frequency (TF-IDF). We used the free-text notes with and without structured data (i.e., maximum temperature) to develop prediction models using logistic regression, support vector machine, and XGBoost. To limit overfitting and estimate performance on held-out data, we trained and tested the models with 4-fold cross-validation. We calculated the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity, and used bootstrapping to estimate 90% confidence intervals. We performed a permutation test with 1,000 permutations to assess statistical significance. To understand model performance qualitatively, we also identified the free-text terms most important to classification.
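The TF-IDF vectorization step can be illustrated with a minimal, standard-library Python sketch. It assumes the smoothed weighting that scikit-learn's `TfidfVectorizer` uses by default (raw term count multiplied by ln((1 + n)/(1 + df)) + 1, then L2 normalization of each note); the abstract does not state which variant or library was used. The `tokenize` and `tfidf` helpers are hypothetical, the tokenizer is a crude stand-in that skips lemmatization, and the three notes are invented for illustration, not study data.

```python
import math
import re
from collections import Counter


def tokenize(note: str) -> list[str]:
    # Lowercase and keep alphabetic runs only. A fuller pipeline might also
    # remove stop words and would lemmatize (e.g., "coughing" -> "cough").
    return re.findall(r"[a-z]+", note.lower())


def tfidf(notes: list[str]) -> tuple[list[str], list[list[float]]]:
    docs = [Counter(tokenize(note)) for note in notes]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Document frequency: how many notes contain each term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        row = [d[t] * idf[t] for t in vocab]  # raw count x idf
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])  # L2-normalize each note
    return vocab, rows


# Invented example notes (not study data).
notes = [
    "Fussy infant with cough and fever.",
    "Fever, no cough, well appearing.",
    "Respiratory distress and fever.",
]
vocab, X = tfidf(notes)
for term in ("fever", "distress"):
    print(term, round(X[2][vocab.index(term)], 3))
```

A term that appears in every note (here, "fever") gets the lowest weight, so rarer, more discriminating words such as "distress" dominate a note's vector. Rows like these, optionally with a structured column such as maximum temperature appended, are the kind of input a classifier such as logistic regression or XGBoost would consume.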
Results:
Of 1,421 febrile infants, 22 (1.5%) had an IBI. Median age was 54 days (IQR 36). The AUC was 0.77 (90% CI 0.57, 0.79). The XGBoost model with free text plus maximum temperature outperformed the other models, achieving a sensitivity of 1.00 (90% CI 0.67, 1.00) and a specificity of 0.50 (90% CI 0.48, 0.78). The permutation test showed that performance was statistically significant (p < .001). “Cough” and “distress” were the free-text terms most strongly associated with IBIs.
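The AUC, bootstrap confidence intervals, and permutation p-value reported above can be sketched with the standard library. This is a hypothetical illustration, not the study's code: it treats each infant's predicted IBI risk as a fixed score, resamples infants with replacement for a percentile 90% interval, and shuffles the labels against those fixed scores for the permutation test. That permutation scheme is a simplification; scikit-learn's `permutation_test_score`, for example, instead refits the model under cross-validation for each permutation. The function names, labels, scores, and 0.5 risk threshold are all invented.

```python
import random


def auc(labels, scores):
    # Mann-Whitney form: share of (case, non-case) pairs in which the case has
    # the higher score; ties count as one half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold):
    # Flag an infant as high risk when the score meets the threshold.
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_ci(labels, scores, metric, n_boot=1000, level=0.90, seed=0):
    # Percentile bootstrap: resample infants with replacement, recompute metric.
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples with a case and a non-case
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round((1 - level) / 2 * (n_boot - 1))]
    hi = stats[round((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


def permutation_p_value(labels, scores, metric, n_perm=1000, seed=0):
    # Shuffle labels against fixed scores; p is the share of shuffles that do
    # at least as well as the observed labels, with a +1 correction.
    rng = random.Random(seed)
    observed = metric(labels, scores)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if metric(shuffled, scores) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def sensitivity(ys, ss):
    return sens_spec(ys, ss, 0.5)[0]  # hypothetical 0.5 risk threshold


# Invented toy data: 1 = IBI, 0 = no IBI; scores are predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
print(auc(labels, scores))
print(bootstrap_ci(labels, scores, sensitivity))
print(permutation_p_value(labels, scores, auc))
```

Skipping resamples that lack a case or a non-case is one design choice that keeps AUC and sensitivity defined; resampling separately within each outcome group is a common alternative when cases are rare, as with 22 IBIs among 1,421 infants.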
Conclusion(s):
Findings from this pilot study suggest that free-text notes may help detect IBIs in febrile infants. If validated in larger, more heterogeneous samples, this NLP algorithm may reduce barriers to clinical adoption by enabling clinicians to estimate IBI risk in any setting.