730 - Developing Generalized Predictive Models of Various Conditions using the Pediatric MIMIC-3 Database
Friday, April 28, 2023
5:15 PM – 7:15 PM ET
Poster Number: 730 Publication Number: 730.154
Erik G. Alvstad, CHOC Children's Hospital of Orange County, Tustin, CA, United States; Laya S. Pullela, MI4 CHOC Children's Hospital of Orange County, Irvine, CA, United States; Howard Lei, CHOC Children's, Los Angeles, CA, United States; Anthony Chang, CHOC Children's Hospital of Orange County, Orange, CA, United States
Undergraduate MI4 CHOC Children's Hospital of Orange County Irvine, California, United States
Background: Intelligent computational systems have found success in developing accurate diagnoses from patient histories and medical imaging. Because these models rely heavily on lab tests and scans as input data, non-invasive features, which are cheaper and easier to obtain, may be undervalued as predictors, even if they are not necessarily identifying factors for a particular medical condition.
Objective: The goal of this research was to build a baseline distribution model for conditions that appear to be highly correlated with patterns of non-invasive features. The purpose of these models was not to diagnose patients using non-invasive features, but rather to guide doctors in searching for flags and indicators that enable the formal diagnosis of certain conditions, some of which could easily be overlooked without such a tool.
Design/Methods: The Pediatric Intensive Care (PIC) database from PhysioNet was used. Diagnoses with more than 50 positive cases were analyzed. For each diagnosis, a hyperparameter search was performed for a single-layer perceptron classifier, a logistic regression classifier, and a decision tree classifier, using non-invasive features. Finally, results were analyzed in four ways (AUC-ROC, the Mann-Whitney U test, Welch's t-test, and Fisher's exact test) to ascertain how strongly the non-invasive feature data were correlated with a particular positive diagnosis.
Results: Out of more than 200 conditions in the PIC dataset, about 13 were determined to have a strong statistical correlation with the EHR non-invasive data alone. The AUC values for these diagnoses were greater than 0.70, with sensitivity and specificity scores above 0.70 (with few exceptions). The three additional tests described above yielded p-values < 0.01.
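As a rough illustration of three of the analyses named in Design/Methods, the sketch below computes AUC-ROC through its equivalence with the normalized Mann-Whitney U statistic, the Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and a two-sided Fisher's exact p-value for a 2x2 table. It is a minimal standard-library sketch, not the study's code: the function names are hypothetical, and the abstract does not state which inputs were passed to each test.

```python
from math import comb, sqrt
from statistics import mean, variance


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC-ROC as the Mann-Whitney U statistic divided by n_pos * n_neg."""
    u = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                u += 1.0      # positive case ranked above negative case
            elif p == n:
                u += 0.5      # ties count as half
    return u / (len(pos_scores) * len(neg_scores))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of each group mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))
```

With made-up scores, `auc_mann_whitney([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])` counts 8 of 9 positive-negative pairs as correctly ordered. Turning the Welch statistic into a p-value requires the t distribution, which a statistics library would normally supply.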
Conclusion(s): This study isolated 13 conditions for which non-invasive feature data provided some predictive capacity towards a diagnosis. On average, sensitivity scores were slightly higher than specificity scores; this was desired because false positives (Type I errors) were preferred over false negatives (Type II errors). Although the AUC-ROC values for many conditions were high, this metric should not be taken to mean that non-invasive feature data provide enough information to ascertain diagnoses. Instead, the model serves to flag suspicious EHR/vital sign data for further investigation, guiding physicians in testing for various conditions. Embedded within EHRs, such a model could potentially alleviate the underdiagnosis of conditions with non-flagrant symptoms.
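The preference for sensitivity over specificity can be illustrated by how a decision threshold might be chosen. The sketch below is an assumption for illustration only, since the abstract does not describe how thresholds were set: it picks the highest score threshold whose sensitivity still meets a floor, which keeps specificity as high as possible under that constraint. The function names and the 0.70 default floor are hypothetical.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are flagged."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    # Assumes both classes are present, so neither denominator is zero.
    return tp / (tp + fn), tn / (tn + fp)


def pick_threshold(y_true, scores, min_sensitivity=0.70):
    """Highest threshold whose sensitivity meets the (hypothetical) floor."""
    # Sensitivity can only rise as the threshold falls, so the first
    # threshold that meets the floor also gives the best specificity.
    for t in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        if sens >= min_sensitivity:
            return t, sens, spec
    return None


# Made-up labels and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]
print(pick_threshold(labels, scores))
```

Raising the floor trades specificity for sensitivity: on this toy data, a floor of 0.80 forces the threshold down to 0.4, where every positive case is flagged but half of the negatives are flagged as well.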