Public Health & Prevention
Public Health & Prevention 3
Catherine McDonough, M.S. (she/her/hers)
Data Analyst
Icahn School of Medicine at Mount Sinai
Staten Island, New York, United States
Background: The prevalence of type 2 diabetes (DM) and prediabetes (preDM) among youth has increased in recent decades, creating an urgent need for screening and prevention. However, the epidemiological factors associated with these serious conditions are not comprehensively understood, and an accurate screener for them is lacking.
Objective: Leveraging the rich information in the National Health and Nutrition Examination Survey (NHANES), we aimed to identify the most relevant factors and an effective screening method for classifying diabetes risk among youth aged 12-19 years.
Design/Methods: We extracted data on 95 variables potentially relevant to diabetes risk from 9 NHANES survey cycles (1999–2016), organized into 4 domains (socioeconomic status, health status, diet, and other lifestyle behaviors). We first conducted bivariate statistical analyses to identify variables individually and significantly (Bonferroni-adjusted p < 0.0005) associated with preDM/DM, defined as a fasting plasma glucose level ≥100 mg/dL and/or HbA1c ≥5.7%. We also used our Ensemble Integration (EI) framework for multi-domain machine learning to develop an effective youth preDM/DM screener and to identify additional diabetes predictors.
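The outcome definition and the significance cutoff can be made concrete with a small sketch. It assumes a family-wise α of 0.05 spread over one test per extracted variable (95 tests), which gives a per-test cutoff of 0.05/95 ≈ 0.000526, close to the reported 0.0005. The function names and the handling of missing measurements are hypothetical and are not taken from the authors' pipeline.

```python
# Minimal sketch, assuming alpha = 0.05 and one bivariate test per variable.
from typing import Optional

ALPHA = 0.05        # assumed family-wise error rate
N_VARIABLES = 95    # number of candidate variables tested


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance cutoff that controls the family-wise error rate."""
    return alpha / n_tests


def is_predm_or_dm(fpg_mg_dl: Optional[float], hba1c_pct: Optional[float]) -> bool:
    """Label preDM/DM if fasting plasma glucose >= 100 mg/dL and/or HbA1c >= 5.7%.

    Hypothetical handling: a missing measurement (None) never contributes a
    positive flag, so the label rests on whichever value is available.
    """
    high_fpg = fpg_mg_dl is not None and fpg_mg_dl >= 100.0
    high_a1c = hba1c_pct is not None and hba1c_pct >= 5.7
    return high_fpg or high_a1c


if __name__ == "__main__":
    print(f"{bonferroni_threshold(ALPHA, N_VARIABLES):.6f}")  # 0.000526
    print(is_predm_or_dm(95.0, 5.8))   # True (HbA1c criterion met)
    print(is_predm_or_dm(92.0, None))  # False (FPG below cutoff, HbA1c missing)
```

Using "and/or" as a logical OR means either abnormal measurement is sufficient for a positive label.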
Results: The bivariate analyses identified 19 significant correlates of preDM/DM, including sex, race/ethnicity, BMI, screen/sitting time, protein intake, health insurance, and receipt of food stamps (Fig. 1). We also identified an EI methodology that predicted youth preDM/DM status (AUC = 0.67; balanced accuracy [BA, (sensitivity + specificity)/2] = 0.62) more accurately than current pediatric screening guidelines (AUC = 0.57, BA = 0.57; Wilcoxon rank-sum FDR = 1.5×10⁻⁴ and 1.6×10⁻⁴, respectively) and than EI applied to each of the four variable domains individually (AUC = 0.55–0.63, BA = 0.54–0.60; FDR < 1.5×10⁻⁴ and 1.6×10⁻⁴, respectively) (Fig. 2). Of the 20 most predictive variables identified by this EI methodology, 10 overlapped with those identified by the bivariate analyses (Fisher's exact test p for overlap = 4.84×10⁻⁵). The remaining predictive variables included both known factors (e.g., meat and fruit intake and family income) and less-recognized ones (e.g., number of rooms in the home and number of times healthcare was received in the past year) (Fig. 3).
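The evaluation quantities reported in the Results (AUC, balanced accuracy, and a one-sided Fisher's exact test for the overlap between two variable sets) can be sketched in plain Python. The function names are hypothetical, the inputs are toy values rather than NHANES data, and the overlap test assumes the one-sided hypergeometric form; the universe size and alternative behind the reported p-value are assumptions, so this sketch is not meant to reproduce it.

```python
# Minimal sketch of the metrics named in the Results, on toy data only.
from math import comb
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """(sensitivity + specificity) / 2 for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # assumes at least one positive
    specificity = tn / (tn + fp)  # assumes at least one negative
    return (sensitivity + specificity) / 2


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overlap_p_value(universe: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided Fisher's exact p: P(overlap >= observed) under the hypergeometric null.

    universe: total variables; set_a, set_b: sizes of the two selected sets.
    """
    total = comb(universe, set_b)
    upper = min(set_a, set_b)
    hits = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.2, 0.3, 0.1, 0.8, 0.5]
    print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7333
    print(auc(y_true, scores))                          # 0.8
    print(round(overlap_p_value(4, 2, 2, 2), 4))        # 0.1667
```

The pairwise AUC loop is O(n·m) and is written for clarity; for large samples, a rank-based computation (equivalent to the Mann-Whitney U statistic divided by n·m) gives the same value faster.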
Conclusion(s): Using one of the largest datasets for youth preDM/DM and complementary statistical and machine learning analyses, we identified important known and lesser-known correlates of these disorders. We also developed a preDM/DM screener that performed significantly better than current clinical guidelines. These findings will help build an accurate and accessible youth preDM/DM screener for future deployment.