480 - Workplace-Based Clinical Assessments of Medical Students Reflect Assessors and Random Chance More Than Students
Friday, April 28, 2023
5:15 PM – 7:15 PM ET
Poster Number: 480 | Publication Number: 480.123
Rachel A. Arnesen, Weill Cornell Medicine, Bethesda, MD, United States; Daniel Restifo, Weill Cornell Medicine, New York, NY, United States; Thanakorn Jirasevijinda, Weill Cornell Medical College, New York, NY, United States; Adin Nelson, Weill Cornell Medicine, New York, NY, United States
Medical Student, Weill Cornell Medicine, Bethesda, Maryland, United States
Background: Medical students’ clinical performance is assessed both qualitatively (with narrative comments) and quantitatively (with numerical scores). Those scores weigh heavily in students’ final grades and residency applications, but previous studies have questioned whether clerkship clinical assessments truly reflect the student or the assessor.
Objective: 1) To explore sources of variability in quantitative student assessments, including assessor characteristics, student demographics, and rotation site and sequence; 2) To test assessment reliability and determine the number of assessments required for a valid metric of student performance.
Design/Methods: We conducted a retrospective cross-sectional study of written assessments (SPEs) in the Pediatrics Clerkship at Weill Cornell from 2018 through 2021. The SPE is a standardized assessment form of twelve questions with behavioral anchors on a 4-point scale. We used a linear mixed-effects model with students and assessors as random effects to test whether SPE scores correlated more with students or with assessors. We then used a multivariable mixed-effects model to investigate the effects of demographic and contextual factors, and we conducted a reliability analysis using the generalizability coefficient.
Results: We analyzed 2,958 SPEs submitted by 380 assessors (67% female; 36% attendings, 4% fellows, 60% residents) for 446 students (50% female). The median number of SPEs per student was 6. Interns and residents gave significantly higher SPE scores than attendings (p < 0.001), but scores did not vary significantly by gender, rotation site, rotation length, or time of year. We found greater variance between assessors (0.090, SD 0.300) than between students (0.013, SD 0.115), along with a large residual variance (0.080, SD 0.282), suggesting that assessor characteristics and random effects played a larger role in students’ scores than student characteristics. Reaching 80% scoring reliability with the current assessment system would require a minimum of 24 SPEs per student.
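To illustrate the variance decomposition concretely, the sketch below fits a crossed random-effects model in Python with statsmodels. This is not the authors’ analysis code; the input file and column names (spe_scores.csv, score, student_id, assessor_id) are hypothetical.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("spe_scores.csv")  # hypothetical input: one row per SPE score

    # statsmodels has no dedicated crossed-random-effects syntax, so the whole
    # sample is placed in a single group and student and assessor are modeled
    # as variance components, which yields the same variance decomposition.
    df["all"] = 1
    model = smf.mixedlm(
        "score ~ 1",
        data=df,
        groups="all",
        vc_formula={
            "student": "0 + C(student_id)",
            "assessor": "0 + C(assessor_id)",
        },
    )
    result = model.fit()
    print(result.summary())  # variance components for student, assessor, residual

In such a fit, the student, assessor, and residual variance components correspond to the 0.013, 0.090, and 0.080 figures reported above.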
Conclusion(s): We found that assessor characteristics and random chance contributed more variability to quantitative assessments of students’ clinical performance than student characteristics did. Additionally, achieving 80% reliability with our current assessment system would require four times as many assessments as students currently receive. Designing more nuanced assessments with broader scales, improving training and guidelines for assessors, and soliciting more assessments per student may increase the validity of quantitative clerkship clinical assessments.
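As a rough plausibility check on the reported assessment counts, the sketch below evaluates the generalizability coefficient for the mean of n assessments per student, under the simplifying assumption that the residual is the only error variance; the variance components are taken from the Results above.

    def g_coefficient(n: int) -> float:
        """Generalizability coefficient for the mean of n assessments per student."""
        var_student = 0.013   # between-student variance (from Results)
        var_residual = 0.080  # residual variance (from Results)
        return var_student / (var_student + var_residual / n)

    # Smallest n reaching 80% reliability under this assumption.
    n = 1
    while g_coefficient(n) < 0.80:
        n += 1
    print(n, round(g_coefficient(n), 3))  # ~25 with these rounded components

With these rounded components the threshold lands at roughly 24 to 25 SPEs, consistent with the reported minimum of 24, and the median of 6 SPEs per student yields a coefficient of only about 0.49, which motivates the fourfold figure in the conclusion.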