Context
Competency-based models of education require assessments to be based on individuals' capacity to perform, yet the nature of human judgment may fundamentally limit how accurately such assessment can be made.
Objective
To determine whether recently observing the Mini Clinical Evaluation Exercise (Mini-CEX) performances of postgraduate year 1 physicians influences raters' scores of subsequent performances, consistent with either anchoring bias (scores biased toward previous experience) or contrast bias (scores biased away from previous experience).
Design, Setting, and Participants
Internet-based randomized, blinded experiment using videos of Mini-CEX assessments of postgraduate year 1 trainees interviewing new internal medicine patients. Participants were 41 attending physicians from England and Wales experienced with the Mini-CEX, with 20 watching and scoring 3 good trainee performances and 21 watching and scoring 3 poor performances. All then watched and scored the same 3 borderline video performances. The study was completed between July and November 2011.
Main Outcome Measures
The primary outcome was scores assigned to the borderline videos, using a 6-point Likert scale (anchors included: 1, well below expectations; 3, borderline; 6, well above expectations). Associations were tested in a multivariable analysis that included participants' sex, years of practice, and the stringency index (within-group z score of initial 3 ratings).
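The stringency index is described only as a within-group z score of the initial 3 ratings. One plausible reading, sketched below under that assumption, averages each rater's three initial ratings and standardizes that mean against the other raters in the same priming group (good or poor), using the sample standard deviation. The helper name `stringency_index` and the ratings are hypothetical illustrations, not the authors' code or study data.

```python
from statistics import mean, stdev

def stringency_index(initial_ratings):
    """Hypothetical helper (assumed definition, not the authors' code).

    initial_ratings maps a rater id to that rater's scores for the first
    3 videos, all from ONE priming group (good or poor). Returns a z score
    per rater: negative means the rater scored below the group average,
    i.e. was relatively stringent.
    """
    rater_means = {rid: mean(scores) for rid, scores in initial_ratings.items()}
    means = list(rater_means.values())
    group_mean = mean(means)
    group_sd = stdev(means)  # sample SD across raters within the group
    return {rid: (m - group_mean) / group_sd for rid, m in rater_means.items()}

# Illustrative 6-point ratings for three raters in the "good" condition (made up).
good_group = {"A": [5, 6, 5], "B": [4, 5, 4], "C": [6, 6, 5]}
z = stringency_index(good_group)
for rater, score in sorted(z.items()):
    print(rater, round(score, 2))
```

Standardizing within each priming group, rather than across all 41 raters, keeps the index from simply re-encoding which set of videos a rater saw first; that is one reason such an index can be entered alongside the priming condition in a multivariable model.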
Results
The mean rating scores assigned by physicians who viewed borderline video performances following exposure to good performances were 2.7 (95% CI, 2.4-3.0) vs 3.4 (95% CI, 3.1-3.7) following exposure to poor performances (difference of 0.67 [95% CI, 0.28-1.07]; P = .001). Borderline videos were categorized as consistent with failing scores in 33 of 60 assessments (55%) in those exposed to good performances and in 15 of 63 assessments (24%) in those exposed to poor performances (P < .001). They were categorized as consistent with passing scores in 5 of 60 assessments (8.3%) in those exposed to good performances compared with 25 of 63 assessments (39.7%) in those exposed to poor performances (P < .001). Sex and years of attending practice were not associated with scores. The priming condition (good vs poor performances) and the stringency index jointly accounted for 45% of the observed variation in raters' scores for the borderline videos (P < .001).
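The percentages in the preceding paragraph follow directly from the stated counts, whose denominators are 3 borderline videos per rater (20 × 3 = 60 and 21 × 3 = 63). A minimal Python check, using only the counts quoted in the text, recomputes them to one decimal place. It deliberately does not attempt the P values: the 3 ratings from each rater are not independent, and the abstract does not name the test used, so a naive chi-square test on these counts would not necessarily reproduce them.

```python
# Counts quoted in the text: (assessments in category, total assessments).
# Each rater scored 3 borderline videos: 20 raters x 3 = 60, 21 raters x 3 = 63.
counts = {
    ("failing", "after good"): (33, 60),
    ("failing", "after poor"): (15, 63),
    ("passing", "after good"): (5, 60),
    ("passing", "after poor"): (25, 63),
}

def percent(k, n):
    """Percentage of k out of n, rounded to one decimal place."""
    return round(100 * k / n, 1)

pcts = {key: percent(k, n) for key, (k, n) in counts.items()}
for (category, condition), p in pcts.items():
    print(f"{category} {condition}: {p}%")
```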
Conclusion
In an experimental setting, attending physicians exposed to videos of good medical trainee performances rated subsequent borderline performances lower than those who had been exposed to poor performances, consistent with a contrast bias.