Context.— The quality of peer reviewers is crucial to journal quality, but most journals have too many reviewers for editors to know them all personally. A reliable method of rating reviewers (for education and monitoring) is needed.
Objective.— To determine whether editors' quality ratings of peer reviewers are reliable and how they compare with other measures of reviewer performance.
Design.— A 3.5-year prospective observational study.
Setting.— Peer-reviewed journal.
Participants.— All editors and peer reviewers who reviewed at least 3 manuscripts.
Main Outcome Measures.— Reviewer quality ratings, individual reviewer rate of recommendation
for acceptance, congruence between reviewer recommendation and editorial decision
(decision congruence), and accuracy in reporting flaws in a masked test manuscript.
Interventions.— Editors rated the quality of each review on a subjective 1 to 5 scale.
Results.— A total of 4161 reviews of 973 manuscripts by 395 reviewers were studied.
The within-reviewer intraclass correlation was 0.44 (P<.001),
indicating that 20% of the variance in the review ratings was attributable
to the reviewer. Intraclass correlations for editor and manuscript were only
0.24 and 0.12, respectively. Reviewers' average quality ratings correlated poorly
with their rate of recommendation for acceptance (R=−0.34)
and with decision congruence (R=0.26).
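The variance-partitioning logic above (how much of the rating variance is attributable to the reviewer, as opposed to the editor or the manuscript) can be illustrated with a one-way random-effects intraclass correlation. The sketch below uses entirely synthetic ratings; the number of reviewers, the ratings per reviewer, and the noise levels are illustrative assumptions, not the study's data or methods.

```python
# Illustrative only: a one-way random-effects ICC computed from synthetic
# 1-5 quality ratings. All data here are fabricated for demonstration.
import random
from collections import defaultdict

random.seed(0)

# Each synthetic reviewer has a "true" quality level; each of their
# reviews receives a rating equal to that level plus editor noise.
ratings = defaultdict(list)
for reviewer in range(30):
    true_quality = random.uniform(1.5, 4.5)
    for _ in range(5):  # 5 rated reviews per reviewer (balanced design)
        r = min(5, max(1, round(true_quality + random.gauss(0, 0.8))))
        ratings[reviewer].append(r)

groups = list(ratings.values())
k = len(groups[0])   # ratings per reviewer
n = len(groups)      # number of reviewers
grand = sum(sum(g) for g in groups) / (n * k)

# Usual one-way ANOVA mean squares.
ss_between = k * sum((sum(g) / k - grand) ** 2 for g in groups)
ss_within = sum((x - sum(g) / k) ** 2 for g in groups for x in g)
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# ICC(1): agreement among repeated ratings of the same reviewer.
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc:.2f}")
```

ICC(1) estimates how strongly ratings of the same reviewer resemble one another relative to ratings of different reviewers; a higher value means a larger share of the rating variance tracks the reviewer rather than other sources.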
Among the 124 reviewers of the fictitious manuscript, each reviewer's mean quality rating
was modestly correlated with the number of flaws reported
(R=0.53). Highly rated reviewers reported twice as
many flaws as poorly rated reviewers.
Conclusions.— Subjective editor ratings of individual reviewers were moderately reliable
and correlated with reviewers' ability to report manuscript flaws. Individual
reviewers' rate of recommendation for acceptance and decision congruence might
be thought markers of a discriminating (ie, high-quality) reviewer,
but these measures correlated poorly with editors' ratings of review
quality and with reviewers' ability to detect flaws in a fictitious manuscript.
Therefore, they cannot substitute for direct quality ratings by editors.