Editorial

Measuring Physicians' Quality and Performance: Adrift on Lake Wobegon

Donald M. Berwick, MD, MPP

Author Affiliation: Institute for Healthcare Improvement, Cambridge, Massachusetts.


JAMA. 2009;302(22):2485-2486. doi:10.1001/jama.2009.1801

In Garrison Keillor's mythical hometown, Lake Wobegon, all the women are strong, all the men are good-looking, and all the children are above average. That is, of course, impossible, at least when it comes to the children. In any given population for any defined characteristic “everyone above average” is, statistically, nonsense.

Of course, the same is true of health care. Performance on anything called “good” about the care (such as reliability, waiting times, dignity, or survival) in any defined population (such as physicians, hospitals, visits, or health plans) will follow some distribution. The shape of that distribution can be orderly (such as quasi-normal, binomial, or Poisson) or disorderly, but for sure, every member cannot be above average.

That bare fact disturbs the peace, mainly because it invites comparison. For instance, patients who need heart surgery would want to know who can do the very best for them. That curiosity also may well stir fear, jealousy, and defensiveness among those who vie to care for those patients. Likewise, how can cardiac surgeons sleep well if they learn that their outcomes are “below average”? The more those surgeons care to do well, the worse their insomnia.

Disturbance of the peace of health care is now widespread. The evidence is overwhelming that variation in practices, outcomes, and costs of care is unconscionably large. The rapid maturation of data systems and reporting structures makes that variation more and more transparent. Especially with health care costs now threatening to smother the US economy, knowledge of variation is rapidly becoming a major tool for an entirely new level of accountability, mostly unwelcome.

Who is to be held accountable and for what? Hospitals are easy and appropriate targets. They have data systems, management and governance structures, and definable lines of work, and many have large enough volumes to support valid measurement. Indeed, hospitals in the United States already have to report many hundreds of variables to dozens of entities—so many of each that hospital leaders complain bitterly of the burden.

But what about physicians? No doubt physicians vary in their clinical skills and outcomes. This is especially interesting now because physicians traditionally have been thought to hold the keys to the health care treasury; that is, the physician's pen (or lately, keyboard), some argue, is at the root of health care profligacy and could be the strongest lever to control health care costs wisely.

The reasons for comparative measurement of performance in health care are many and depend on the actor. Contractors may want to choose among vendors. Patients may want to choose their site of care. Payers may want to attach financial incentives to good performance. And improvers may want to find out what problems they have, and who else may have solved them, so as to learn.

The edgiest use of comparative measurement is probably “pay for performance,”1 financial carrots and sticks attached to measured achievements in care. One of the clearest examples has been in England, where British general practitioners (GPs) engaged with the National Health Service beginning in 2004 under the Quality and Outcomes Framework (QOF).2 The terms of that contract in 2008-2009 define 129 quality indicators—the majority on clinical processes and outcomes—and award financial gains to physicians for following those standards. The budget for the first year of that program assumed that GPs would achieve 75% conformance to QOF standards. The budget was wrong; conformance was 97%.3 The National Health Service got exactly the performance it paid for, although the costs were higher than expected. Morale declined; GPs diverted attention from other needed care to the processes covered by the reward system; and some physicians clearly gamed the system by classifying high proportions of patients as exceptions.4

This and other early experiences with pay for performance at the level of individual physicians have raised some orange flags. In this issue of JAMA, the report by Nyweide et al5 raises another flag. It is known that individual physicians often have too few patients in any specific disease group to support statistically valid comparative measurement—a sample size problem. Nyweide et al5 ask whether statistically meaningful differences can be measured more reliably for primary care groups than for individuals. Using fee-for-service Medicare data, the authors constructed an algorithm to assign physicians to groups and patients to physicians, and then determined whether the numbers of patients suitable to study in each physician group were sufficient to support comparison among groups on 3 process measures of quality and 2 outcomes.

The answer, briefly summarized, is no. Less than 10% of physician groups with fewer than 11 physicians had Medicare sample sizes large enough to reliably reveal differences of 10% in any of the quality performance metrics studied. Even aggregating data over a 3-year period for each physician group failed to accumulate sufficient sample sizes for half of the groups with fewer than 6 physicians. No groups had enough patients to detect 10% differences in preventable hospitalizations or congestive heart failure readmissions.
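To give a rough sense of why small groups run out of patients, the sketch below applies the standard normal-approximation sample size formula for comparing two proportions. It is an illustration only, not the method used by Nyweide et al: the baseline rate of 70%, the reading of a "10% difference" as 10 absolute percentage points, the two-sided alpha of .05, and the power of 80% are all assumptions chosen for the example, and the function name `sample_size_per_group` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients needed in EACH of two groups to distinguish rate p1 from p2.

    Normal-approximation formula for a two-sided test of two proportions.
    All default values are illustrative assumptions, not figures from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical process measure: 70% in one group vs 80% in another.
n = sample_size_per_group(0.70, 0.80)
print(n)
```

Under these assumptions each group needs on the order of 300 eligible patients for a single measure, and the requirement grows quickly as the difference to be detected shrinks. A small practice with only a few dozen eligible Medicare patients per condition would fall far short of that.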

This poses a dilemma. As illogical as it is to act as if all physicians were “above average,” there is almost no choice but to do so if there is no way to discern differences among them. That is true no matter the intended reason for discerning these differences: to choose, to reward, to punish, or (my favorite option) to learn.

There may be at least 4 potential routes out of Lake Wobegon, and they are not mutually exclusive.

First, the effective sample sizes contract rapidly when the focus is on specific diseases or patient subpopulations. By relying on highly focused quality metrics one at a time, Nyweide et al5 are viewing care through a tiny keyhole. If valid quality metrics could be constructed that cross conditions, more patients could contribute relevant data. That is, in essence, what the National Health Service QOF does. Nyweide et al are concerned that correlations among conditions for individual physicians may be low (ie, that a physician can be good at cardiac care but poor at orthopedics, and vice versa), threatening the validity of this approach. Even if true for the majority of physicians and groups (which seems unlikely), it is still a strong possibility that clinically important, underlying attributes of practice organization exist—such as reliability, safety, continuity, and efficiency—and could be defined operationally and measured, revealing meaningful differences from which all could learn. Research to define and validate cross-cutting and systemic measures of practice performance should proceed apace.

Second, more could be known if data could be aggregated from all payers, not just Medicare. Creating shared pools of transparent performance information for Medicare, Medicaid, and private insurers would be a step toward maturation in the ability to improve US health care.

Third, patients can and should be asked directly about their experiences of care. The uniform use of the Hospital Consumer Assessment of Healthcare Providers and Systems survey measures in Medicare goes in the right direction, but much more should be invested in listening to patients and their families, helping them to describe how well they feel treated. The correlations between such ratings and pure, technical care quality are modest, at best,6 but attributes of care like “patient-centeredness,” “timeliness,” and overall responsiveness, that patients can and do observe, are important qualities in their own right, and each physician's entire patient panel can contribute to sample size for these qualities.

And fourth, the ability to measure and track individual patients' health and function over time and place should be expanded. Measuring a mammography rate or the frequency of assessment of glycated hemoglobin is a far cry from measuring true aims: health, function, and comfort. Comparative assessment of patients' functional outcomes among individual physicians and groups may remain elusive as physicians, managers, policy makers, and payers move toward assessing what really counts, but at least they would be looking in the right places, not just in the convenient ones.

AUTHOR INFORMATION

Corresponding Author: Donald M. Berwick, MD, MPP, Institute for Healthcare Improvement, 20 University Rd, Seventh Floor, Cambridge, MA 02138 (dberwick1@ihi.org).

Financial Disclosures: None reported.

Editorials represent the opinions of the authors and JAMA and not those of the American Medical Association.

REFERENCES

1. Rosenthal MB. Beyond pay for performance: emerging models of provider-payment reform. N Engl J Med. 2008;359(12):1197-1200.
2. National Health Service Quality and Outcomes Framework Web site. http://www.qof.ic.nhs.uk/. Accessed November 12, 2009.
3. Doran T, Fullwood C, Gravelle H, et al. Pay-for-performance programs in family practices in the United Kingdom. N Engl J Med. 2006;355(4):375-384.
4. McDonald R, Roland M. Pay for performance in primary care in England and California: comparison of unintended consequences. Ann Fam Med. 2009;7(2):121-127.
5. Nyweide DJ, Weeks WB, Gottlieb DJ, Casalino LP, Fisher ES. Relationship of primary care physicians' patient caseload with measurement of quality and cost performance. JAMA. 2009;302(22):2444-2450.
6. Sequist TD, Schneider EC, Anastario M, et al. Quality monitoring of physicians: linking patients' experiences of care to clinical quality and outcomes. J Gen Intern Med. 2008;23(11):1784-1790.
