Author Affiliation: Institute for Healthcare Improvement, Cambridge, Massachusetts.
In Garrison Keillor's mythical hometown, Lake Wobegon, all the women are strong, all the men are good-looking, and all the children are above average. That is, of course, impossible, at least when it comes to the children. In any given population, for any defined characteristic, “everyone above average” is, statistically, nonsense.
Of course, the same is true of health care. Performance on anything called “good” about the care (such as reliability, waiting times, dignity, or survival) in any defined population (such as physicians, hospitals, visits, or health plans) will follow some distribution. The shape of that distribution can be orderly (such as quasi-normal, binomial, or Poisson) or disorderly, but for sure, not every member can be above average.
That bare fact disturbs the peace, mainly because it invites comparison. For instance, patients who need heart surgery would want to know who can do the very best for them. That curiosity also may well stir fear, jealousy, and defensiveness among those who vie to care for those patients. Likewise, how can cardiac surgeons sleep well if they learn that their outcomes are “below average”? The more those surgeons care to do well, the worse their insomnia.
Disturbance of the peace of health care is now widespread. The evidence is overwhelming that variation in practices, outcomes, and costs of care is unconscionably large. The rapid maturation of data systems and reporting structures makes that variation more and more transparent. Especially with health care costs now threatening to smother the US economy, knowledge of variation is rapidly becoming a major tool for an entirely new level of accountability, mostly unwelcome.
Who is to be held accountable and for what? Hospitals are easy and appropriate targets. They have data systems, management and governance structures, and definable lines of work, and many have large enough volumes to support valid measurement. Indeed, hospitals in the United States already have to report many hundreds of variables to dozens of entities—so many of each that hospital leaders complain bitterly of the burden.
But what about physicians? No doubt physicians vary in their clinical skills and outcomes. This is especially interesting now because physicians traditionally have been thought to hold the keys to the health care treasury; that is, the physician's pen (or lately, keyboard), some argue, is at the root of health care profligacy and could be the strongest lever to control health care costs wisely.
The reasons for comparative measurement of performance in health care are many and depend on the actor. Contractors may want to choose among vendors. Patients may want to choose their site of care. Payers may want to attach financial incentives to good performance. And improvers may want to find out what problems they have, and who else may have solved them, so as to learn.
The edgiest use of comparative measurement is probably “pay for performance,”1 financial carrots and sticks attached to measured achievements in care. One of the clearest examples has been in England, where British general practitioners (GPs) engaged with the National Health Service beginning in 2004 under the Quality and Outcomes Framework (QOF).2 The terms of that contract in 2008-2009 define 129 quality indicators—the majority on clinical processes and outcomes—and award financial gains to physicians for following those standards. The budget for the first year of that program assumed that GPs would achieve 75% conformance to QOF standards. The budget was wrong; conformance was 97%.3 The National Health Service got exactly the performance it paid for, although the costs were higher than expected. Morale declined; GPs diverted attention from other needed care to the processes covered by the reward system; and some physicians clearly gamed the system by classifying high proportions of patients as exceptions.4
This and other early experiences with pay for performance at the level of individual physicians have raised some orange flags. In this issue of JAMA, the report by Nyweide et al5 raises another flag. It is known that individual physicians often have too few patients in any specific disease group to support statistically valid comparative measurement—a sample size problem. Nyweide et al5 ask whether statistically meaningful differences can be measured more reliably for primary care groups than for individuals. Using fee-for-service Medicare data, the authors constructed an algorithm to assign physicians to groups and patients to physicians, and then determined whether the numbers of patients suitable to study in each physician group were sufficient to support comparison among groups on 3 process measures of quality and 2 outcomes.
The answer, briefly summarized, is no. Less than 10% of physician groups with fewer than 11 physicians had Medicare sample sizes large enough to reliably reveal differences of 10% in any of the quality performance metrics studied. Even aggregating data over a 3-year period for each physician group failed to accumulate sufficient sample sizes for half of the groups with fewer than 6 physicians. No groups had enough patients to detect 10% differences in preventable hospitalizations or congestive heart failure readmissions.
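To give a feel for why small groups run out of patients so quickly, here is a minimal sketch of a textbook sample size calculation for comparing two proportions. It is not the method Nyweide et al used, which is not described here. The function name `patients_per_group`, the baseline rates, the 5% two-sided significance level, and the 80% power are all assumptions chosen for illustration, as is the reading of a “10% difference” as either 10 percentage points or a 10% relative change.

```python
import math
from statistics import NormalDist


def patients_per_group(p1: float, p2: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed in EACH of two groups to tell p1 from p2.

    Standard normal-approximation formula for two independent proportions,
    two-sided test. Defaults are illustrative assumptions, not study values.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("p1 and p2 must be strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of binomial variances
    n = (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical process measure with a 70% baseline rate (assumed, not from the study).
print(patients_per_group(0.70, 0.80))  # 10 percentage point difference
print(patients_per_group(0.70, 0.77))  # 10% relative difference
```

Under these assumed inputs, distinguishing a 70% rate from an 80% rate takes roughly 291 eligible patients in each group being compared, and the relative-difference reading takes more. A small practice would need that many patients for each measure, which is consistent in spirit with the finding that few small groups clear the bar.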
This poses a dilemma. As illogical as it is to act as if all physicians were “above average,” there is almost no choice but to do so if there is no way to discern differences among them. That is true no matter the intended reason for discerning these differences: to choose, to reward, to punish, or (my favorite option) to learn.
There may be at least 4 potential routes out of Lake Wobegon, and they are not mutually exclusive.
First, the effective sample sizes contract rapidly when the focus is on specific diseases or patient subpopulations. By relying on highly focused quality metrics one at a time, Nyweide et al5 are viewing care through a tiny keyhole. If valid quality metrics could be constructed that cross conditions, more patients could contribute relevant data. That is, in essence, what the National Health Service QOF does. Nyweide et al are concerned that correlations among conditions for individual physicians may be low (ie, that a physician can be good at cardiac care but poor at orthopedics, and vice versa), threatening the validity of this approach. Even if true for the majority of physicians and groups (which seems unlikely), it is still a strong possibility that clinically important, underlying attributes of practice organization exist—such as reliability, safety, continuity, and efficiency—and could be defined operationally and measured, revealing meaningful differences from which all could learn. Research to define and validate cross-cutting and systemic measures of practice performance should proceed apace.
Second, more could be known if data could be aggregated from all payers, not just Medicare. Creating shared pools of transparent performance information for Medicare, Medicaid, and private insurers would be a step toward maturation in the ability to improve US health care.
Third, patients can and should be asked directly about their experiences of care. The uniform use of the Hospital Consumer Assessment of Healthcare Providers and Systems survey measures in Medicare goes in the right direction, but much more should be invested in listening to patients and their families, helping them to describe how well they feel treated. The correlations between such ratings and pure, technical care quality are modest, at best,6 but attributes of care like “patient-centeredness,” “timeliness,” and overall responsiveness, that patients can and do observe, are important qualities in their own right, and each physician's entire patient panel can contribute to sample size for these qualities.
And fourth, the ability to measure and track individual patients' health and function over time and place should be expanded. Measuring a mammography rate or the frequency of assessment of glycated hemoglobin is a far cry from measuring true aims: health, function, and comfort. Comparative assessment of patients' functional outcomes among individual physicians and groups may remain elusive as physicians, managers, policy makers, and payers move toward assessing what really counts, but at least they would be looking in the right places, not just in the convenient ones.
Corresponding Author: Donald M. Berwick, MD, MPP, Institute for Healthcare Improvement, 20 University Rd, Seventh Floor, Cambridge, MA 02138 (dberwick1@ihi.org).
Financial Disclosures: None reported.
Editorials represent the opinions of the authors and JAMA and not those of the American Medical Association.