Editorial

Raising the Passing Grade for Studies of Medical Education

Stephen J. Lurie, MD, PhD
JAMA. 2003;290(9):1210-1212. doi:10.1001/jama.290.9.1210

Physicians spend much of their time listening and responding to patients' concerns. Studies have found, however, that clinicians' interpersonal skills are not always as good as their patients might wish.1-2 In response, several medical organizations have called for improved training and competence in communication skills. The Association of American Medical Colleges, for instance, has included "communication in medicine" as a central aspect of its Medical School Objectives Project, which is intended to guide curricula in all US medical schools.3 Beginning in 2004, the National Board of Medical Examiners will require all US medical students to travel to a testing center for an evaluation of their clinical skills, including communication.4 The Accreditation Council for Graduate Medical Education now requires all US residency programs to provide instruction in "interpersonal and communication skills."5 By the time this year's entering medical students complete their residencies, they may find that their interpersonal skills are subject to lifelong examination. In a recent address to the American Board of Medical Specialties (ABMS), Baird6 stated that "an expanded assessment of interpersonal and communication skills would be a useful new endeavor for ABMS."

Given this broad consensus on the need to improve communication skills, the overall quality of the evidence for how to teach these and other aspects of professionalism is surprisingly poor.7-9 Much of this educational literature is not quantitative at all but, rather, comprises opinion pieces, anecdotal reports, and position articles. In their overview of professionalism curricula, Stephenson et al10 concluded that "Medical education is not short of excellent ideas about how to improve courses and create the professionals needed by society. What is in much shorter supply is evidence about the effectiveness of such teaching." Goldie11 described a similar lack of suitable studies to evaluate the outcomes of ethics curricula for medical students. Thus, despite a broad consensus on the need for high-quality studies, little evidence exists to guide educators in how to design the best possible programs or how to evaluate and improve them.

There are many inherent difficulties in conducting high-quality educational research.9 Perhaps the most immediate of these is a lack of money and time. Although society has an interest in training physicians who can communicate effectively, such curricula have limited market potential and thus may not compete favorably for corporate support. Grant funding is also limited. Even assuming a more favorable funding environment, a range of methodological obstacles would remain. Educational interventions and outcome measures are expensive and time-consuming, and the best of these are always personalized to specific environments and learners. The resulting small sample sizes limit generalizability and statistical power.
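
The link between small samples and statistical power can be made concrete with the usual normal-approximation formula for comparing two group means. The sketch below is illustrative only: it assumes a two-sided test, equal group sizes, and an effect expressed as a standardized mean difference (Cohen's d). The function names and the effect size used are hypothetical and are not drawn from any study cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approximate_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation; effect_size is the difference in means
    divided by the common standard deviation.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)  # expected z statistic if the effect is real
    # Probability of rejecting the null hypothesis in either tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest per-group sample size reaching the target power (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical moderate effect (d = 0.5): one small class per arm
    # compared with the sample size needed for 80% power.
    print(f"power with 15 per group: {approximate_power(0.5, 15):.2f}")
    print(f"n per group for 80% power: {n_per_group_for_power(0.5)}")
```

Under these assumptions, a moderate effect studied with 15 students per arm would be detected well under half the time, whereas roughly 63 students per arm would be needed for 80% power.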

Another problem is the difficulty in measuring outcomes that relate to the intent of education. Although better communication would ideally lead to improved patient outcomes, such clinical variables are quite problematic as measures of clinicians' competence. In this issue of THE JOURNAL, Landon and colleagues12 describe the many limitations of using clinical outcomes to assess physicians' quality of care. Using clinical outcomes to assess educational interventions would add another layer of confounding, along with problems of randomization, blinding, and contamination of the intervention. Taken together, these challenges appear so vexing that Prideaux13 recently declared that randomized trials in education "are doomed to fail."

Difficult as these problems are, they do not entirely account for the field's incomplete accommodation to scientific methods. A more fundamental problem may be the intuitive and abstract nature of concepts such as "communications," which allows for any number of interpretations. Indeed, the most important educational constructs are probably the most difficult to define in a way that would permit valid quantification. Campbell and Johnson,14 for instance, asked, "How can the concept of multiprofessional learning become robust if we don't know what it means, cannot agree its goals [sic], and do not seem able to report the weaknesses and problems encountered and lessons learnt?" Other reviewers have remarked on the wide diversity of definitions, across medical schools, for similar educational constructs.15-16

In this issue of THE JOURNAL, Yedidia and colleagues17 report a trial of a communication curriculum at 3 medical schools. Their study demonstrates that much can be accomplished within the inherent limitations of educational research. Although Yedidia et al began with a general interest in improving students' communication skills, they used a telescoping strategy to move from 5 abstract communication tasks to a set of 21 rigorously standardized criteria. These measures were assessed with an objective structured clinical examination (OSCE), a tool that has come into general use for measuring communication skills and other complex clinical behaviors. The specific tasks for the 10 OSCE stations in the study were defined with input from individuals at each of the 3 institutions. The authors then went to great lengths to standardize the OSCEs at each of these geographically disparate sites. In contrast, most curricula lack the resources available to Yedidia et al and thus generally develop OSCEs that cannot reliably be reproduced elsewhere.

The multicenter study by Yedidia et al sheds light on the limits of what the OSCE can be expected to accomplish as a standardized measure of communication skills. Although many well-designed OSCEs have been found to have acceptable test-retest or interobserver reliability,18 questions remain about both the content and construct validity of these examinations. The first of these criteria, content validity, reflects the degree to which a test measures a representative sample of performance. Written tests, for instance, can probe a large range of knowledge in a rapid and economical way. Within their methodological limitations, they can produce a reasonably complete picture of examinees' abilities. To achieve a similar content validity, however, oral examinations may require substantially more than the 10 OSCE stations used in the study by Yedidia et al.19 Standardization of an even larger number of OSCEs across institutions might have been beyond the authors' financial and logistical resources. Indeed, the costs of administering a fair, standardized, and universal OSCE might be prohibitive under almost any budget.
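
The question of how many stations are enough is often approached with the Spearman-Brown prophecy formula, which predicts how score reliability changes when a test is lengthened with comparable items. This formula addresses reliability (how reproducible scores are across the sampled stations) rather than content validity itself, although both depend on how broadly cases are sampled. The sketch assumes that added stations are roughly parallel to existing ones; the function names and the 10-station reliability of 0.6 in the example are hypothetical, not figures from Yedidia et al.

```python
from math import ceil


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor
    (e.g. 2.0 means twice as many comparable stations)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def stations_needed(current_stations: int, current_reliability: float,
                    target_reliability: float) -> int:
    """Number of comparable stations predicted to reach target_reliability.

    Solves the Spearman-Brown formula for the length factor, scales the
    current station count, and rounds up to a whole station.
    """
    factor = (target_reliability * (1 - current_reliability)) / (
        current_reliability * (1 - target_reliability)
    )
    # Round away floating-point noise before taking the ceiling.
    return ceil(round(current_stations * factor, 9))


if __name__ == "__main__":
    # Hypothetical 10-station OSCE with reliability 0.6.
    print(spearman_brown(0.6, 2.0))       # predicted reliability at 20 stations
    print(stations_needed(10, 0.6, 0.8))  # stations predicted to reach 0.8
```

With these hypothetical inputs, doubling to 20 stations raises predicted reliability to 0.75, and about 27 stations would be needed to reach 0.8, which illustrates why standardizing a sufficiently long OSCE across sites is costly.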

Construct validity is a more subtle concept and relates to the degree to which a test assesses some hypothesized underlying ability. The problem is that constructs such as "communication skills" are products of cultural and historical imagination. As such, they are inescapably imaginary. They do not lead an independent existence in any identifiable volume of space or expanse of time and are not waiting there to be discovered. Because "communication skills," as generally understood, do not have any underlying physical dimensions, they are inherently different in nature from any conceivable objective criterion.

Although an abstract construct like communication cannot be measured directly, its presence might nonetheless be inferred from the shadows it casts on various surfaces. To put it another way, measurement of constructs requires operationalization, which means that a construct cannot be measured apart from the operations used to measure it. Thus, the standard psychometric approach to assessing a theoretical construct is to triangulate on it with several different methods, as originally described by Campbell and Fiske.20 Although such a rigorous evaluation of OSCE methods has not, to my knowledge, been reported, a number of studies have found that OSCEs have only moderate to poor correlations with other methods of measuring competence, including faculty evaluations, in-service examinations, and standardized testing.21-24
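
Campbell and Fiske's multitrait-multimethod approach can be sketched as a comparison of blocks of correlations: correlations between different methods measuring the same trait (convergent validity) should exceed correlations between different traits measured by different methods, while high correlations between different traits measured by the same method point to method variance. The function below is a simplified summary of those comparisons, not a full analysis; the trait and method labels and the scores in the example are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mtmm_summary(scores):
    """Mean correlation in each block of a multitrait-multimethod matrix.

    scores maps (trait, method) -> list of scores, one per examinee,
    with examinees in the same order in every list.
    """
    blocks = {
        "convergent": [],                # same trait, different methods
        "heterotrait_monomethod": [],    # different traits, same method
        "heterotrait_heteromethod": [],  # different traits, different methods
    }
    for (t1, m1), (t2, m2) in combinations(scores, 2):
        r = pearson(scores[(t1, m1)], scores[(t2, m2)])
        if t1 == t2:  # keys are unique, so the methods must differ here
            blocks["convergent"].append(r)
        elif m1 == m2:
            blocks["heterotrait_monomethod"].append(r)
        else:
            blocks["heterotrait_heteromethod"].append(r)
    return {name: mean(rs) for name, rs in blocks.items() if rs}


if __name__ == "__main__":
    # Hypothetical scores for five examinees: two traits, two methods.
    scores = {
        ("communication", "osce"): [1, 2, 3, 4, 5],
        ("communication", "faculty"): [1, 3, 2, 4, 5],
        ("knowledge", "osce"): [2, 1, 5, 3, 4],
        ("knowledge", "faculty"): [3, 1, 4, 2, 5],
    }
    print(mtmm_summary(scores))
```

In this toy example the convergent correlations (mean 0.85) exceed both heterotrait blocks, and the same-method block (0.40) is somewhat higher than the different-method block (0.35), the pattern that method variance would produce.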

From such data, it is tempting to conclude that OSCEs must measure some unique components of clinical skills that are not assessed by other methods.25 It is likely, however, that a portion of these low correlations is attributable to method variance, which reflects the common observation that some individuals do better on some kinds of tests than on others, irrespective of apparent underlying ability. While the OSCE does have a certain amount of compelling face validity, its degree of construct validity remains to be determined. At present, all that can be said is that the intervention by Yedidia et al improved communication skills as measured by their particular OSCE. Indeed, given the logistical barriers to standardizing OSCEs across large groups of individuals, such overarching constructs may be forever fractured into countless smaller operationalizations.

Such a fragmentation could be salutary to the extent that it would move the focus away from abstract discussions of theoretical educational concepts and toward a goal of all students demonstrating measurable competencies. Such a pragmatic approach need not be intellectually sterile. The study by Yedidia et al demonstrates that complex variables such as "communication skills" can be usefully conceived at several levels of specificity. Although all 3 schools committed to a number of basic curricular principles, each was free to define its own timing and curricular content within them. Thus, students at the 3 schools may have had quite different experiences of the "intervention." While this lack of standardization would be a critical failing in a study of a drug or medical device, such variability reflects a realistic understanding of how such curricula are likely to evolve in different institutional cultures. Beyond this reflection of actual practice, the results can also provide valuable formative feedback to curriculum planners at each institution.

Given these caveats, the method used by Yedidia et al permitted a quantification of the effect of the intervention, both within and across schools. The authors found that, overall, the curriculum produced a 5% increase in communication skills. Could their massive intervention thereby be called "successful," an adjective that is often attached to purely narrative descriptions of new curricula? The faculty at these 3 schools certainly seem to have worked very hard to achieve this 5% improvement. Perhaps, in fact, this would be found to be an above-average outcome, were all such curricula assessed as rigorously. Nonetheless, the authors acknowledge that it is difficult to know how to judge this difference in absolute terms, given the lack of similar data for other medical-education interventions.
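
One reason a raw percentage gain is hard to judge is that its meaning depends on how widely scores vary. A common convention in other fields is to express a group difference as a standardized effect size such as Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below uses hypothetical group means, standard deviations, and sample sizes, not data from Yedidia et al, to show that the same 5-point difference can correspond to quite different standardized effects.

```python
from math import sqrt


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


if __name__ == "__main__":
    # Hypothetical: intervention mean 65 vs comparison mean 60 on a 100-point scale.
    print(cohens_d(65, 10, 50, 60, 10, 50))  # scores tightly clustered
    print(cohens_d(65, 20, 50, 60, 20, 50))  # scores widely spread
```

With 50 students per group, the same 5-point gain yields d = 0.5 when the standard deviation is 10 points but only d = 0.25 when it is 20 points.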

It would thus be especially helpful to develop common metrics and benchmarks for educational interventions, but this can happen only if future studies adopt a similar commitment to quantification. Such an approach has proven fruitful in other areas of clinical research. For instance, the effects of clinical trials can be compared on the common metric of the number needed to treat, which allows for meaningful comparisons between the effects of unrelated interventions. Similarly, the results of cost-effectiveness analyses of unrelated interventions can be compared on the basis of quality-adjusted life-years. Such a common metric (perhaps the "number needed to teach") could allow curriculum planners to assess the potential benefits of one curriculum against another. With information about costs and accurate effect sizes, medical schools could also attempt to estimate the true costs of achieving their stated educational objectives.26
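
The analogy with the number needed to treat can be made explicit. If an outcome is dichotomized as reaching or not reaching a competence threshold, a "number needed to teach" would be the reciprocal of the absolute difference in the proportion of competent students between curricula, and dividing the extra cost per student by that same difference gives a cost per additional competent student, analogous in structure to a cost-per-QALY ratio. This is a minimal sketch under those assumptions; the function names, the competence threshold, and the proportions and costs in the example are hypothetical.

```python
from math import ceil


def absolute_difference(p_control: float, p_intervention: float) -> float:
    """Absolute increase in the proportion of students judged competent."""
    diff = p_intervention - p_control
    if diff <= 0:
        raise ValueError("the intervention must increase the proportion competent")
    return diff


def number_needed_to_teach(p_control: float, p_intervention: float) -> int:
    """Students who must receive the new curriculum for one additional
    competent student, rounded up as is conventional for the NNT."""
    # Round away floating-point noise before taking the ceiling.
    return ceil(round(1 / absolute_difference(p_control, p_intervention), 9))


def cost_per_additional_competent(incremental_cost_per_student: float,
                                  p_control: float, p_intervention: float) -> float:
    """Extra spending per additional competent student (incremental cost / effect)."""
    return incremental_cost_per_student / absolute_difference(p_control, p_intervention)


if __name__ == "__main__":
    # Hypothetical: 60% of comparison students vs 75% of intervention students
    # reach the threshold; the new curriculum costs $500 more per student.
    print(number_needed_to_teach(0.60, 0.75))
    print(round(cost_per_additional_competent(500, 0.60, 0.75), 2))
```

With these hypothetical inputs, about 7 students must be taught for 1 additional student to reach the threshold, at roughly $3,333 per additional competent student. Using the unrounded difference for the cost ratio avoids compounding the rounding applied to the NNT.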

Surprisingly, Yedidia et al do not report whether students liked the intervention or how much they thought it improved their communication skills. Given that self-reported satisfaction is a common outcome variable in educational research,27 the lack of data on students' perceptions is certainly atypical of this literature. Self-reported satisfaction can be a useful adjunct to objective measurement of competence. This variable may also be of interest to the extent that students are conceived as consumers of educational services. Exclusive reliance on self-report data, however, is of dubious value in assessing the effect of a curriculum. Self-reported competence may have virtually no connection with objectively measured skills,28 and it has long been known that respondents do not have perfect access to their own internal mental processes in any event.29 Satisfaction data are also likely subject to publication bias, as it seems rare to read accounts of programs that participants did not like.

Overall, the study by Yedidia et al is an important step in setting a higher standard for the quality of research in medical education. The field already has a rich surplus of narrative descriptions and opinion pieces about these issues, but there is scant evidence that this publication enterprise has led to measurable improvements in the quality of medical instruction. Although the funding for high-quality studies remains a challenge, some authors16 have suggested that funders of education are increasingly demanding rigorous measures of effectiveness.

The public is intensely interested in improving the bedside manner of physicians, a desire that has now been heard at several levels of organized medicine. Without better evidence for best teaching practices, however, all parties may continue to wonder why physicians cannot do better at understanding and responding to their patients' concerns.

REFERENCES

1. Marvel KM, Epstein RM, Flowers K, Beckman HB. Soliciting the patient's agenda: have we improved? JAMA. 1999;281:283-287.
2. Levinson W, Gorawara-Bhat R, Lamb J. A study of patient clues and physician responses in primary care and surgical settings. JAMA. 2000;284:1021-1027.
3. Association of American Medical Colleges. Contemporary Issues in Medicine: Communications in Medicine. Report III. Medical School Objectives Project. October 1999. Available at: http://www.aamc.org/meded/msop/msop3.pdf. Accessed July 30, 2003.
4. United States Medical Licensing Examination Web site. Clinical Skills Examination: frequently asked questions. Available at: http://www.usmle.org/news/cse/csefaqs2503.htm#when. Accessed July 30, 2003.
5. Accreditation Council for Graduate Medical Education Outcome Project. General competencies: minimum program requirements language. Available at: http://www.acgme.org/outcome/comp/compMin.asp. Accessed July 30, 2003.
6. Baird MA. ABMS Educational Conference: Professional Competence and Board Certification. Available at: http://www.abms.org/downloads/conferences/baird%20paper.doc. Accessed July 30, 2003.
7. Green ML. Graduate medical education training in clinical epidemiology, critical appraisal, and evidence-based medicine. Acad Med. 1999;74:686-694.
8. Smits PB, Verbeek JH, de Buisonje CD. Problem-based learning in continuing medical education. BMJ. 2002;324:153-156.
9. Hatala R, Guyatt G. Evaluating the teaching of evidence-based medicine. JAMA. 2002;288:1110-1112.
10. Stephenson A, Higgs R, Sugarman J. Teaching professional development in medical schools. Lancet. 2001;357:867-870.
11. Goldie J. Review of ethics curricula in undergraduate medical education. Med Educ. 2000;34:108-119.
12. Landon BE, Normand ST, Blumenthal D, Daley J. Physician clinical performance assessment: prospects and barriers. JAMA. 2003;290:1183-1189.
13. Prideaux D. Researching the outcomes of educational interventions: a matter of design. BMJ. 2002;324:126-127.
14. Campbell JK, Johnson C. Trend spotting: fashions in medical education. BMJ. 1999;318:1272-1275.
15. Maudsley G. Do we all mean the same thing by "problem-based learning"? Acad Med. 1999;74:178-184.
16. Murray E, Gruppen L, Catton P, Hays R, Woolliscroft JO. The accountability of clinical education: its definition and assessment. Med Educ. 2000;34:871-879.
17. Yedidia MJ, Gillespie CC, Kachur E, et al. Effect of communications training on medical student performance. JAMA. 2003;290:1157-1165.
18. Wass V, McGibbon D, Van der Vleuten C. Composite undergraduate clinical examinations. Med Educ. 2001;35:326-330.
19. Swanson DB. A measurement framework for performance based tests. In: Hart IR, Harden RM, eds. Further Developments in Assessing Clinical Competence. Montreal, Quebec: Can-Heal; 1987:13-45.
20. Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull. 1959;56:81-105.
21. Skinner BD, Newton WP, Curtis P. The educational value of an OSCE in a family practice residency. Acad Med. 1997;72:722-724.
22. Sloan DA, Donnelly MB, Schwartz RW, Strodel WE. The objective structured clinical examination. Ann Surg. 1995;222:735-742.
23. Hilliard RI, Tallett SE. The use of an objective structured clinical examination with postgraduate residents in pediatrics. Arch Pediatr Adolesc Med. 1998;152:74-78.
24. Kahn MJ, Merrill WW, Anderson DS, Szerlip HM. Residency program director evaluations do not correlate with performance on a required 4th-year objective structured clinical examination. Teach Learn Med. 2001;13:9-12.
25. Dupras DM, Li JT. Use of an objective structured clinical examination to determine clinical competence. Acad Med. 1995;70:1029-1034.
26. Reinhardt UE. Academic medicine's financial accountability and responsibility. JAMA. 2000;284:1136-1138.
27. Prystowsky JB, Bordage G. An outcomes research perspective on medical education. Med Educ. 2001;35:331-336.
28. Anastakis DJ, Wanzel KR, Brown MH, et al. Evaluating the effectiveness of a 2-year curriculum in a surgical skills center. Am J Surg. 2003;185:378-385.
29. Nisbett RE, Wilson TD. Telling more than we know: verbal reports on mental processes. Psychol Rev. 1977;84:231-259.
