Review

Comparison of Evidence of Treatment Effects in Randomized and Nonrandomized Studies

John P. A. Ioannidis, MD; Anna-Bettina Haidich, MSc; Maroudia Pappa, MSc; Nikos Pantazis, MSc; Styliani I. Kokori, MD; Maria G. Tektonidou, MD; Despina G. Contopoulos-Ioannidis, MD; Joseph Lau, MD

Author Affiliations: Clinical Trials and Evidence-Based Medicine Unit, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina (Drs Ioannidis and Contopoulos-Ioannidis, and Ms Haidich), Department of Hygiene and Epidemiology, University of Athens School of Medicine (Ms Pappa and Mr Pantazis) and Laikon General Hospital (Drs Kokori and Tektonidou), Athens, Greece; Department of Pediatrics, George Washington University School of Medicine, Washington, DC (Dr Contopoulos-Ioannidis); and Division of Clinical Care Research, Department of Medicine, Tufts University School of Medicine, Boston, Mass (Drs Ioannidis and Lau).


JAMA. 2001;286(7):821-830. doi:10.1001/jama.286.7.821.

Context There is substantial debate about whether the results of nonrandomized studies are consistent with the results of randomized controlled trials on the same topic.

Objectives To compare results of randomized and nonrandomized studies that evaluated medical interventions and to examine characteristics that may explain discrepancies between randomized and nonrandomized studies.

Data Sources MEDLINE (1966–March 2000), the Cochrane Library (Issue 3, 2000), and major journals were searched.

Study Selection Forty-five diverse topics were identified for which both randomized trials (n = 240) and nonrandomized studies (n = 168) had been performed and had been considered in meta-analyses of binary outcomes.

Data Extraction Data on events per patient in each study arm and design and characteristics of each study considered in each meta-analysis were extracted and synthesized separately for randomized and nonrandomized studies.

Data Synthesis Very good correlation was observed between the summary odds ratios of randomized and nonrandomized studies (r = 0.75; P<.001); however, nonrandomized studies tended to show larger treatment effects (28 vs 11; P = .009). Between-study heterogeneity was frequent among randomized trials alone (23%) and very frequent among nonrandomized studies alone (41%). The summary results of the 2 types of designs differed beyond chance in 7 cases (16%). Discrepancies beyond chance were less common when only prospective studies were considered (8%). Occasional differences in sample size and timing of publication were also noted between discrepant randomized and nonrandomized studies. In 28 cases (62%), the natural logarithm of the odds ratio differed by at least 50%, and in 15 cases (33%), the odds ratio varied at least 2-fold between nonrandomized studies and randomized trials.

Conclusions Despite good correlation between randomized trials and nonrandomized studies—in particular, prospective studies—discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common.


Randomized controlled trials have often been considered the reference standard for evaluating the efficacy of therapeutic and preventive interventions.1 However, for many medical questions of interest, a large amount of evidence is often accumulated through nonrandomized studies. There has been substantial controversy about whether the results of nonrandomized studies agree with the results of randomized trials. Earlier evaluations suggested that nonrandomized studies may spuriously overestimate treatment benefits, yielding misleading conclusions.2-5

Recently, the debate has been renewed.6-8 Much of the debate has been conducted on theoretical grounds about the biases that may affect each type of study design, with an emphasis on the fact that nonrandomized studies may be more susceptible to unaccounted confounding. However, empirical evidence has also been accumulating. On the one hand, specific examples have arisen in the recent literature in which randomized studies have found different results compared with the epidemiologic literature that preceded them. Such examples included hormone replacement therapy and the risk of coronary artery disease; beta carotene and alpha tocopherol and their impact on coronary mortality; and the relationship between dietary fiber and colon cancer.9-12 On the other hand, recent evaluations have suggested that for selected medical topics, both randomized and nonrandomized studies may yield very similar results.7,8,13

There is a need to address these issues using empirical data from a large number of diverse medical topics. Using such data, one would like to answer the following questions: How do the results of randomized trials and nonrandomized studies compare when both are performed for the same question? Do nonrandomized studies tend to give more favorable results than randomized trials? Finally, are there design or other characteristics that may explain the discrepancies between randomized trials and nonrandomized studies? To address these issues, we performed a systematic evaluation using data from a large number of medical questions about which the efficacy of therapeutic or preventive interventions had been assessed with both randomized trials and nonrandomized studies.

Search for Meta-analyses and Selection of Topics and Outcomes

We identified meta-analyses that had considered both randomized and nonrandomized evidence. The pertinent subjects and meta-analyses were identified using 5 different complementary approaches to maximize the yield of topics and to ensure that a wide variety of topics was retrieved. First, we reviewed the previous literature on comparisons of randomized and nonrandomized studies until mid-1998,2-6 and we screened all the examples of such comparisons that the articles cited. Second, we perused our personal database of meta-analyses published between 1991 and 1997 in JAMA, Lancet, BMJ, Annals of Internal Medicine, and Archives of Internal Medicine. Third, we searched MEDLINE (last search updated in March 2000) for articles categorized as meta-analyses (type of publication) that contained a combination of at least 1 Medical Subject Heading suggestive of randomized clinical trials (such as randomized controlled trials, randomized clinical trials) and 1 Medical Subject Heading suggestive of a nonrandomized design (such as prospective cohorts, retrospective cohorts, case-control studies, etc). Fourth, we screened all the completed systematic reviews of the Cochrane Library (last screen on issue 3, 2000, containing 859 reviews). Fifth, we used meta-analyses that had been performed by investigators in our group with both randomized and nonrandomized comparisons included.

From all these sources, we selected the meta-analyses in which both randomized and nonrandomized studies were cited with at least 1 primary outcome being in binary form. Data on the binary outcome had to be presented in the meta-analysis. For the meta-analyses identified by perusing our personal database of meta-analyses and MEDLINE, we also considered meta-analyses in which some of the binary data might be unreported but might still be retrievable by reviewing the primary articles of each study cited by the meta-analysis. An effort was made to identify all the primary study articles whenever either the primary binary outcome information or important study characteristics were not reported in the meta-analysis. A few primary studies that could not be retrieved (primarily abstracts from conferences and very old studies) had to be excluded whenever the binary outcome data were not available in the published meta-analysis. For final inclusion of a topic in our evaluation, binary data for the same outcome had to be available on at least 1 randomized trial and at least 1 nonrandomized study.

Whenever a meta-analysis used different binary outcomes/end points and several of them had available data both for randomized and nonrandomized studies, we selected a priori the primary outcome, as stated by the meta-analysis. Whenever it was not clear which was the primary outcome, we selected a priori the outcome that was most important clinically, using consensus among the data extractors. Generally, mortality had a priority over other hard clinical outcomes, soft clinical outcomes, and laboratory outcomes, provided that there were at least some events for the most severe clinical outcome so that calculations of effects would be meaningful.

In some meta-analyses, comparisons of different interventions against each other or against no intervention or placebo had been considered. In this case, each eligible comparison qualified as a separate topic.

Data Extraction for Primary Studies

For each primary study in a meta-analysis we extracted the following information: type of design, year of publication, events per patient in each arm for the outcome of interest, age of the population (adult, children, or mixed), and duration of follow-up (in months, when available [or at least whether it was more than or up to 1 year]). We did not systematically update the eligible meta-analyses to include additional studies published after the meta-analysis. However, we tried to ensure that the comparison of the summary treatment effects between randomized and nonrandomized studies would not be totally offset by missing or recent information. Thus, we screened all the identified topics for missing information (eg, abstracts without binary data); adjusted odds ratio estimates in individual patient data meta-analyses that could not be accounted for in our analyses; and major widely known recent trials that might offset the comparison of the magnitude of effects.

Nonrandomized Study Designs

Nonrandomized designs were categorized into prospective nonrandomized studies (all subjects were recruited and evaluated prospectively, but the control arm had not been created through randomization); retrospective cohort studies (subjects were evaluated retrospectively and the study arms were concurrent [without matching]); case-control studies (studies in which the compared groups were defined on the basis of the outcome and/or matching was used); historical control studies (studies with retrospective, nonconcurrent controls); and other or not-specified design. In each topic, we limited the analyses to the study designs that were included or systematically cited through the original meta-analysis.

Statistical Analysis

For each topic we combined the data from randomized and nonrandomized studies separately. We used the odds ratio (OR) as the metric of choice since case-control studies would also be included; moreover, the OR has statistical advantages.14 We used both random-effects (DerSimonian and Laird15) and fixed-effects (Mantel-Haenszel)16 calculations. Random effects models are reported, unless stated otherwise, because they incorporate an estimate of the between-study variance in the calculations and they tend to give wider (more conservative) confidence intervals than fixed effects.17 Fixed-effects calculations are also given when substantially different. Heterogeneity between the studies of each type of design was assessed using the Q statistic and was considered significant for P<.10.17
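The pooling described above can be illustrated with a minimal Python sketch. It is not the authors' software (they report using SPSS and Meta-Analyst). The function names, the 0.5 continuity correction for zero cells, and the use of an inverse-variance estimate as the center for the Q statistic are assumptions made here for illustration, and the P value for Q is not computed.

```python
import math

def log_or_and_var(a, b, c, d):
    """Log odds ratio and its large-sample variance for one 2x2 table.

    a, b = events / non-events in the experimental arm;
    c, d = events / non-events in the control arm.
    Assumption: add 0.5 to every cell when any cell is zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def mantel_haenszel_or(tables):
    """Fixed-effects Mantel-Haenszel summary odds ratio."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def dersimonian_laird(tables):
    """Random-effects (DerSimonian and Laird) summary of log odds ratios."""
    effects = [log_or_and_var(*t) for t in tables]
    y = [e[0] for e in effects]
    v = [e[1] for e in effects]
    w = [1 / vi for vi in v]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Q statistic: weighted squared deviations from the fixed estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    k = len(tables)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return {"log_or": pooled, "var": 1 / sum(w_star), "q": q, "tau2": tau2}
```

Because the between-study variance is added to every study's variance, the random-effects confidence interval is never narrower than the inverse-variance fixed-effects one, which is the conservatism the text refers to.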

To evaluate the concordance between the results of randomized and nonrandomized studies, we performed the following analyses: (1) We evaluated the Spearman correlation coefficient for the summary OR estimates between randomized and nonrandomized studies; (2) We assessed in how many cases the summary OR of the nonrandomized studies suggested a larger treatment effect for the experimental intervention than the summary OR of the randomized trials; (3) We evaluated whether the difference in the ORs of randomized and nonrandomized studies for the same topic was larger than what would be anticipated by chance alone. To do this, we estimated the z score, as follows:

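$$
z = \frac{\ln(\mathrm{OR}_{\mathrm{RCT}}) - \ln(\mathrm{OR}_{\mathrm{NRS}})}{\left[\operatorname{var}\!\left(\ln(\mathrm{OR}_{\mathrm{RCT}})\right) + \operatorname{var}\!\left(\ln(\mathrm{OR}_{\mathrm{NRS}})\right)\right]^{1/2}}
$$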

where ln(OR_RCT) is the natural logarithm of the OR of randomized trials, ln(OR_NRS) is the natural logarithm of the OR of nonrandomized studies, and var stands for variance. A z score above 1.96 or below −1.96 suggests that the difference between the randomized trials and nonrandomized studies is beyond chance (.05 level of statistical significance).18 We also used alternative rules to define discrepancies based on differences in the relative magnitude of the treatment effect: (1) the OR of nonrandomized studies being at least double or less than half of the OR of randomized trials, and (2) the natural logarithm of the OR of nonrandomized studies being at least 50% larger or smaller than the natural logarithm of the OR of randomized trials. The magnitude of the treatment effect is important because it indicates how well a treatment works.
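As a rough illustration of the three discrepancy definitions, the following Python sketch takes summary log odds ratios and their variances for the 2 designs. The function and key names are hypothetical, and the reading of the 50% rule as a relative difference of at least 0.5 in the log OR is an interpretation, not something the text spells out.

```python
import math

def z_score(log_or_rct, var_rct, log_or_nrs, var_nrs):
    """z score for the difference between the two summary log odds ratios."""
    return (log_or_rct - log_or_nrs) / math.sqrt(var_rct + var_nrs)

def discrepancy_flags(log_or_rct, var_rct, log_or_nrs, var_nrs):
    """Apply the three discrepancy definitions to one topic."""
    z = z_score(log_or_rct, var_rct, log_or_nrs, var_nrs)
    # Rule 1: OR of nonrandomized studies at least double or less than half
    ratio = math.exp(log_or_nrs - log_or_rct)
    twofold = ratio >= 2 or ratio < 0.5
    # Rule 2 (assumed reading): log OR differs by at least 50% of the
    # randomized-trial log OR
    if log_or_rct == 0:
        log_50 = log_or_nrs != 0
    else:
        log_50 = abs(log_or_nrs - log_or_rct) / abs(log_or_rct) >= 0.5
    return {
        "z": z,
        "beyond_chance": abs(z) > 1.96,
        "twofold_or": twofold,
        "log_or_50_percent": log_50,
    }

# Hypothetical inputs: summary OR 0.8 in trials, 0.3 in nonrandomized studies
example = discrepancy_flags(math.log(0.8), 0.01, math.log(0.3), 0.01)
```

The z score depends on the variances, so a large relative difference between imprecise estimates can satisfy the magnitude rules without being beyond chance. This is consistent with the magnitude-based discrepancy rates being higher than the chance-based rate.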

Discrepancy rates were estimated for comparisons of randomized trials against all nonrandomized studies; all studies, excluding historical controls; prospective studies; retrospective studies with concurrent controls; and historical control studies only. We also performed analyses limited to studies published in 1986 or later. Furthermore, we evaluated whether the odds of a discrepancy beyond chance depended on the average year of publication in the studies included in each meta-analysis. Finally, study and topic characteristics were scrutinized to see whether there is an explanation for the statistically significant discrepancies. In this regard, we evaluated whether randomized trials and nonrandomized studies differed in years of publication, length of follow-up (less or more than 1 year), age of population (children, adults, elderly people), sample size, or other protocol characteristics.

Analyses were conducted in SPSS 10.0 (SPSS Inc, Chicago, Ill) and in Meta-Analyst (J. L., Boston, Mass). All P values are 2-tailed.

Characteristics of Medical Topics

A total of 45 topics were identified in which both randomized and nonrandomized studies had been performed on the same topic (Table 1a).2,3,19-52 Among 408 primary studies with available binary data, there were 240 randomized trials and 168 nonrandomized studies. The latter group included 71 prospective nonrandomized studies, 40 retrospective cohort studies, 25 case-control studies, 29 studies with historical controls, 1 cohort study with individual patient data assembled from several centers (unclear if prospective or retrospective), and 2 studies without clear design (presumably retrospective). The topics covered a wide range of medical specialties. In 29 topics there were more randomized trials than nonrandomized studies. In 26 topics there were more patients in randomized trials than in nonrandomized studies.

Table 1a. Topics of Meta-analyses Considering Both Randomized Trials and Nonrandomized Studies*

Estimates of Treatment Effects and Between-Study Heterogeneity

Figure 1 shows side by side the summary ORs for randomized trials and nonrandomized studies in each topic. In all, statistically significant heterogeneity was seen between randomized trials in 9 of 39 topics for which at least 2 randomized trials (23%) had been included. Statistically significant heterogeneity was seen between nonrandomized studies in 13 of 32 topics for which at least 2 nonrandomized studies (41%) had been included. The respective figure was 6 (40%) of 15 topics, when limited to prospective nonrandomized studies. The between-study variance was smaller among randomized trials than among nonrandomized studies in 18 topics while the opposite occurred in 6 cases, and it was the same in both designs in 4 cases (exact P = .07 by Wilcoxon test). The between-study variance was smaller among randomized trials than among prospective nonrandomized studies in 10 topics while the opposite occurred in 1 case, and it was the same in both designs in 3 cases (exact P = .03 by Wilcoxon test).

Figure 1. Comparison of the Summary Odds Ratio and 95% Confidence Interval in Randomized Trials vs Nonrandomized Studies for the 45 Topics
The topic numbers correspond to the identification numbers in Table 1. Calculations were performed with random effects in panel A and, for comparison, with fixed effects in panel B. The topics have been ordered according to increasing odds ratio estimates in randomized trials using random-effects calculations. Data shown in blue indicate the topics in which the difference between randomized trials and nonrandomized studies was beyond what would be expected by chance alone. For 1 topic (No. 11), both of the summary estimates lie outside the depicted range.

Correlation and Comparison of Treatment Effects

The correlation coefficient between the treatment effect in randomized trials and in nonrandomized studies was 0.75 (P<.001). This became 0.83 (P<.001) when historical control studies were excluded (Figure 2).

Figure 2. Comparison of the Summary Odds Ratio in Randomized Trials vs Nonrandomized Studies
Historically controlled studies are excluded from the calculations. Calculations are performed with random effects. Odds ratios are shown in a natural logarithmic scale. Not shown is 1 topic with very large summary odds ratios (>25) for both types of designs.

In 25 of the 45 cases, the nonrandomized studies showed a larger treatment effect for the experimental treatment than the randomized studies. The opposite occurred in 14 cases, but it was probably due to data artifacts in 3 of these: in 1 case, aspirin had shown a larger preventive effect for pregnancy-induced hypertension in randomized trials than in nonrandomized studies in an early meta-analysis,40 but a major recent trial has shown no effect at all.53 In another case, BCG immunotherapy for melanoma, the 1 published randomized trial had more favorable data than the nonrandomized studies, but several other randomized studies with less favorable results were not included (only available in abstract form without binary data).3 A meta-analysis of allogeneic leukocyte immunotherapy showed more favorable results in randomized studies, but this was not true in the main original analysis, which was based on individual patient data with adjustment for significant predictors.21 Finally, in 6 topics, it was not possible to identify clearly which study design produced more favorable results: in 2 cases (high-dose diuretics for hypertension and antiarrhythmic therapy for chronic atrial fibrillation) different conclusions were reached with fixed- and random-effects calculations; in another topic (hormonal therapy of cryptorchism), the control groups showed 0 efficacy in both types of studies; and in 3 topics the compared treatments (surgical interventions for urinary incontinence) were equally experimental, and thus there was no notion of a more favorable result.

Overall, these data suggested that larger treatment effects occurred somewhat more often with nonrandomized studies than with randomized trials (25 vs 14, exact P = .11; 28 vs 11 [correcting the 3 artifacts], exact P = .009 by Wilcoxon test). In 5 topics for which randomized trials suggested more favorable results than nonrandomized studies, there had been only 1 randomized trial performed (n = 18, n = 42, n = 59, n = 131, and n = 190).
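The article attributes these P values to a Wilcoxon test and does not give the computation. As a sanity check on the counts alone, the sketch below applies an exact 2-sided binomial (sign) test under the null hypothesis that either design is equally likely to show the larger effect. Using the sign test here is an assumption, and the function name is hypothetical.

```python
from math import comb

def exact_sign_test(n_first, n_second):
    """Exact two-sided binomial P value for n_first vs n_second 'wins'.

    Ties (topics where neither design was clearly more favorable)
    are assumed to have been dropped before counting.
    """
    n = n_first + n_second
    k = min(n_first, n_second)
    # Probability of a split at least this lopsided in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_observed = exact_sign_test(25, 14)    # counts as reported
p_corrected = exact_sign_test(28, 11)   # after reassigning the 3 artifacts
```

Under this assumption the 2 results come out at roughly .11 and .009, close to the reported values.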

Discrepancies Between Randomized Trials and Nonrandomized Studies

In 7 (16%) of the 45 topics, the difference between the randomized trials and nonrandomized studies based on random effects calculations was beyond what would be expected by chance alone (Table 2). By fixed-effects calculations, this occurred in another 5 of the 45 topics (total 27%). The rates of discrepancies were substantially higher when their definition was based on the relative magnitude of the treatment effects in the compared designs. The natural logarithms of the ORs differed by at least 50% in 28 (62%) of the 45 topics and in 15 cases (33%), the OR varied at least 2-fold between nonrandomized studies and randomized trials (Table 2).

Table 2. Frequency of Discrepancies Among Randomized Trials and Nonrandomized Studies for Various Definitions and Types of Studies*

There were trends for higher rates of discrepancies in comparisons involving historical control studies (Table 2) and the rates of discrepancies beyond chance tended to decrease when only prospective studies were considered (8% by random effects, 15% by fixed effects) or when simply historical control studies were excluded (11% by random effects, 21% by fixed effects). However, the magnitude of the treatment effect often differed substantially between randomized trials and nonrandomized studies regardless of which study designs were included in the latter group. Even when only prospective studies were considered, the natural logarithm of the ORs still differed by at least 50% in 16 (62%) of the 26 topics (Table 2).

When limited to studies published in 1986 or later, there were 5 discrepancies beyond chance by random effects among 23 topics that had at least 1 randomized trial and at least 1 nonrandomized study. The odds of having a discrepancy beyond chance tended to decrease when the average year of publication of the considered studies was more recent, but the change was not formally significant (OR, 0.93; P = .12).

In 6 of the 7 disagreements beyond chance by random effects (Table 3), the estimated treatment benefit was larger in nonrandomized studies than in randomized trials while in 1 case both treatments were equally experimental. Overall, more favorable results were significantly more common with nonrandomized studies vs randomized trials (exact P = .03 by Wilcoxon test; P = .02 when fixed effects disagreements were included).

Table 3. Discrepancies Beyond Chance Between Randomized Trials and Nonrandomized Studies*

The age of the study populations was largely similar in nonrandomized studies and randomized trials on topics with discrepancies (data not shown). There was also no clear difference in the mean follow-up, perhaps with the exception of 1 disagreement (anterior colporrhaphy vs needle suspension) for which nonrandomized studies tended to have longer follow-up. In 2 cases (screening mammography, hypertension in elderly people), the randomized trials were of much larger sample size than the nonrandomized studies. In 4 cases, randomized trials had been published on average 5 or more years later than the nonrandomized studies (Table 3). Typically, the included randomized and nonrandomized studies on the same topic administered treatment in the same way and outcome measures were similarly defined. Selection criteria could have differed between studies, but differences could occur even within randomized trials or within nonrandomized studies and not necessarily only between randomized trials and nonrandomized studies.

Our empirical evaluation of 45 medical topics has found that randomized trials and nonrandomized studies show a high correlation in their estimates of efficacy of medical interventions. However, a high correlation does not necessarily also mean a similar magnitude of effect. Randomized trials and nonrandomized studies often disagree substantially on how much a treatment works. In fact, we observed that it was somewhat more frequent to find larger treatment effects in nonrandomized studies vs randomized trials than for the opposite to occur. However, it is precarious to claim that a study design arriving at a more favorable effect is necessarily spurious while a study design showing a smaller benefit is always more reliable. For example, sometimes a flawed study may fail to identify an existing benefit, because of the "noise" caused by its errors.

Discrepancies beyond what could be explained by chance were not uncommon between the 2 types of designs. When we allowed also for the between-study variability for each type of design by using random effects calculations, discrepancies beyond chance still occurred in 7 of 45 topics. Recently, using different methods for identification of topics, Concato et al8 claimed no major disagreement for 5 randomized vs nonrandomized study comparisons while Benson and Hartz7 found that in 2 of the 19 comparisons the point estimate of the nonrandomized studies lay outside the 95% confidence intervals of the effect found by randomized trials. These 2 studies suggested a relatively higher concordance between the randomized and nonrandomized studies.54 Several of the topics covered by these 2 surveys were also included in our evaluation, but 13 topics were not. If these 2 evaluations and our own are merged, statistically significant discrepancies between randomized and nonrandomized studies occur in 7 of 58 topics by random-effects calculations and in 13 topics by fixed-effects calculations.

Of interest, significant between-study variability was seen as frequently among the randomized trials as between the randomized and nonrandomized studies. Furthermore, significant variability was seen more than 40% of the time among the nonrandomized studies on the same topic. Thus variability seems to be very common both in randomized and nonrandomized studies and perhaps more frequent in the latter. Variability may be due to bias, but it may also reflect differences in the true treatment effect under different study settings and in different populations.55

Part of the variability could have been due to the fact that we considered a wide spectrum of nonrandomized designs. Several of the discrepancies beyond chance occurred in cases where nonrandomized studies were represented by historical control studies or other retrospective designs that may be more susceptible to bias than prospective designs. In fact, there were relatively few discrepancies beyond chance when randomized trials were compared with prospective nonrandomized studies. Still, perfect agreement was not seen even for these comparisons, and it was very common to see major differences in the estimates of the treatment effect.

Perfect agreement is perhaps impossible to expect between different types of study design or even within the same study design. Even the best designed studies may differ in several parameters and may form a continuum in the spectrum of medical evidence that we can obtain from them. We observed discrepancies, such as the cases of screening mammography or the treatment of hypertension in elderly people, for which randomized studies had a very different sample size than their nonrandomized counterparts. It is conceivable that larger studies in which large-scale effectiveness is probed may yield more conservative results than smaller studies in which efficacy is assessed, regardless of study design. The same may hold true when the timing of each study is considered. We encountered examples in which nonrandomized studies had been published earlier than the randomized trials. Early studies in selected populations may yield promising results that may lead to subsequent trials with the aim of validating the benefits in larger populations. Publication bias and a time lag for negative studies may also be operating, regardless of study design.56,57 Quality is also important to consider, and in this regard, sometimes nonrandomized studies, randomized studies,58 or both may have important quality defects. For example, a recent meta-analysis suggests that even within randomized trials, the ones with greater methodological rigor show no benefit while the ones with potential flaws may be spuriously overestimating the benefit.59

It is not known whether meta-analyses that examine both randomized and nonrandomized evidence may do so because the results of the 2 types of designs are fairly concordant. If this is true, then meta-analyses with both types of data may be a biased sample and this could explain in part the relatively good overall correlation that we observed. Avoidance of this potential bias makes a strong case for examining information from all types of studies in meta-analysis. Nevertheless, even with this selection approach, the frequency of discrepancies was quite high when based on the comparison of the magnitude of the observed treatment effects. On the other hand, the selection of topics from published meta-analyses also leaves the possibility of publication bias affecting the results of specific meta-analyses. However, it is not known whether such bias would affect nonrandomized studies more than randomized trials and whether there would be an overall net bias affecting our comparisons.

Although we included a substantial number of comparisons, larger than in any previous evaluation in this field, this is still a small sample compared with the number of medical questions that are being probed with randomized and nonrandomized studies. It is conceivable that for many questions of interest, randomized trials may never be performed, if early nonrandomized studies show either clear harm or a large benefit. The ethical barrier may become insurmountable in such cases. Conversely, nonrandomized studies may be considered unworthy of consideration if randomized evidence is available on a topic. Although we perused several hundreds of meta-analyses, the vast majority regarded the randomized design as a prerequisite for eligibility and most of them did not even cite the nonrandomized studies. This is unfair for epidemiological research that may often offer some complementary insights to those provided by randomized trials. We propose that future systematic reviews and meta-analyses should pay more attention to the available nonrandomized data. It would be wrong to reduce the efforts to promote randomized trials so as to obtain easy answers from nonrandomized designs.13 However, nonrandomized evidence may also be useful and may be helpful in the interpretation of the randomized results. Whenever discrepancies occur, such discrepancies should be carefully scrutinized since they may yield valuable information for designing future research.

References

1. Pocock SJ. Clinical Trials: A Practical Approach. Chichester, England: John Wiley & Sons; 1983.
2. Chalmers TC, Matta RJ, Smith Jr H, Kunzler AM. Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. N Engl J Med. 1977;297:1091-1096.
3. Sacks H, Chalmers TC, Smith Jr H. Randomized versus historical controls for clinical trials. Am J Med. 1982;72:233-240.
4. Colditz GA, Miller JN, Mosteller F. How study design affects outcomes in comparisons of therapy, I: medical. Stat Med. 1989;8:441-454.
5. Miller JN, Colditz GA, Mosteller F. How study design affects outcomes in comparisons of therapy, II: surgical. Stat Med. 1989;8:455-466.
6. Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ. 1998;317:1185-1190.
7. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342:1878-1886.
8. Concato J, Shah N, Horwitz RI. Randomized controlled trials, observational studies and the hierarchy of research designs. N Engl J Med. 2000;342:1887-1892.
9. Hulley S, Grady D, Bush T, et al. Randomized trial of estrogen plus progestin for secondary prevention of coronary artery disease in postmenopausal women. JAMA. 1998;280:605-613.
10. The Alpha Tocopherol Beta Carotene Cancer Prevention Study Group. The effect of vitamin E and beta carotene on the incidence of lung cancer and other cancers in male smokers. N Engl J Med. 1994;330:1029-1035.
11. Yusuf S, Dagenais G, Pogue J, Bosch J, Sleight P, for the Heart Outcomes Prevention Study Investigators. Vitamin E supplementation and cardiovascular events in high-risk patients. N Engl J Med. 2000;342:154-160.
12. Schatzkin A, Lanza E, Corle D, et al, for the Polyp Prevention Trial Study Group. Lack of effect of a low-fat, high-fiber diet on the recurrence of colorectal adenomas. N Engl J Med. 2000;342:1149-1155.
13. Pocock SJ, Elbourne DR. Randomized trials or observational tribulations? N Engl J Med. 2000;342:1907-1909.
14. Rothman KJ, Greenland S. Modern Epidemiology. 2nd ed. Philadelphia, Pa: Lippincott-Raven; 1998.
15. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177-188.
16. Mantel N, Haenszel WH. Statistical aspects of the analysis of data from retrospective studies of diseases. J Natl Cancer Inst. 1959;22:719-748.
17. Lau J, Ioannidis JPA, Schmid CH. Quantitative synthesis for systematic reviews. Ann Intern Med. 1997;127:820-826.
18. Ioannidis JP, Cappelleri JC, Lau J. Issues in comparisons of meta-analyses and large trials. JAMA. 1998;279:1089-1093.
19. Reimold SC, Chalmers TC, Berlin JA, Antman EM. Assessment of the efficacy and safety of antiarrhythmic therapy for chronic atrial fibrillation: observations on the role of trial design and implications of drug-related mortality. Am Heart J. 1992;124:924-932.
20. Carroll D, Tramer M, McQuay H, Nye B, Moore A. Randomization is important in studies with pain outcomes: systematic review of transcutaneous electrical nerve stimulation in acute postoperative pain. Br J Anaesth. 1996;77:798-803.
21. Recurrent Miscarriage Immunotherapy Trialists Group. Worldwide collaborative observational study and meta-analysis of allogenic leukocyte immunotherapy for recurrent spontaneous abortion. Am J Reprod Immunol. 1994;32:55-72.
22. Pyorala S, Huttunen NP, Uhari M. A review and meta-analysis of hormonal treatment of cryptorchidism. J Clin Endocrinol Metab. 1995;80:2795-2799.
23. Tangkanakul C, Counsell C, Warlow C. Local vs general anaesthesia for carotid endarterectomy [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
24. Brosseau L, Welch V, Wells G, et al. Low-level laser therapy (classes I, II and III) for the treatment of osteoarthritis [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
25. Vandekerckhove P, Watson A, Lilford R, Harada T, Hughes E. Oil-soluble vs water-soluble media for assessing tubal patency with hysterosalpingography or laparoscopy in subfertile women [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
26. Srisurapanont M, Jarusuraisin N. Opioid antagonists for alcohol dependence [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
27. Patel MK, Lee CK. Polysaccharide vaccines for preventing serogroup A meningococcal meningitis [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
28. Watson A, Vandekerckhove P, Lilford R. Techniques for pelvic surgery in subfertility [Cochrane Review on CD-ROM]. Oxford, England: Cochrane Library, Update Software; 2000:Issue 3.
29. Towler B, Irwig L, Glasziou P, Kewenter J, Weller D, Silagy C. A systematic review of the effects of screening for colorectal cancer using the faecal occult blood test, hemoccult. BMJ. 1998;317:559-565.
30. Kerlikowske K, Grady D, Rubin SM, Sandrock C, Ernster VL. Efficacy of screening mammography: a meta-analysis. JAMA. 1995;273:149-154.
31. Abrutyn E, Berlin JA. Intrathecal therapy for tetanus: a meta-analysis. JAMA. 1991;266:2262-2267.
32. Glasziou PP, Mackerras DE. Vitamin A supplementation in infectious diseases: a meta-analysis. BMJ. 1993;306:366-370.
33. Camma C, Almasio P, Craxi A. Interferon as treatment for acute hepatitis C: a meta-analysis. Dig Dis Sci. 1996;41:1248-1255.
34. Gifford DS, Morton SC, Fiske M, Kahn K. A meta-analysis of infant outcomes after breech delivery. Obstet Gynecol. 1995;85:1047-1054.
35. Black NA, Downs SH. The effectiveness of surgery for stress incontinence in women: a systematic review. Br J Urol. 1996;78:497-510.
36. Colditz GA, Brewer TF, Berkey CS, et al. Efficacy of BCG vaccine in the prevention of tuberculosis: meta-analysis of the published literature. JAMA. 1994;271:698-702.
37. Cook RL, Rosenberg MJ. Do spermicides containing nonoxynol-9 prevent sexually transmitted infections? a meta-analysis. Sex Transm Dis. 1998;25:144-150.
38. Grullon KE, Grimes DA. The safety of early postpartum discharge: a review and critique. Obstet Gynecol. 1997;90:860-865.
39. Psaty BM, Smith NL, Siscovick DS, et al. Health outcomes associated with antihypertensive therapies used as first-line agents: a systematic review and meta-analysis. JAMA. 1997;277:739-745.
40. Imperiale TF, Petrulis AS. A meta-analysis of low-dose aspirin for the prevention of pregnancy-induced hypertensive disease. JAMA. 1991;266:260-264.
41. McAlister FA, Clark HD, Wells PS, Laupacis A. Perioperative allogeneic blood transfusion does not cause adverse sequelae in patients with cancer: a meta-analysis of unconfounded studies. Br J Surg. 1998;85:171-178.
42. Hommes DW, Bura A, Mazzolai L, Buller HR, ten Cate JW. Subcutaneous heparin compared with continuous intravenous heparin administration in the initial treatment of deep vein thrombosis. Ann Intern Med. 1992;116:279-284.
43. Fouque D, Laville M, Boissel JP, Chifflet R, Labeeuw M, Zech PY. Controlled low protein diets in chronic renal insufficiency: meta-analysis. BMJ. 1992;304:216-220.
44. Ramsey MJ, DerSimonian R, Holtel MR, Burgess LP. Corticosteroid treatment for idiopathic facial nerve paralysis: a meta-analysis. Laryngoscope. 2000;110:335-341.
45. Wells PS, Lensing AW, Hirsh J. Graduated compression stockings in the prevention of postoperative venous thromboembolism. Arch Intern Med. 1994;154:67-72.
46. Hart RG, Halperin JL, McBride R, Benavente O, Man-Son-Hing M, Kronmal RA. Aspirin for the primary prevention of stroke and other major vascular events. Arch Neurol. 2000;57:326-332.
47. Chinoy MA, Parker MJ. Fixed nail plates vs sliding hip systems for the treatment of trochanteric femoral fractures: a meta-analysis of 14 studies. Injury. 1999;30:157-163.
48. Callahan CM, Dittus RS, Katz BP. Oral corticosteroid therapy for patients with stable chronic obstructive pulmonary disease. Ann Intern Med. 1991;114:216-223.
49. Insua JT, Sacks HS, Lau TS, et al. Drug treatment of hypertension in the elderly: a meta-analysis. Ann Intern Med. 1994;121:355-362.
50. Vandenbroucke-Grauls CM, Vandenbroucke JP. Effect of selective decontamination of the digestive tract on respiratory tract infections and mortality in the intensive care unit. Lancet. 1991;338:859-862.
51. Kasiske BL, Heim-Duthoy K, Ma JZ. Elective cyclosporine withdrawal after renal transplantation: a meta-analysis. JAMA. 1993;269:395-400.
52. Ioannidis JPA, Salem D, Lau J. Accuracy and clinical effect of out-of-hospital electrocardiography in the diagnosis of acute cardiac ischemia: a meta-analysis. Ann Emerg Med. 2001;37:461-470.
53. CLASP. A randomized trial of low-dose aspirin for the prevention and treatment of pre-eclampsia among 9,364 pregnant women. Lancet. 1994;343:619-629.
54. Ioannidis JPA, Haidich A-B, Lau J. Any casualties in the clash of randomized and observational evidence? BMJ. 2001;322:879-880.
55. Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet. 1998;351:123-127.
56. Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet. 1991;337:867-872.
57. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA. 1998;279:281-286.
58. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408-412.
59. Gotzsche PC, Olsen O. Is screening for breast cancer with mammography justified? Lancet. 2000;355:129-134.

Figures

Figure 1. Comparison of the Summary Odds Ratio and 95% Confidence Interval in Randomized Trials vs Nonrandomized Studies for the 45 Topics
The topic numbers correspond to the identification numbers in Table 1. Calculations have been performed with random effects in panel A and, for comparison, with fixed effects in panel B. The topics have been ordered according to increasing odds ratio estimates in randomized trials using random-effects calculations. Data shown in blue indicate the topics in which the difference between randomized trials and nonrandomized studies was beyond what would be expected by chance alone. For 1 topic (No. 11), both of the summary estimates lie outside the depicted range.
Figure 2. Comparison of the Summary Odds Ratio in Randomized Trials vs Nonrandomized Studies
Historically controlled studies are excluded from the calculations. Calculations are performed with random effects. Odds ratios are shown in a natural logarithmic scale. Not shown is 1 topic with very large summary odds ratios (>25) for both types of designs.
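
The figure legends above refer to summary odds ratios computed with random effects and with fixed effects, and to differences between designs that go beyond chance. As a rough illustration only, the sketch below pools log odds ratios with inverse-variance weights and a DerSimonian-Laird between-study variance, then forms a z score for the difference between two pooled estimates. The function names, the inverse-variance fixed-effect weighting, and the z score are assumptions made for this example and are not taken from the article's methods.

```python
import math

def log_or_and_var(a, b, c, d):
    """Log odds ratio and its approximate variance from a 2x2 table.

    a, b: events and non-events in the treated group
    c, d: events and non-events in the control group
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def pool(studies, random_effects=True):
    """Pool (log_or, variance) pairs; returns (pooled_log_or, standard_error).

    Fixed effect: inverse-variance weights (a simplification chosen here).
    Random effects: DerSimonian-Laird moment estimate of tau^2.
    """
    w = [1 / v for _, v in studies]
    fixed = sum(wi * y for wi, (y, _) in zip(w, studies)) / sum(w)
    if not random_effects or len(studies) < 2:
        return fixed, math.sqrt(1 / sum(w))
    # Cochran's Q and the DerSimonian-Laird between-study variance
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, studies))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)
    w_star = [1 / (v + tau2) for _, v in studies]
    pooled = sum(wi * y for wi, (y, _) in zip(w_star, studies)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))

def discrepancy_z(randomized, nonrandomized):
    """z score for the difference between two pooled log odds ratios."""
    (y1, se1), (y2, se2) = randomized, nonrandomized
    return (y1 - y2) / math.sqrt(se1 ** 2 + se2 ** 2)

if __name__ == "__main__":
    # Hypothetical 2x2 tables, invented purely for illustration.
    rct = [log_or_and_var(10, 90, 20, 80), log_or_and_var(15, 85, 25, 75)]
    obs = [log_or_and_var(30, 70, 35, 65), log_or_and_var(12, 88, 14, 86)]
    z = discrepancy_z(pool(rct), pool(obs))
    print("summary OR, randomized:", math.exp(pool(rct)[0]))
    print("summary OR, nonrandomized:", math.exp(pool(obs)[0]))
    print("z score for the difference:", z)
```

Under these assumptions, an absolute z score above about 1.96 would correspond to a difference unlikely to arise by chance at the conventional two-sided 5% level. When between-study heterogeneity is present, the random-effects standard error is larger than the fixed-effect one, so the same pair of designs may be flagged as discrepant under one model but not the other.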

Tables

Table 1a. Topics of Meta-analyses Considering Both Randomized Trials and Nonrandomized Studies*
Table 2. Frequency of Discrepancies Among Randomized Trials and Nonrandomized Studies for Various Definitions and Types of Studies*
Table 3. Discrepancies Beyond Chance Between Randomized Trials and Nonrandomized Studies*
