
Stopping Randomized Trials Early for Benefit and Estimation of Treatment Effects: Systematic Review and Meta-regression Analysis

Dirk Bassler, MD, MSc; Matthias Briel, MD, MSc; Victor M. Montori, MD, MSc; Melanie Lane, BA; Paul Glasziou, MBBS, PhD; Qi Zhou, PhD; Diane Heels-Ansdell, MSc; Stephen D. Walter, PhD; Gordon H. Guyatt, MD, MSc; and the STOPIT-2 Study Group

Author Affiliations: Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada (Drs Bassler, Briel, Zhou, Walter, and Guyatt and Ms Heels-Ansdell); Department of Neonatology, University Children's Hospital Tuebingen, Tuebingen, Germany (Dr Bassler); Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland (Dr Briel); Knowledge and Encounter Research Unit, Mayo Clinic, Rochester, Minnesota (Dr Montori and Ms Lane); and Centre for Evidence-Based Medicine, Department of Primary Health Care, University of Oxford, Oxford, UK (Dr Glasziou).


JAMA. 2010;303(12):1180-1187. doi:10.1001/jama.2010.310.

Context Theory and simulation suggest that randomized controlled trials (RCTs) stopped early for benefit (truncated RCTs) systematically overestimate treatment effects for the outcome that precipitated early stopping.

Objective To compare the treatment effect from truncated RCTs with that from meta-analyses of RCTs addressing the same question but not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect.

Data Sources Search of MEDLINE, EMBASE, Current Contents, and full-text journal content databases to identify truncated RCTs up to January 2007; search of MEDLINE, Cochrane Database of Systematic Reviews, and Database of Abstracts of Reviews of Effects to identify systematic reviews from which individual RCTs were extracted up to January 2008.

Study Selection Selected studies were RCTs reported as having stopped early for benefit and matching nontruncated RCTs from systematic reviews. Independent reviewers with medical content expertise, working blinded to trial results, judged the eligibility of the nontruncated RCTs based on their similarity to the truncated RCTs.

Data Extraction Reviewers with methodological expertise conducted data extraction independently.

Results The analysis included 91 truncated RCTs asking 63 different questions and 424 matching nontruncated RCTs. The pooled ratio of relative risks in truncated RCTs vs matching nontruncated RCTs was 0.71 (95% confidence interval, 0.65-0.77). This difference was independent of the presence of a statistical stopping rule and the methodological quality of the studies as assessed by allocation concealment and blinding. Large differences in treatment effect size between truncated and nontruncated RCTs (ratio of relative risks <0.75) occurred with truncated RCTs having fewer than 500 events. In 39 of the 63 questions (62%), the pooled effects of the nontruncated RCTs failed to demonstrate significant benefit.

Conclusions Truncated RCTs were associated with greater effect sizes than RCTs not stopped early. This difference was independent of the presence of statistical stopping rules and was greatest in smaller studies.


Although randomized controlled trials (RCTs) generally provide credible evidence of treatment effects, multiple problems may emerge when investigators terminate a trial earlier than planned,1 especially when the decision to terminate the trial is based on the finding of an apparently beneficial treatment effect. Bias may arise because large random fluctuations of the estimated treatment effect can occur, particularly early in the progress of a trial.2 When investigators stop a trial based on an apparently beneficial treatment effect, their results may therefore provide misleading estimates of the benefit.3,4 Statistical modeling suggests that RCTs stopped early for benefit (truncated RCTs) will systematically overestimate treatment effects,5 and empirical data demonstrate that truncated RCTs often show implausibly large treatment effects.6

Empirical evidence addressing the magnitude of bias from stopping early, and the factors that may influence that magnitude, remains limited, and the appropriate interpretation of truncated RCTs remains a matter of controversy.6-11 We therefore undertook a systematic review to determine the treatment effect from truncated RCTs compared with meta-analyses of RCTs addressing the same research question that were not stopped early (nontruncated RCTs) and to explore factors associated with overestimates of effect.

A prior report provides a detailed description of the design and methods of this study (Study of Trial Policy of Interim Truncation-2 [STOPIT-2]).12 In summary, we conducted extensive literature searches to identify truncated RCTs and systematic reviews addressing the same question. We retrieved all RCTs included in the systematic reviews, extracted data and conducted new meta-analyses of the nontruncated RCTs addressing the outcome that led to the early termination of the truncated RCTs, and compared the relative risk (RR) generated by the truncated RCTs with the RR from all matching nontruncated RCTs.

Literature Search

We updated the database from our prior study following the same search strategy.6 In January 2007 we searched MEDLINE, EMBASE, Current Contents, and full-text journal content databases from their inception for truncated RCTs. In addition, we identified truncated RCTs through hand searching, by personal contact with trial investigators, and by a citation search linked to 2 key articles.6,13 For systematic reviews, we searched the Cochrane Database of Systematic Reviews, the Database of Abstracts of Reviews of Effects, and MEDLINE from their inception to January 2008.

Eligibility Criteria for Truncated RCTs and Matching Systematic Reviews

We included RCTs of any intervention reported as having stopped earlier than initially planned owing to interim results in favor of the intervention.

We excluded matching systematic reviews that did not have a methods section and did not describe a literature search that, at minimum, included MEDLINE.12

Identification, Retrieval, and Eligibility of Nontruncated RCTs

We retrieved the full text of all RCTs included in each systematic review. If a systematic review was published prior to the matching truncated RCT and thus did not include the truncated RCT, we updated this review.12 Eligible nontruncated RCTs addressed the outcome that led to the early termination of the truncated RCT and stated clearly that allocation was randomized. We assessed the eligibility of nontruncated RCTs based on the similarity of their study question to that addressed by the matching truncated RCT (see Briel et al12 for details).

Teams of 2 reviewers with relevant clinical expertise made independent eligibility and similarity decisions and resolved disagreement by discussion and, if necessary, by consulting a third party. Reviewers who judged eligibility were blinded to the results of the trials through electronic or manual masking.12

Data Extraction and Analysis

Working in pairs, reviewers with methodological expertise conducted data extraction independently.12 From each RCT (truncated or nontruncated), we collected information about early termination, the journal of publication (we categorized Annals of Internal Medicine, BMJ, JAMA, Lancet, and New England Journal of Medicine as high-impact journals), the year of publication, methodological quality, data monitoring committees, stopping rules at the outset of the trial, interim analyses, and the measure of treatment effect for the outcome that terminated the truncated RCT. The only study characteristic tested for a statistically significant difference between truncated and nontruncated RCTs was publication in a high-impact journal.

We calculated an RR for each RCT in our study. For studies that provided results using continuous data, we estimated an approximate dichotomous equivalent.12,14 For each question, we used meta-analysis for the pooled RR and a 95% confidence interval (CI) for all nontruncated RCTs. If more than 1 truncated RCT addressed the same question, we calculated a pooled RR and CI for those truncated RCTs. Pooled estimates of RRs were calculated using an inverse-variance weighted random-effects model.
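The authors report using SAS; purely as an illustrative sketch, the inverse-variance weighted random-effects pooling described above can be expressed with the standard DerSimonian-Laird estimate of between-study variance (an assumption on my part; the article does not name the specific tau-squared estimator). All function names here are hypothetical.

```python
import math

def pool_rr_random_effects(rrs, variances):
    """Random-effects (inverse-variance) pooling of relative risks.

    rrs: per-trial relative risks.
    variances: variances of each log(RR).
    Returns (pooled RR, 95% CI lower bound, 95% CI upper bound).
    Uses the DerSimonian-Laird estimate of between-study variance (assumed).
    """
    logs = [math.log(rr) for rr in rrs]
    w = [1.0 / v for v in variances]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    # Cochran's Q heterogeneity statistic and between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))
    df = len(rrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # random-effects weights incorporate the between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))
```

Pooling is done on the log scale because log(RR) is approximately normally distributed; the result is back-transformed to an RR and 95% CI.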

We performed a z test for each meta-analysis to assess differences between the truncated and nontruncated RCTs with respect to their pooled RRs. As a summary measure we calculated a ratio of RRs, and its logarithm, for each question as follows:

Log[ratio of RRs] = log[RR of truncated RCT(s)/pooled RR of nontruncated RCTs] = log[RR of truncated RCT(s)] – log[pooled RR of nontruncated RCTs].

We estimated the pooled log[ratio of RRs] using a random-effects inverse-variance meta-analysis and then, for purposes of presentation, back transformed to the overall ratio of RRs. To explore factors associated with the magnitude of the ratio of RRs, we performed a meta-regression analysis in which the dependent variable was the log[ratio of RRs] and independent variables were whether the truncated RCTs used a formal stopping rule and the number of outcome events in the truncated RCTs. When more than 1 truncated RCT addressed the same question, the stopping rule status was assigned to “has a rule” if at least 1 truncated RCT had a rule. Similarly, when there was more than 1 truncated RCT for the same question, we used the truncated RCT with the largest number of events as the source for our analyses of the influence of the number of events.
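The per-question summary measure and z test described above can be sketched as follows. Because the truncated and nontruncated estimates are independent, their variances on the log scale add for the log[ratio of RRs]. This is an illustrative reconstruction, not the authors' code; the function name is hypothetical.

```python
import math

def ratio_of_rrs(rr_trunc, var_log_trunc, rr_nontrunc, var_log_nontrunc):
    """Ratio of RRs (truncated / nontruncated) with a two-sided z test.

    Variances are on the log(RR) scale, so they add for the log ratio.
    Returns (ratio of RRs, z statistic, two-sided P value).
    """
    log_ratio = math.log(rr_trunc) - math.log(rr_nontrunc)
    se = math.sqrt(var_log_trunc + var_log_nontrunc)
    z = log_ratio / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return (math.exp(log_ratio), z, p)
```

A ratio of RRs below 1.0 indicates a larger apparent treatment effect in the truncated RCT(s) than in the matching nontruncated RCTs.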

To allow consideration of methodological quality as a predictor and to test whether restriction to nontruncated RCTs that are more similar to the truncated RCTs would change the results, we constructed a second meta-regression described fully in a prior report.12 In brief, this meta-regression used a hierarchical model with 2 levels: individual RCT (study) level and meta-analysis (question) level. The dependent variable in this analysis was the logarithm of the RR for each study. Predictor variables considered included a combined group variable (truncated RCT with a rule, truncated RCT without a rule, nontruncated RCT), number of events, concealment of allocation, use of blinding, and the interaction between the group variable and the other variables. We performed this meta-regression on different data sets based on various thresholds for the similarity of the nontruncated RCTs in each question to the matching truncated RCTs.

To test for an order effect (the hypothesis being that studies published earlier will have more responsive populations), for each review question we established where in the sequence of published studies (by publication date) the truncated RCT stood and referred to this as the “rank” of the truncated RCT. We then calculated a “standardized rank” [100 · (rank − 1)/(total number of studies − 1)]. If there was more than 1 truncated RCT in the review question, we used the median among the truncated RCTs as the standardized rank of truncated RCTs. We then divided the review questions into 2 groups (those with the standardized rank of the truncated RCT equal to or less than 50 [n = 27] and those with the standardized rank of the truncated RCT greater than 50 [n = 37]) and repeated the meta-analysis for each group.
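The standardized-rank formula above is simple enough to state directly in code; this is a minimal sketch with an illustrative function name.

```python
def standardized_rank(rank, total_studies):
    """Standardized rank of a truncated RCT within its review question,
    per the formula in the text: 100 * (rank - 1) / (total - 1).

    rank: position of the truncated RCT in the publication-date sequence.
    total_studies: total number of studies addressing the question.
    """
    return 100.0 * (rank - 1) / (total_studies - 1)

# The first-published study scores 0, the last scores 100, and questions
# are split into "early" (<= 50) and "late" (> 50) groups.
```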

As a secondary analysis, we compared the RR of the truncated RCT(s) with the pooled estimate for all trials including the truncated RCTs.

Analyses were performed using SAS version 9.2 (SAS Institute Inc, Cary, North Carolina); tests were 2-sided, and P < .05 was used as the threshold for statistical significance.

Literature Search

A total of 195 truncated RCTs formed the basis for the search for systematic reviews; we identified matching systematic reviews for 79 questions. We extracted 2488 nontruncated RCTs from 202 matching systematic reviews (of which 32 were updated). Of these 2488 studies, 22 (0.9%) proved to be truncated RCTs, which we added to the truncated RCT database. We excluded 2012 nontruncated RCTs based on insufficient similarity to the truncated RCTs or unclear randomization and 30 because the RR could not be calculated. The remaining 424 nontruncated RCTs and 91 matching truncated RCTs addressed 63 questions (Figure 1). An eSupplement reporting the references of the included studies is available.

Figure 1. Selection Process for Study Inclusion

RCT indicates randomized controlled trial; RR, relative risk.

Study Characteristics

Table 1 describes the characteristics of the eligible studies. Compared with matching nontruncated RCTs, truncated RCTs were more likely to be published in high-impact journals (68% vs 30%, P < .001).

Table 1. Characteristics of Randomized Controlled Trials Stopped Early for Benefit and Those Not Stopped Early for Benefit Asking the Same Research Question
Quantification of Differences in Treatment Effect Size

Of 63 comparisons, the ratio of RRs was equal to or less than 1.0 in 55 (87%); the weighted average ratio of RRs was 0.71 (95% CI, 0.65-0.77; P < .001) (Figure 2). In 39 of 63 comparisons (62%), the pooled estimates for nontruncated RCTs were not statistically significant.

Figure 2. Pooled Ratio of Relative Risks (RRs) and 95% Confidence Intervals (CIs) for Truncated vs Nontruncated Randomized Controlled Trials (RCTs)

The first column indicates the number associated with the question addressed by each review that included 1 or more truncated RCTs and matching nontruncated RCTs. Results are ordered by the P values associated with the results of the nontruncated RCTs; the size of the data markers indicates the weight of the review questions in the meta-analysis.

Comparison of the truncated RCTs with all RCTs (including the truncated RCTs) demonstrated a weighted average ratio of RRs of 0.85; in 16 of 63 comparisons (25%), the pooled estimate failed to demonstrate a significant effect.

Determinants of Differences in Treatment Effect Size

Table 2 summarizes the findings from the single-level meta-regression analysis to determine predictors of differences in the treatment effect size between truncated and nontruncated RCTs. In the univariable models, both the number of events (P < .001) and the presence of a statistical stopping rule (P = .02) were significant. When we included both variables in the model, only the number of events remained significant (P < .001). The results from the multilevel meta-regression confirmed significant interactions between the combined variable (truncated vs nontruncated RCT) and the number of events (P < .001). Large differences in treatment effect size between truncated and nontruncated RCTs (ratio of RRs <0.75) occurred in truncated RCTs with fewer than 500 events (Figure 3).

Figure 3. Weighted Bubble Plot Showing the Ratio of Relative Risks (RRs) vs the Total Number of Outcome Events in Truncated Randomized Controlled Trials (RCTs)

The size of each bubble is proportional to the inverse of the variance of the ratio of RRs on the log scale. The dashed line indicates a ratio of RRs of 0.71; the dotted line, a ratio of RRs of 1.00. The shaded areas numbered 1 through 3 correspond to different degrees of overestimation of effect (ratios of RRs, 0.05-0.5; 0.5-0.75; and 0.75-1.00): in area 1, very large overestimation (ratio of RRs, 0.37; 95% confidence interval [CI], 0.31-0.44; P < .001) occurred in truncated trials with fewer than 200 events. In area 2, large overestimation (ratio of RRs, 0.65; 95% CI, 0.56-0.77; P < .001) occurred in truncated trials stopped between 200 and 500 events. In area 3, truncated trials with more than 500 events led to moderate overestimation (ratio of RRs, 0.88; 95% CI, 0.80-0.96; P = .003).

Table 2. Meta-regression Model Investigating the Predictors of the Log[Ratio of Relative Risks]

The multilevel meta-regression analysis using the entire data set demonstrated that neither concealment of allocation (P = .96) nor blinding (P = .32) was a significant predictor of the differences in treatment effect size.

Different Data Sets and Order of Publication

The findings were similar, irrespective of either the closeness of the match between nontruncated and truncated RCTs or the order of publication of the truncated RCTs relative to that of matching nontruncated RCTs. In the multilevel meta-regression analysis, adjusted ratios of RRs of truncated vs nontruncated RCTs were 0.64 when questions were very closely matched, 0.70 when they were moderately close, and 0.69 when they were least close. The ratio of RRs for the group in which the truncated RCTs were published in the earlier years (standardized rank ≤50) was 0.74 (95% CI, 0.66-0.83), and for the later years (standardized rank >50) it was 0.68 (95% CI, 0.60-0.77). The P value for the difference between the 2 estimates was .33.

Summary of Findings

In this empirical study including 91 truncated RCTs and 424 matching nontruncated RCTs addressing 63 questions, we found that truncated RCTs provide biased estimates of effects on the outcome that precipitated early stopping. On average, the ratio of RRs in the truncated RCTs and matching nontruncated RCTs was 0.71. This implies that, for instance, if the RR from the nontruncated RCTs was 0.8 (a 20% relative risk reduction), the RR from the truncated RCTs would be on average approximately 0.57 (a 43% relative risk reduction, more than double the estimate of benefit). Nontruncated RCTs with no evidence of benefit—ie, with an RR of 1.0—would on average be associated with a 29% relative risk reduction in truncated RCTs addressing the same question.
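The arithmetic behind the worked example above follows directly from multiplying the nontruncated RR by the pooled ratio of RRs (0.71); a minimal check, with an illustrative function name:

```python
RATIO_OF_RRS = 0.71  # pooled ratio of RRs (truncated / nontruncated), as reported

def implied_truncated_rr(nontruncated_rr, ratio=RATIO_OF_RRS):
    """Average RR a truncated trial would report, given the pooled ratio of RRs."""
    return nontruncated_rr * ratio

# Nontruncated RR of 0.80 (a 20% relative risk reduction):
print(round(implied_truncated_rr(0.80), 2))  # 0.57, ie, a 43% relative risk reduction
# Nontruncated RR of 1.00 (no effect):
print(round(implied_truncated_rr(1.00), 2))  # 0.71, ie, a 29% relative risk reduction
```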

In nearly two-thirds of comparisons, the pooled estimate for nontruncated RCTs failed to demonstrate a statistically significant effect. We found substantial heterogeneity in our analysis of the pooled ratio of RRs for truncated vs nontruncated RCTs, suggesting that differences between truncated and nontruncated RCT effects will differ across study questions. This heterogeneity could be partially explained by the total number of outcome events in the truncated RCTs, with larger differences between truncated and nontruncated RCTs in studies with a smaller number of events.

The methodological quality and the presence of a statistical stopping rule failed to predict the observed difference in the treatment effect.

Strengths and Limitations

We used rigorous search strategies and undertook an intensive independent evaluation of eligibility and similarity of several thousand RCTs blinded to the results. Our analysis had considerable statistical power to link the estimates of treatment effect from truncated and nontruncated RCTs addressing the same question and demonstrated consistent results across degrees of similarity of the question addressed by the truncated RCTs and the matching nontruncated RCTs.

Our literature search, while extensive, missed some truncated RCTs. Assessment of the 2488 RCTs included in the systematic reviews revealed 22 additional truncated RCTs not initially identified. Whether results would differ in other unidentified truncated RCTs remains speculative.

We relied on systematic reviews to identify nontruncated RCTs but did not assess the reviews' susceptibility to publication bias. However, we know that trials with positive findings have nearly 4 times the odds of being published compared with those with negative findings.15 To the extent that publication bias is present, inclusion of unpublished studies would lead to a diminished pooled effect from the nontruncated RCTs. This would in turn likely lead to a larger gradient of effect between truncated and nontruncated RCTs. Thus, to the extent that publication bias exists, our results probably represent a conservative estimate of the exaggeration in treatment benefit associated with stopping early.

Relation to Recent Empirical Studies, Simulation, and Commentaries

Korn and colleagues recently reviewed the results of cancer trials that stopped early and either continued with further follow-up or released results early.16 They found that substantial differences between the results at the time of early stopping and those at subsequent follow-up seldom occurred.16 Freidlin and Korn published a related simulation study that supported these findings, suggesting that if the true effect is large, stopped-early results and full follow-up results will differ little.17

These recent studies confirm that even when the true effect is large, studies stopped early still overestimate that effect. More important, the authors do not address circumstances in which the true underlying effect is small or absent. Clinicians seek the best estimate of an unknown true underlying effect with appropriate safeguards against bias. As Goodman18 points out in a commentary on the simulation by Freidlin and Korn, “since we do not know what the true effect is, we cannot know in any particular case whether the observed effect is biased or not; the fact that the trial is stopped early is not prima facie evidence that the estimate is wrong.” We support this statement; unfortunately, neither do we know that the stopped-early result is close to the truth. Our findings suggest that often it is not.

Implications

Consensus exists that rigorous data monitoring practice requires a predefined statistical stopping rule.19,20 Our findings, however, indicate that even a formal rule is insufficient to prevent bias consequent on stopping early and suggest the advisability of rules that require a large number of outcome events before early stopping is contemplated.

In this review we have focused only on RCTs stopped early for benefit. Although ethical concerns make decisions regarding stopping trials early for safety more complex than those regarding stopping trials early for benefit, inferences regarding harm and those regarding benefit are equally susceptible to the bias associated with stopping early.

Our results have important implications for systematic reviews and ethics.21,22 If reviewers do not note truncation and do not consider early stopping for benefit, meta-analyses will report overestimates of effects.21 Investigators and funding bodies—in particular, drug and device manufacturers—have different but convergent interests to stop a study as soon as an important difference between experimental and control groups emerges, and journals have an interest in publishing the apparently exciting findings. Furthermore, data monitoring committees are well aware of their ethical obligation to ensure that patients are offered effective treatment as soon as it is clear that effective treatment is indeed available, providing a justification for stopping early.

However, data monitoring committees also have an ethical obligation to future patients who need to know more than whether data crossed a significance threshold; these patients need precise and accurate data on patient-important outcomes, of both risk and benefits, to make treatment choices.22 Such patients will often number in the tens or hundreds of thousands and sometimes in the millions. To the extent that substantial overestimates of treatment effect are widely disseminated, patients and clinicians will be misled when trying to balance benefits, harms, inconvenience, and cost of a possible health care intervention. If the true treatment effect is negligible or absent—as our results suggest it sometimes might be—acting on the results of a trial stopped early will be even more problematic. Thus, for trial investigators, our results suggest the desirability of stopping rules demanding large numbers of events. For clinicians, they suggest the necessity of assuming the likelihood of appreciable overestimates of effect in trials stopped early.

Corresponding Author: Victor Montori, MD, MSc, Knowledge and Encounter Research Unit, Mayo Clinic, Plummer 3-35, 200 First St SW, Rochester, MN 55905 (montori.victor@mayo.edu).

Author Contributions: Drs Montori and Guyatt had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Bassler, Briel, Montori, Glasziou, Zhou, Heels-Ansdell, Walter, Guyatt.

Acquisition of data: Bassler, Briel, Montori, Lane, Malaga, Akl, Ferreira-Gonzalez, Alonso-Coello, Urrutia, Kunz, Ruiz Culebro, Alves da Silva, Flynn, Elamin, Strahm, Murad, Djulbegovic, Adhikari, Mills, Gwadry-Sridhar, Kirpalani, Soares, Abu Elnour, You, Karanicolas, Bucher, Lampropulos, Nordmann, Burns, Mulla, Raatz, Sood, Kaur, Bankhead, Mullan, Nerenberg, Vandvik, Coto-Yglesias, Schünemann, Tuche, Chrispim, Cook, Lutz, Ribic, Vale, Erwin, Nerenberg, Montori.

Analysis and interpretation of data: Bassler, Briel, Montori, Glasziou, Zhou, Heels-Ansdell, Ramsay, Walter, Guyatt.

Drafting of the manuscript: Bassler, Briel, Montori, Lane, Glasziou, Zhou, Heels-Ansdell, Walter, Guyatt.

Critical revision of the manuscript for important intellectual content: Bassler, Briel, Montori, Lane, Glasziou, Zhou, Heels-Ansdell, Malaga, Akl, Ferreira-Gonzalez, Alonso-Coello, Urrutia, Kunz, Ruiz Culebro, Alves da Silva, Flynn, Elamin, Strahm, Murad, Djulbegovic, Adhikari, Mills, Gwadry-Sridhar, Kirpalani, Soares, Abu Elnour, You, Karanicolas, Bucher, Lampropulos, Nordmann, Burns, Mulla, Raatz, Sood, Kaur, Bankhead, Mullan, Nerenberg, Vandvik, Coto-Yglesias, Schünemann, Tuche, Chrispim, Cook, Lutz, Ribic, Vale, Erwin, Perera, Ramsay, Walter, Guyatt.

Statistical analysis: Zhou, Heels-Ansdell, Walter.

Obtained funding: Glasziou, Bassler, Perera, Walter, Guyatt.

Administrative, technical, or material support: Glasziou, Montori, Lane, Flynn, Elamin, Bassler, Guyatt.

Study supervision: Montori, Glasziou, Walter, Bassler, Briel, Guyatt.

Financial Disclosures: None reported.

Funding/Support: This study was funded by the UK Medical Research Council (reference G0600561). Dr Briel is supported by a scholarship for advanced researchers from the Swiss National Science Foundation (PASMA-112951/1) and the Roche Research Foundation. Dr Kunz, Dr Bucher, Dr Nordmann, and Dr Raatz are supported by grants from Santésuisse and the Gottfried and Julia Bangerter-Rhyner-Foundation. Dr Cook holds a Research Chair from the Canadian Institutes of Health Research.

Role of the Sponsor: The UK Medical Research Council had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of data; or the preparation, review, or approval of the manuscript.

Additional Contributions: We thank Monica Owen, Michelle Vanderby, Shelley Anderson, BA, and Deborah Maddock, all from the Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, and Amanda Bedard, Knowledge and Encounter Research Unit, Mayo Clinic, Rochester, Minnesota, for their administrative assistance. We are grateful to Ratchada Kitsommart, MD, Department of Pediatrics, Siriraj Hospital Mahidol University, Bangkok, Thailand, Chusak Okascharoen, MD, PhD, Department of Pediatrics, Ramathibodi Faculty of Medicine, Mahidol University, Bangkok, for their help with blinding of articles, and Luma Muhtadie, BSc, University of California, Berkeley, and Kayi Li, BHSc, University of Toronto, Toronto, Ontario, Canada, for their help with the literature search. None of these individuals received any extra compensation for their contributions.

Authors/Members of the Study of Trial Policy of Interim Truncation-2 (STOPIT-2) Group: Dirk Bassler, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada, and Department of Neonatology, University Children's Hospital Tuebingen, Tuebingen, Germany); Matthias Briel, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland); Victor M. Montori, MD, MSc, Melanie Lane, BA, David N. Flynn, BS, Mohamed B. Elamin, MBBS, Mohammad Hassan Murad, MD, MPH, Nisrin O. Abu Elnour, MBBS, Julianna F. Lampropulos, MD, Amit Sood, MD, MSc, Rebecca J. Mullan, MSc, and Patricia J. Erwin, MLS (Knowledge and Encounter Research Unit, Mayo Clinic, Rochester, Minnesota); Paul Glasziou, MBBS, PhD, Clare R. Bankhead, DPhil, and Rafael Perera, DPhil, MSc (Centre for Evidence-Based Medicine, Department of Primary Health Care, University of Oxford, Oxford, United Kingdom); Qi Zhou, PhD, Diane Heels-Ansdell, MSc, Carolina Ruiz Culebro, MD, John J. You, MD, MSc, Sohail M. Mulla, Jagdeep Kaur, PhD, CRA, Kara A. Nerenberg, MD, MSc, Holger Schünemann, MD, PhD, Deborah J. Cook, MD, MSc, Kristina Lutz, Christine M. Ribic, MD, MSc, Noah Vale, MD, Stephen D. Walter, PhD, and Gordon H. Guyatt, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University); German Malaga, MD, MSc (Universidad Peruana Cayetano Heredia, Lima, Peru); Elie A. Akl, MD, PhD (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Departments of Medicine and Family Medicine, State University of New York at Buffalo); Ignacio Ferreira-Gonzalez, PhD (Cardiology Department, Vall d’Hebron Hospital, CIBER de Epidemiología y Salud Pública, Barcelona, Spain); Pablo Alonso-Coello, MD, PhD, and Gerard Urrutia, MD (Centro Cochrane Iberoamericano, Hospital Sant Pau, Barcelona, and CIBER de Epidemiologia y Salud Publica, Barcelona); Regina Kunz, MD, MSc, Heiner C. Bucher, MD, MPH, Alain J. Nordmann, MD, MSc, and Heike Raatz, MD, MSc (Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel); Suzana Alves da Silva, MD, MSc, and Fabio Tuche, MD (Teaching and Research Center of Pro-Cardiaco, Rio de Janeiro, Brazil); Brigitte Strahm, MD (Pediatric Hematology and Oncology Centre for Pediatrics and Adolescent Medicine, University Hospital Freiburg, Freiburg, Germany); Benjamin Djulbegovic, MD, PhD (Center for Evidence-based Medicine, USF Health Clinical Research, Tampa, Florida); Neill K. J. Adhikari, MD, MSc (Sunnybrook Health Sciences Centre and University of Toronto, Toronto, Ontario, Canada); Edward J. Mills, PhD (British Columbia Centre for Excellence in HIV/AIDS, University of British Columbia, Vancouver, Canada); Femida Gwadry-Sridhar, PhD (University of Western Ontario, London, Canada); Haresh Kirpalani, MD, MSc (Department of Clinical Epidemiology and Biostatistics, McMaster University, and Children's Hospital Philadelphia, Philadelphia, Pennsylvania); Heloisa P. Soares, MD (Mount Sinai Medical Center, Miami Beach, Florida); Paul J. Karanicolas, MD, PhD (Memorial Sloan-Kettering Cancer Center, New York, New York); Karen E. A. Burns, MD, MSc (St. Michael's Hospital, Keenan Research Centre and Li Ka Shing Knowledge Institute, University of Toronto); Per Olav Vandvik, MD, PhD (Department of Medicine, Gjøvik, Innlandet Hospital Trust, Norway); Fernando Coto-Yglesias, MD (Hospital Nacional de Geriatría y Gerontología San José, Costa Rica); Pedro Paulo M. Chrispim, MSc (National School of Public Health, Rio de Janeiro); and Tim Ramsay, PhD (Ottawa Hospital Research Institute, University of Ottawa, Ottawa, Ontario, Canada).


Figures

Figure 1. Selection Process for Study Inclusion

RCT indicates randomized controlled trial; RR, relative risk.

Figure 2. Pooled Ratio of Relative Risks (RRs) and 95% Confidence Intervals (CIs) for Truncated vs Nontruncated Randomized Controlled Trials (RCTs)

The first column indicates the identification number of the question addressed by each review that included 1 or more truncated and matching nontruncated RCTs. Results are ordered by the P values associated with the results of the nontruncated RCTs; the size of the data markers indicates the weight of each review question in the meta-analysis.
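The pooling shown in this figure is done on the log scale with inverse-variance weights. The sketch below is a minimal fixed-effect illustration of that arithmetic only; the STOPIT-2 analysis itself used meta-regression with hierarchical weighting, and all input numbers here are hypothetical.

```python
import math

def pool_ratio_of_rrs(log_ratios, variances):
    """Fixed-effect inverse-variance pooling of log ratios of RRs.

    Each entry is one review question's log(RR_truncated / RR_nontruncated)
    and the variance of that log ratio. Returns the pooled ratio of RRs
    with a 95% CI, exponentiated back from the log scale.
    """
    weights = [1.0 / v for v in variances]
    pooled_log = sum(w * lr for w, lr in zip(weights, log_ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * se),
            math.exp(pooled_log + 1.96 * se))

# Hypothetical log ratios and variances for three review questions
ratio, lo, hi = pool_ratio_of_rrs([-0.40, -0.25, -0.10], [0.04, 0.02, 0.01])
```

A pooled ratio below 1.00 indicates that, on average, the truncated trials showed larger apparent treatment effects than the matching nontruncated trials.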

Figure 3. Weighted Bubble Plot Showing the Ratio of Relative Risks (RRs) vs the Total Number of Outcome Events in Truncated Randomized Controlled Trials (RCTs)

The size of each bubble is proportional to the inverse of the variance of the ratio of RRs on the log scale. The dashed line indicates a ratio of RRs of 0.71; the dotted line, a ratio of RRs of 1.00. The shaded areas numbered 1 through 3 correspond to different degrees of overestimation of effect (ratios of RRs of 0.05-0.5, 0.5-0.75, and 0.75-1.00, respectively). In area 1, very large overestimation (ratio of RRs, 0.37; 95% confidence interval [CI], 0.31-0.44; P < .001) occurred in truncated trials with fewer than 200 events. In area 2, large overestimation (ratio of RRs, 0.65; 95% CI, 0.56-0.77; P < .001) occurred in truncated trials stopped between 200 and 500 events. In area 3, truncated trials with more than 500 events led to moderate overestimation (ratio of RRs, 0.88; 95% CI, 0.80-0.96; P = .003).
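As a worked illustration of the quantity plotted in this figure, the ratio of RRs for a single truncated/nontruncated comparison and its 95% CI can be computed on the log scale. This is a sketch under stated assumptions, with hypothetical trial numbers not taken from the figure.

```python
import math

def ratio_of_rrs(rr_truncated, se_log_truncated, rr_nontruncated, se_log_nontruncated):
    """Ratio of RRs (truncated / nontruncated) with a 95% CI.

    Works on the log scale, where the variance of the difference of two
    independent log RRs is the sum of their variances. A ratio below 1
    means the truncated trial showed the larger apparent benefit.
    """
    log_ratio = math.log(rr_truncated) - math.log(rr_nontruncated)
    se = math.sqrt(se_log_truncated**2 + se_log_nontruncated**2)
    return (math.exp(log_ratio),
            math.exp(log_ratio - 1.96 * se),
            math.exp(log_ratio + 1.96 * se))

# Hypothetical example: truncated RR 0.50 (SE of log RR 0.20) vs
# nontruncated pooled RR 0.80 (SE of log RR 0.10)
ratio, lo, hi = ratio_of_rrs(0.50, 0.20, 0.80, 0.10)  # ratio = 0.625
```

Note that the point estimate is simply 0.50/0.80 = 0.625; the log-scale machinery matters only for the confidence limits.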

Tables

Table 1. Characteristics of Randomized Controlled Trials Stopped Early for Benefit and Those Not Stopped Early for Benefit Asking the Same Research Question
Table 2. Meta-regression Model Investigating the Predictors of the Log[Ratio of Relative Risks]

References

1. Psaty BM, Rennie D. Stopping medical research to save money. JAMA. 2003;289(16):2128-2131.
2. Wheatley K, Clayton D. Be skeptical about unexpected large apparent treatment effects. Control Clin Trials. 2003;24(1):66-70.
3. Pocock S, White I. Trials stopped early: too good to be true? Lancet. 1999;353(9157):943-944.
4. Schulz KF, Grimes DA. Multiplicity in randomised trials: subgroup and interim analyses. Lancet. 2005;365(9471):1657-1661.
5. Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials. 1989;10(4)(suppl):209S-221S.
6. Montori VM, Devereaux PJ, Adhikari NK, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294(17):2203-2209.
7. Bassler D, Montori VM, Briel M, et al. Early stopping of randomized clinical trials for overt efficacy is problematic. J Clin Epidemiol. 2008;61(3):241-246.
8. Sydes MR, Parmar MK. Interim monitoring of efficacy data is important and appropriate. J Clin Epidemiol. 2008;61(3):203-204.
9. Trotta F, Apolone G, Garattini S, Tafuri G. Stopping a trial early in oncology: for patients or for industry? Ann Oncol. 2008;19(7):1347-1353.
10. Sargent D. Early stopping for benefit in National Cancer Institute–sponsored randomized phase III trials. J Clin Oncol. 2009;27(10):1543-1544.
11. Goodman SN. Stopping at nothing? Some dilemmas of data monitoring in clinical trials. Ann Intern Med. 2007;146(12):882-887.
12. Briel M, Lane M, Montori VM, et al. Stopping randomized trials early for benefit: a protocol of the Study Of Trial Policy Of Interim Truncation-2 (STOPIT-2). Trials. 2009;10:49.
13. Pocock SJ. When (not) to stop a clinical trial for benefit. JAMA. 2005;294(17):2228-2230.
14. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life. Med Care. 2003;41(5):582-592.
15. Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev. 2009;(1):MR000006.
16. Korn EL, Freidlin B, Mooney M. Stopping or reporting early for positive results in randomized clinical trials. J Clin Oncol. 2009;27(10):1712-1721.
17. Freidlin B, Korn EL. Stopping clinical trials early for benefit: impact on estimation. Clin Trials. 2009;6(2):119-125.
18. Goodman SN. Stopping trials for efficacy. Clin Trials. 2009;6(2):133-135.
19. DAMOCLES Study Group; NHS Health Technology Assessment Programme. A proposed charter for clinical trial data monitoring committees. Lancet. 2005;365(9460):711-722.
20. Pocock SJ. Current controversies in data monitoring for clinical trials. Clin Trials. 2006;3(6):513-521.
21. Bassler D, Ferreira-Gonzalez I, Briel M, et al. Systematic reviewers neglect bias that results from trials stopped early for benefit. J Clin Epidemiol. 2007;60(9):869-873.
22. Mueller PS, Montori VM, Bassler D, Koenig BA, Guyatt GH. Ethical issues in stopping randomized trials early because of apparent benefit. Ann Intern Med. 2007;146(12):878-881.
