Editorial

When (Not) to Stop a Clinical Trial for Benefit

Stuart J. Pocock, PhD
Author Affiliation: Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, England.

JAMA. 2005;294(17):2228-2230. doi:10.1001/jama.294.17.2228

In this issue of JAMA, Montori and colleagues1 provide a valuable, extensive, and critical systematic review of clinical trials that were stopped early for benefit. Readers of the reports of such trials often feel a sense of excitement, especially when phrases such as “a major treatment advance,” “ethical need to stop the inferior treatment,” and “vital to tell the world immediately” are used. However, experience suggests that early results and enthusiasm, especially for modestly sized trials terminated early for apparent major benefit, are often moderated as subsequent reports arise.2

The skeptic should ask first whether correct and appropriate structures were in place for analyzing, reviewing, and making decisions based on the trial’s accumulating interim data. Having the members of an effective independent data monitoring committee (DMC) or data and safety monitoring board as the only individuals accessing and interpreting interim data split by treatment group is now considered an essential part of good practice for major randomized trials.3-5 Still, a substantial minority of reported major trials appear not to have had a DMC in place.6

Second, with or without a formal DMC recommendation, another question is whether the decision to stop a trial early and report the results was an appropriate judgment. This decision should be aided by a predefined statistical stopping boundary for a primary outcome,7-9 but some trials have no such guideline. It is important that such a boundary is sufficiently stringent (eg, very strong evidence of a treatment difference with a very small P value) to match the ethical and public health implications of a decision to stop the trial. In a spirit of requiring proof beyond reasonable doubt that a treatment difference is sufficient to affect future clinical practice, some lenient statistical boundaries are not a sensible choice in the direction of benefit. For instance, the so-called Pocock boundary9 and the O’Brien-Fleming boundary’s last interim look9 both typically require values around P = .02 for stopping, which is usually insufficient strength of evidence to stop a trial for benefit. Both boundaries can be made more appropriate if the overall type I error is set at 1% rather than the conventional 5%.

Many complex methods exist for statistical stopping boundaries, whereas in practice there is considerable merit in the simple Haybittle-Peto boundary,9 which requires P<.001 before stopping a trial early for benefit is considered. Even so, such a boundary should not be applied too soon, when few outcome events have been observed.

Decisions on early stopping (or not) need to be based on wise judgments interpreting the totality of available evidence, both in the current trial (considering primary and other efficacy outcomes and safety issues) and in other external evidence (especially from related trials).10 Accordingly, a statistical stopping boundary is only one useful objective component in an inevitably more challenging decision-making process. The ethical dilemma is to safeguard the interests of patients randomized in the current trial while also protecting society from overzealous premature claims of treatment benefit.11 For instance, if a trial is evaluating a treatment meant to be given long-term for conditions such as hypertension or chronic arthritis, short-term benefits, no matter how statistically significant, may not merit early stopping. If a trial is for regulatory approval, the sponsor and trialists should be encouraged not to stop early unless there is overwhelming evidence of treatment superiority, since the regulators require substantial evidence of both efficacy and safety, often in at least 2 trials reaching their intended full size and patient follow-up.

Montori et al1 rightly draw attention to some reports of trials that were stopped early but that did not document the planned size and circumstances of the relevant interim analysis and stopping boundary. Such deficiencies need correcting by authors, peer reviewers, and editors in line with CONSORT recommendations.12 Indeed, journals should consider rejecting the report of any trial potentially stopped prematurely and lacking adequate documentation, and access to trial protocols by journals would help in making this decision. There is probably less need to present adjusted analyses that attempt to correct for the interim monitoring and early stopping, since stopping depends on more than a statistical boundary, and complexities of adjustment can clutter the presentation of results and make interpretation of the findings more difficult. Real insight rests more on a full understanding of the circumstances at the time of stopping. Also, between the moment of making the decision to stop and locking the final database used for analysis and publication, substantial additional and corrected data may become available for analysis. Indeed, such data cleaning may justify a pause before any definite decision to stop the trial.

From a reader’s perspective, the key problem is whether to believe the treatment benefit is truly as great as the data imply. Montori et al1 appropriately emphasize that trials stopping early will tend to be on a “random high” of observed benefit, and if further data had been collected in either this or another trial, some “regression to the truth” to a more modest effect estimate would occur.2,13 These issues are more pronounced in smaller trials.

Montori et al reported a median of 66 events observed at the time trials were stopped. To achieve a difference between treatments that is significant at P<.001 requires a split by treatment group of at least 46 vs 20 events, which means that risk happens to be reduced by 57% or more. In most therapeutic areas, an effect of this size is highly implausible, and it is often associated with relatively short patient follow-up time. Thus in many settings, trials should not stop so soon, because it is highly likely that the therapeutic claim is exaggerated.
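The arithmetic behind such an event split can be checked roughly. The sketch below is an illustration only, not the calculation used in this editorial: it assumes 1:1 randomization and a rare outcome, so that under no treatment effect each event is equally likely to fall in either group, and it uses a normal approximation to the resulting binomial distribution. The function name `events_only_check` is hypothetical. Under these assumptions a 46 vs 20 split corresponds to a risk reduction of about 57% and a P value in the neighborhood of the .001 boundary; the exact P value depends on the test chosen and on the group sizes, which are not given here.

```python
import math


def events_only_check(events_control: int, events_treated: int) -> dict:
    """Rough significance check based on the event split alone.

    Assumption (not from the editorial): 1:1 randomization and a rare
    outcome, so each event is equally likely to occur in either group
    when there is no treatment effect.
    """
    total = events_control + events_treated
    # Normal approximation to Binomial(total, 0.5): z = (a - b) / sqrt(a + b)
    z = (events_control - events_treated) / math.sqrt(total)
    # Two-sided tail area of the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Crude relative risk reduction; meaningful only for equal-sized groups
    rrr = 1 - events_treated / events_control
    return {"z": z, "p_two_sided": p_two_sided, "rrr": rrr}


result = events_only_check(events_control=46, events_treated=20)
print(
    f"z = {result['z']:.2f}, "
    f"two-sided P = {result['p_two_sided']:.4f}, "
    f"risk reduction = {result['rrr']:.0%}"
)
```

Because this approximation ignores the number of patients in each group, it is somewhat conservative: for equal-sized groups, a pooled comparison of proportions gives a test statistic at least as large as the events-only one.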

The data monitoring experience in the CHARM program in 7599 patients with heart failure provides a thought-provoking example.14 At the fourth interim analysis with a median 1-year follow-up, there were 260 vs 339 deaths in the candesartan and placebo groups, respectively, a 24% risk reduction that crossed the P<.001 stopping boundary. For several documented reasons,14 the DMC voted to continue until the next interim analysis. The treatment mortality difference was then attenuated in subsequent interim analyses so that at the trial’s intended completion with a median of 3.1 years of follow-up, there were 886 deaths in the candesartan group vs 945 deaths in the placebo group, a 9% risk reduction (P = .055). Early stopping was resisted, and hence an exaggerated claim of survival benefit was avoided and important long-term benefits in other outcomes, such as cardiovascular death and heart failure hospitalization, were realized in each of the 3 component trials of the CHARM program.

So when is it appropriate to stop a trial early? The ASCOT factorial trial’s data monitoring experience provides useful insights.15,16 First, in 10305 patients with hypertension, the comparison of atorvastatin with placebo was halted when the difference in the primary end point, major coronary events, at interim analysis reached P<.001, the stopping boundary. With 100 vs 154 primary events in the atorvastatin and placebo groups, respectively, and a risk ratio of 0.64 (P = .0005), the published result was clear-cut.15 The appropriateness of stopping early was supported by other trials of statins in other populations and by important benefits in other outcomes, such as stroke.

A more difficult stopping decision arose in the ASCOT trial for the 19342 patients randomized to receive amlodipine-based and atenolol-based regimens. The predefined primary end point was major coronary events, whereas it is well known that the key effect of antihypertensive treatment is in reducing risk of stroke. Thus, when there emerged a highly significant reduction in stroke for amlodipine-based compared with atenolol-based treatment (P<.001), much debate ensued on whether to stop the trial, resulting in a decision to continue to the next interim analysis. Some months later, the trial was stopped early when there was also a significantly higher rate of mortality in the atenolol-based group, although still no significant difference existed for the primary end point. This example illustrates the complexities and tough decisions that can arise in data monitoring.17

Can a trial be stopped on the basis of secondary end points? Perhaps not, but on occasion, such as with the ASCOT-BPLA study, results of secondary end points (327 strokes with amlodipine vs 422 with atenolol, a 23% risk reduction [P = .0003]) provide convincing evidence of great public health importance.16 In lay terms, “when early results proved so promising it was no longer fair to keep patients on the older drugs for comparison, without giving them the opportunity to change.”18 However, the data in these 2 examples are more substantial than those in the majority of trials reviewed by Montori et al. The message is clear: most trials stopped early for benefit should not have been stopped at that point. Stopping for harm or futility is another matter19 that is equally important and requires future systematic review and comment. Inappropriate stopping of trials for commercial reasons raises additional serious concerns.20

In summary, all major randomized trials should have an independent DMC that functions effectively and makes wise judgments aided by stringent statistical stopping boundaries for benefit. It is critical that the DMC, principal investigators, executive committees, and sponsors all recognize the full public health implications of their recommendations and decisions.

AUTHOR INFORMATION

Corresponding Author: Stuart J. Pocock, PhD, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, England (stuart.pocock@lshtm.ac.uk).

Financial Disclosures: None reported.

Editorials represent the opinions of the authors and JAMA and not those of the American Medical Association.

REFERENCES

1. Montori VM, Devereaux PJ, Adhikari NKJ, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005;294:2203-2209.
2. Ioannidis JP. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294:218-228.
3. Ellenberg S, Fleming T, DeMets D. Data Monitoring Committees in Clinical Trials: A Practical Perspective. Chichester, England: John Wiley & Sons; 2002.
4. Draft guidance for clinical trial sponsors on the establishment and operation of clinical trial data monitoring committees, 66 Federal Register 58151-58153 (2001).
5. DAMOCLES Study Group. A proposed charter for clinical trial data monitoring committees: helping them to do their job well. Lancet. 2005;365:711-722.
6. Sydes M, Altman DG, Babiker AB, Parmar M, Spiegelhalter DJ; DAMOCLES Study Group. Reported use of data monitoring committees in the main published reports of randomised controlled trials: a cross-sectional study. Clin Trials J. 2004;1:48-59.
7. O’Brien P. Data and safety monitoring. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics. Chichester, England: John Wiley & Sons; 1998:1058-1066.
8. Fleming TR, Harrington DP, O’Brien PC. Designs for group sequential tests. Control Clin Trials. 1984;5:348-361.
9. Schulz KF, Grimes DA. Multiplicity in randomised trials, II: subgroup and interim analyses. Lancet. 2005;365:1657-1661.
10. Brocklehurst P, Elbourne D, Alfirevic A. The role of external evidence in monitoring clinical trials: reflections from a perinatal trial. BMJ. 2000;320:995-998.
11. Pocock SJ. When to stop a clinical trial. BMJ. 1992;305:235-240.
12. Moher D, Schulz KF, Altman DG; CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285:1987-1991.
13. Pocock S, White I. Trials stopped early: too good to be true? Lancet. 1999;353:943-944.
14. Pocock S, Wang D, Wilhelmsen L, Hennekens CH. The data monitoring experience in the Candesartan in Heart failure Assessment of Reduction in Mortality and morbidity (CHARM) program. Am Heart J. 2005;149:939-943.
15. Sever P, Dahlof B, Poulter NR, et al; ASCOT Investigators. Prevention of coronary and stroke events with atorvastatin in hypertensive patients who have average or lower-than-average cholesterol concentrations, in the Anglo-Scandinavian Cardiac Outcomes Trial—Lipid Lowering Arm (ASCOT-LLA): a multicentre randomised controlled trial. Lancet. 2003;361:1149-1158.
16. Dahlöf B, Sever PS, Poulter NR, et al; ASCOT Investigators. Prevention of cardiovascular events with an antihypertensive regimen of amlodipine adding perindopril as required versus atenolol adding bendroflumethiazide as required, in the Anglo-Scandinavian Cardiac Outcomes Trial—Blood Pressure Lowering Arm (ASCOT-BPLA): a multicentre randomised controlled trial. Lancet. 2005;366:895-906.
17. DeMets DL, Furberg CD, Friedman L. Data Monitoring in Clinical Trials: A Case Studies Approach. Heidelberg, Germany: Springer; 2005.
18. Hall C. Heart attacks may be cut by half. Daily Telegraph. September 5, 2005:1.
19. DeMets DL, Pocock SJ, Julian DG. The agonising negative trend in monitoring of clinical trials. Lancet. 1999;354:1983-1988.
20. Psaty BM, Rennie D. Stopping medical research to save money: a broken pact with researchers and patients. JAMA. 2003;289:2128-2130.
