Commentary

Implications of the Principle of Question Propagation for Comparative-Effectiveness and “Data Mining” Research

Mia Djulbegovic; Benjamin Djulbegovic, MD, PhD

Author Affiliations: University of Florida, Gainesville (Ms Djulbegovic); and Center for Evidence-based Medicine and Health Outcomes Research; Department of Medicine, University of South Florida; and Departments of Hematology and Health Outcomes and Behavior, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida (Dr Djulbegovic).


JAMA. 2011;305(3):298-299. doi:10.1001/jama.2010.2013

Recent legislation incorporated comparative-effectiveness research (CER) as a scientific mechanism to help improve health care.1 The law expresses particular interest in discovering which treatments work “in a real world setting” and encourages the conduct of observational studies that apply data mining techniques to standardized electronic records.1,2 Ideally, CER will identify effective interventions in subgroups of patients, since traditional randomized trials typically provide efficacy data for an “average” patient only.1,2 The amount of observational research is likely to increase significantly, especially studies involving data mining of large administrative databases and electronic medical records. However, epistemological arguments suggest that data mining efforts cannot provide definitive answers to the questions asked by the CER program. Rather, CER based on data mining should be considered hypothesis-generating research that aims to inform future prospective studies, which will invariably require new (and better) data collection.

Science is an open-ended system. For every explanation science generates, a new ensuing explanation is required, because “there can be no explanation which is not in need of a further explanation.”3 Rescher3 described this open-ended, rather than finite and closed, character of scientific activity as a consequence of Kant's Principle of Question Propagation: “answering our factual (scientific) questions paves the way to further yet unanswered questions.” The essence of scientific inquiry is this cycle of answers generating new questions. The driving force behind the cycle of questions-answers-questions (Q-A-Q) is the attempt to improve scientific explanatory power, descriptive precision, and predictive accuracy.3 Consequently, science is characterized by escalating theoretical complexity accompanied by increasingly sophisticated scientific explanations. Scientific knowledge is purported knowledge: what counts as scientific knowledge is tentative and remains revisable as science progresses.3,4

Ever-Increasing Use of Technology to Arrive at Definitive Answers in Clinical Medicine

The capabilities of physicians to understand ailments, diagnose diseases, improve prognosis, and treat patients have dramatically improved during the past century, and especially during the past 25 years. These improvements have been achieved via systematic scientific activity, which has provided new answers to old questions, asked new questions, and replaced old knowledge with new.3 It is estimated that the amount of medical information doubles every decade and that only approximately 22% to 50% of clinical research conclusions remain recognizably correct after 50 years.5

Despite improved understanding of the mechanisms of diseases, medical knowledge will never be complete. Although new diagnostic tests, prognostic tools, and treatments continue to be developed, and new clinical and research programs continue to discover new diseases (eg, National Institutes of Health Program for Undiagnosed Diseases), biomedical scientific efforts will remain “inherently incompletable, with the ever-receding horizon, separating where we are from where we would ideally like to be.”3 In the ever-increasing desire to practice personalized medicine and to tailor the use of treatments or diagnostic devices to individual patients, clinicians and scientists seem to have no limit on the number of questions they can ask. Therefore, the number of possible Q-A-Q cycles is so large that the cycles should be seen as a permanent feature of the research and practice of medicine. For example, it is estimated that in general practice alone, there are 48 000 key clinical questions that ought to be answered6; this is likely an underestimate.

Epistemological Feasibility of Conducting Health Research Using a “Data Mining” Approach

The epistemological view that medicine is an open, incomplete system means that theory-driven, hypothesis-testing research should continue to dominate scientific practice in medicine.7 Testing one or a few key hypotheses at a time also represents the best way to make discoveries rapidly.8

To increase the rate of scientific discoveries, however, CER is shifting from the hypothesis-testing approach to the “data mining” scientific approach. In data mining, existing data are analyzed from different perspectives that may reveal new patterns or correlations in relational databases. This approach, which requires access to large data sets and massive investment in information technology (IT), has often been justified by the potential for new discoveries. Data mining has been strongly advocated for health services research.

The application of IT and analysis of the existing data sets have been promoted as one of the important aspects of CER (ie, the program that represents a scientific foundation of the ongoing health care reform in the United States).2 The premise is that by connecting different data sets (eg, patient outcome data, electronic medical records, administrative databases, genomic or proteomic data sets) using various data mining tools, new discoveries will emerge that had not been anticipated when databases were originally designed. An implied assumption is that the answer provided from such analyses will require no new data collection.

By definition, data mining presupposes the use of data that were originally designed and collected for different purposes. This is akin to an epistemological stance that describes clinical medicine as a closed system within which data that have already been collected contain the desired answer. Given that medicine is an incomplete, open-ended system, the authors of the original database used for retrospective data mining could not possibly have anticipated which new advances in medical science would be introduced and what kind of new discoveries would be made.3 This creates a paradox, which is particularly evident when searching for treatment effects in subgroups, one of the purported goals of the IT CER initiative. As new research generates new evidence of the importance of tailoring treatments to a given subpopulation of patients, the existing databases will need to be updated, in turn undermining the original purpose of discovering new relationships via existing records. An example of the moving target nature of the open-endedness of scientific inquiry is shown in the Box. Consequently, the data mining approach can never result in credible discoveries that will obviate the need for new data collection. The best that data mining research can hope to accomplish is to provide hypothesis-generating results, which will then need to be subjected to further scrutiny using the hypothesis-testing paradigm.

BOX. OPEN-ENDEDNESS OF SCIENTIFIC INQUIRY AND THE MOVING TARGET PROBLEM: AN EXAMPLE FROM THE CANCER REGISTRY

  • In 1956, the American College of Surgeons established the cancer registry as a component of an approved cancer program with the main goal of better understanding health outcomes in cancer (eg, survival).

  • Because a number of factors affect survival, cancer registry software continues to be modified to capture new data items that reflect scientific progress and understanding of cancer.

  • For example, cancer registries started to collect data on cancer staging in the mid-1970s as the clinical stage came to be understood as one of the key factors affecting survival.

  • Accumulating evidence of the prognostic and predictive importance of estrogen/progesterone receptors in the course of breast cancer led to collection of these data starting in the late 1980s.

  • During the last decade, management of breast cancer has also included obtaining information on HER2/neu receptors. However, the collection of these data was not mandated until 2010.

  • In 2007, a 21-gene signature was approved by the US Food and Drug Administration for management of early breast cancer. No plans to include these data in the cancer registry exist at this time, despite their potential to deliver personalized health care to patients with breast cancer.

  • In the meantime, scientists continue to discover new gene expression profiles that may further improve management of breast and other cancers, and clinicians and scientists continue to make new discoveries that are constantly translated into new health technologies. None of these genomic data are currently being collected in cancer registry databases.
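The moving target problem described in the Box can be illustrated with a minimal sketch. It is not part of the original commentary; the field names and the helper functions `answerable` and `usable_fields` are hypothetical, and the years 1975 and 1988 are assumed stand-ins for the "mid-1970s" and "late 1980s" given in the Box. The sketch only shows that a retrospective question about a variable can be asked of a registry solely for patients diagnosed after that variable began to be collected.

```python
from typing import Optional

# Hypothetical map from registry field to the first year it was collected.
# The Box gives only approximate dates; 1975 and 1988 are assumptions.
FIRST_COLLECTED: dict[str, Optional[int]] = {
    "survival": 1956,           # registry established in 1956
    "stage": 1975,              # assumed year for "mid-1970s"
    "er_pr_status": 1988,       # assumed year for "late 1980s"
    "her2_status": 2010,        # collection mandated in 2010
    "gene_signature_21": None,  # not collected in the registry
}


def answerable(field: str, diagnosis_year: int) -> bool:
    """Return True if the registry could contain `field` for this diagnosis year."""
    start = FIRST_COLLECTED.get(field)
    return start is not None and diagnosis_year >= start


def usable_fields(diagnosis_year: int) -> list[str]:
    """List the fields available for a patient diagnosed in `diagnosis_year`."""
    return sorted(f for f in FIRST_COLLECTED if answerable(f, diagnosis_year))


if __name__ == "__main__":
    for year in (1960, 1990, 2011):
        print(year, usable_fields(year))
```

Under these assumptions, a subgroup analysis by HER2 status cannot include patients diagnosed before 2010, and an analysis by gene signature cannot be run at all without new data collection.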

As the nation embarks on a multibillion-dollar investment to develop new registries, data warehouses, and other standardized collections of data in an electronic format, it is important to heed some long-known principles in the philosophy of science.

Corresponding Author: Benjamin Djulbegovic, MD, PhD, University of South Florida Health, 12901 Bruce B. Downs Blvd, MDC27, Room 3127/3126, Tampa, FL 33612 (bdjulbeg@health.usf.edu).

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.

Additional Contributions: We thank Sander Greenland, DrPH (University of California, Los Angeles), for helpful comments on an earlier version of the manuscript. We also thank Jane Carver, PhD, MS, MPH (Department of Pediatrics, University of South Florida, Tampa), for help with the revised version of the manuscript. Neither Dr Greenland nor Dr Carver received compensation for their contributions.

References

1. Committee on Comparative Research Prioritization, Institute of Medicine. Initial National Priorities for Comparative Effectiveness Research. Washington, DC: National Academy Press; 2009.
2. Sox HC. Defining comparative effectiveness research: the importance of getting it right. Med Care. 2010;48(6)(suppl):S7-S8.
3. Rescher N. The Limits of Science. Pittsburgh, PA: University of Pittsburgh Press; 1999.
4. Popper K. The Logic of Scientific Discovery. New York, NY: Harper & Row; 1959.
5. LaValley MP, Felson DT. Truth survival. Ann Intern Med. 2002;137(11):932.
6. Brassey J. Number of clinical questions. Trip database Web site. http://blog.tripdatabase.com/2007/01/number-of-clinical-questions.html. Posted January 25, 2007. Accessed December 20, 2010.
7. Djulbegovic B, Guyatt GH, Ashcroft RE. Epistemologic inquiries in evidence-based medicine. Cancer Control. 2009;16(2):158-168.
8. Platt JR. Strong inference: certain systematic methods of scientific thinking may produce much more rapid progress than others. Science. 1964;146(3642):347-353.
