Commentary

A New Era of Cardiovascular Disease Epidemiology

Bruce M. Psaty, MD, PhD; Donna Arnett, PhD; Gregory Burke, MD, MS

Author Affiliations: Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology and Health Services, University of Washington and Center for Health Studies, Group Health, Seattle (Dr Psaty); Department of Epidemiology, University of Alabama at Birmingham, Birmingham (Dr Arnett); and Division of Public Health Sciences, Wake Forest University, Winston-Salem, North Carolina (Dr Burke).

JAMA. 2007;298(17):2060-2062. doi:10.1001/jama.298.17.2060

Cardiovascular epidemiology has a rich, collaborative, and productive history. Beginning in 1948, the Framingham Heart Study was instrumental in identifying, for instance, high blood pressure and dyslipidemia as major risk factors for coronary heart disease and stroke.1 Subsequent clinical trials identified safe and effective treatments for these conditions. In the last several decades, the widespread use of medications for hypertension and dyslipidemia has prevented or delayed the onset of cardiovascular disease for millions of US residents. At the 60th anniversary of the Framingham study, a new approach to cardiovascular disease epidemiology is about to be tested.

In October 2007, data from approximately 9000 Framingham participants, not only extensive phenotype data but also 500 000 single nucleotide polymorphisms (SNPs) from a whole-genome scan on each individual, became accessible to the scientific community under the SHARe (SNP Health Association Resource) program.2 Scientists who meet standards that include approval by an institutional review board, human subjects training, computer security, and design and analysis plans will receive the Framingham genotype and phenotype data.3 - 4 These data will be available to all scientists, including the Framingham investigators, at the same time. Non-Framingham study scientists who receive data may begin analyses immediately, but they must agree to a 1-year moratorium before submitting manuscripts to journals. In recognition of the fundamental scientific incentive of publishing the first analysis of one's own data,5 the Framingham study investigators, many of whom spent years acquiring the phenotype data, are not bound by the moratorium and may submit manuscripts at any time after the release of the whole-genome data. SHARe represents a novel resource that allows the scientific community ready access to this large complex data set and, thus, seeks to enhance the breadth and scale of scientific output from this landmark epidemiological study.

In a similar program, the Candidate-gene Association Resource (CARe) Project will assay 55 000 SNPs in 2000 cardiovascular-candidate genes.6 Data-release procedures similar to those used in SHARe will make available to the scientific community both the CARe genotype and extensive phenotype data on about 50 000 participants in 9 National Heart, Lung, and Blood Institute (NHLBI)–funded cohorts in the spring of 2008. SHARe and CARe are indeed public resources. The expectation is that the widespread availability of these data to the scientific community at large will accelerate high-quality scientific findings in genetics.

The approach of making data widely available resembles the model used to assemble the human genome. Sequencing the human genome represented an astonishing scientific and technologic achievement that used a hierarchical shotgun sequencing strategy.7 DNA was fragmented, fragments were sequenced, and the sequenced parts were assembled. In the human genome project, sequenced fragments were placed immediately on the Web, and various groups worked to assist in their assembly. The genome data consist of 1 of 4 base pairs in a specific order; and the goal, which was computationally complex, was to deduce the proper order of many modest-sized strings of base pairs. The release of sequence data on the Web not only made publicly funded research data immediately available but also advanced the pace of describing the structure of the human genome.

The effort to generalize from the genome assembly of base-pair data to the epidemiological analyses of genotype-phenotype data represents a bold experiment. The release of the Framingham data to the scientific community takes advantage of quality-control efforts that were in place for data collection but abandons the quality-control efforts that were in place for data analysis and manuscript production in favor of widespread data access. The quantity and the quality of the science that emerges from this “genome” experiment remain to be seen.

Prior experience with similar efforts may provide some clues. All the major NHLBI cohort studies have already made limited access data sets (LADs) available to the scientific community.8 For instance, LADs for the Atherosclerosis Risk in Communities Study (ARIC)9 and for the Cardiovascular Health Study (CHS)10 have been available since April 2000. In the absence of support to conduct analyses, the LADs have been little used. To date, there have been 87 approved requests for data sets from ARIC and 56 for data sets from CHS. These public-release data sets have yielded 28 published manuscripts from ARIC and 21 from CHS (Sean Coady, MS, MA, National Heart, Lung, and Blood Institute, written communication, September 11, 2007). In contrast, since January 2001, CHS investigators have published 323 articles, many of them by new or young investigators. Using complex phenotype data without seeking assistance from study investigators, some LADs authors have had difficulty correctly representing features of the studies or interpreting findings. A few LADs investigators, by using data beyond the scope of their applications, violated the terms of their original agreements. The NHLBI LADs program does not have the resources to monitor compliance with the agreements or the authority to enforce compliance when a violation comes to light.

For researchers unfamiliar with the design and conduct of a specific cohort study, epidemiological data will be readily accessible but may be difficult to use properly. The analysis of risk factors for a first myocardial infarction (MI) in CHS will serve to illustrate the problem. The CHS recruited participants whether or not they had had a previous MI.10 To assess the association with incident MI, investigators must exclude the 564 participants who had a prevalent MI at baseline. This exclusion of 9.6% of the cohort of 5888 participants is frequently overlooked by investigators unfamiliar with the details of the CHS design. Among those with prevalent MI at baseline, second MI events were intentionally not investigated or classified in CHS unless they were fatal events.11 If scientists want to study prognosis, the optimal design requires an inception cohort of patients at the time of their first MI and an effort to follow up participants forward in time. Indeed, the CHS was specifically designed to accommodate studies of both the incidence and prognosis of first MI. But investigators who are unfamiliar with these unique aspects of the CHS design frequently use data from all 5888 participants to conduct analyses of risk factors for first MI, and the results are biased, misleading, or incorrect. Editors and journal reviewers may have difficulty detecting some of these types of errors. Setting up the proper analysis is simply the first step in producing high-quality scientific findings from large complex studies, such as CHS, ARIC, or Framingham. Historically, quality control in data analysis and manuscript preparation has been provided by investigators who know the study nuances well.
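The exclusion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Participant` record, its `prevalent_mi` field, and the `incident_mi_cohort` helper are hypothetical names, not the layout of the actual CHS data sets, and the cohort is synthetic. Only the counts (5888 participants, 564 with prevalent MI) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: int
    prevalent_mi: bool  # hypothetical flag: MI occurred before the baseline exam


def incident_mi_cohort(participants):
    """Keep only participants who were free of MI at baseline."""
    return [p for p in participants if not p.prevalent_mi]


# Synthetic stand-in for the cohort; the two counts are taken from the text,
# everything else is invented for illustration.
TOTAL = 5888
PREVALENT = 564
cohort = [Participant(pid=i, prevalent_mi=(i < PREVALENT)) for i in range(TOTAL)]

at_risk = incident_mi_cohort(cohort)
excluded_pct = 100 * (TOTAL - len(at_risk)) / TOTAL

print(len(at_risk))            # participants eligible for a first-MI analysis
print(round(excluded_pct, 1))  # share of the cohort excluded, in percent
```

Running the analysis on `cohort` instead of `at_risk` is the error the paragraph describes: participants with a prior MI cannot contribute a first MI.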

If the analysis of epidemiological phenotypic data is complex, the addition of genetic data and the family structures in the Framingham study pose additional challenges. High-quality analyses, a hallmark of the Framingham publications, require expertise in genetics, genetic epidemiology, biostatistics, phenotypes, and study design. Complex analyses or an incomplete understanding of complex data and phenotypes may make the interpretation of findings a special challenge. In the absence of any coordination of efforts, multiple groups may expend much effort to become the first to report the same few findings. With hundreds of phenotypes and hundreds of thousands of genotypes, the prospect of false-positive findings looms large. Some scientists new to these epidemiological data may not appreciate the importance of reporting fully the numbers and types of statistical tests. If false-positive findings become the rationale for subsequent basic science investigation, a substantial amount of energy and resources may be wasted.
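A rough calculation shows the scale of the false-positive problem. The sketch below assumes 500 000 SNPs (the figure given for the whole-genome scan), a hypothetical 100 phenotypes as a conservative reading of "hundreds", a nominal significance level of 0.05, and independent tests with no true associations. The Bonferroni correction is used here only as a familiar, conservative benchmark; it is not a method the commentary prescribes.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of nominally significant results when every null is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that holds the family-wise error rate at alpha."""
    return alpha / n_tests


N_SNPS = 500_000    # from the text
N_PHENOTYPES = 100  # assumption: lower bound for "hundreds of phenotypes"
ALPHA = 0.05        # assumption: conventional nominal level

n_tests = N_SNPS * N_PHENOTYPES
expected_fp = expected_false_positives(ALPHA, n_tests)
threshold = bonferroni_threshold(ALPHA, n_tests)

print(f"{n_tests:,} tests")
print(f"about {expected_fp:,.0f} false positives expected at p < {ALPHA}")
print(f"Bonferroni per-test threshold: {threshold:.1e}")
```

Under these assumptions, millions of nominally significant associations would be expected by chance alone, which is why full reporting of the number and type of tests matters.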

In addition to problems with data analysis and interpretation, there are some potential risks associated with the large-scale release of genotype-phenotype data. It only takes about 75 SNPs to uniquely identify an individual.12 With widespread dissemination of whole-genome forensic-equivalent genetic data that are linked to information about a variety of health conditions, risk factors, and diseases, either the loss or theft of personal computers on which these data are stored or the misuse of linked data by insurance companies, law-enforcement agencies, or other medical-related industries may occasion public concern about genetic research. These issues emphasize the need for vigilance to ensure compliance with the procedures for data acquisition and use.

The outstanding achievements of recent whole-genome studies have depended importantly on replications that often include several stages and tens of thousands of individuals.13 Few genetic associations are so pronounced that a single study alone, such as the Framingham study, can provide convincing evidence. The most successful whole-genome studies have created cross-study collaborations that harmonize phenotypes, share early findings, link analyses to improve power, and seek additional populations for fine mapping and replication. In the immediate future, the release of the Framingham whole-genome data to the scientific community will benefit investigators with existing populations or whole-genome scans, including those from Iceland, the Netherlands, and the United Kingdom. No doubt important scientific advances will emerge from these efforts.

The long-term effects of “big science” on the field of epidemiology, its students, and its young investigators remain uncertain.14 - 16 All NHLBI cohort studies have required years of work by dozens of scientists and staff. The high-quality data collection that characterizes the Framingham study is difficult, expensive, and time-consuming. The temptation for young scientists to focus their careers on the use of these data will be huge. By a process resembling natural selection, adept users of these complex data sets may become the dominant epidemiological phenotype. If a tendency for trainees to focus solely on this analytic expertise comes at the expense of learning how to design studies and collect new phenotypic and environmental data for future questions of public health importance, the next generation of cardiovascular epidemiologists may be ill equipped for or uninterested in the creation of Framingham-like resources for their successors. For the analysis of genotype-phenotype associations in the future, large electronic medical-records databases, which are prone to various forms of measurement error, will be no substitute for high-quality standardized examinations of population-based samples.

Access to Framingham SHARe data will encourage analyses to address questions readily answerable by the existing data. If the questions are important, the use of existing data to address them is immensely efficient. But the mere existence of the data becomes a powerful incentive to conduct multiple less socially useful analyses that may divert scientific attention from more important public health findings. These analyses may, in effect, be driven by advances in technology rather than carefully considered, hypothesis-driven research questions that better address the needs of the health of the public.

The NHLBI has embarked on an experiment modeled on the success of the genome project. A number of proactive strategies may improve the scientific productivity associated with the genotypic-phenotypic data release from the Framingham Study. From one perspective, the vital Framingham resources are not simply the data but also the investigators. Their experience with and in-depth knowledge of the data are valuable assets for ensuring high-quality analyses. For many users new to the Framingham data, support to allow the Framingham investigators to serve as a collaborative resource would enhance the overall long-term quality of the effort.

The development of an effective network among investigators across institutions would help ensure the ongoing production of high-quality science throughout the analysis and interpretation phases of the conduct of this epidemiological experiment. Rapid advances may require the creation of meta-organizations across what are already large and complex studies. Such collaborations, some national and others international, will not only be able to identify, replicate, and verify findings rapidly but also be able to translate them into well-evaluated and useful interventions that improve the health of the public. In this setting, special efforts to train students and identify independent research opportunities for young investigators may be required. The NHLBI can use its convening power to develop and implement creative strategies that help fashion collaborations, advance the field, and ensure even more high-quality science.

Although open access to complex phenotypic and genotypic data holds great promise, the long-term contributions of this experiment to science and public health remain to be seen. The release of the Framingham data to the scientific community marks the beginning of a new era in cardiovascular disease epidemiology, in which computing power and large-scale unbiased analyses of genetic data may become more prominent than efforts at phenotyping, designing new studies, or developing and testing hypotheses. Ultimately, the pace of the translation of these scientific findings into practice and the improvements in the health of the public will be the key measures of the success of this bold experiment.

AUTHOR INFORMATION

Corresponding Author: Bruce M. Psaty, MD, PhD, Cardiovascular Health Research Unit, 1730 Minor Ave, Suite 1360, Seattle, WA 98101 (psaty@u.washington.edu).

Financial Disclosures: None reported.

REFERENCES

1. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J III. Factors of risk in the development of coronary heart disease–six year follow-up experience: the Framingham Study. Ann Intern Med. 1961;55:33-50
2. Framingham SNP Health Association Resource (SHARe). National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. Accessed September 21, 2007
3. Instructions to apply for NHLBI authorized access datasets in dbGap: NHLBI Data Use Certification (DUC). http://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?view_pdf&stacc=phs000007.v1.p1. Accessed September 21, 2007
4. dbGap System Security Plan (SSP) FAQ & Plan Template. http://www.ncbi.nlm.nih.gov/projects/gap/pdf/SSP_Template.pdf. Accessed September 21, 2007
5. Sharing data from large-scale biological research projects: a system of tripartite responsibility. Report of a meeting organized by the Wellcome Trust; January 14-15, 2003; Fort Lauderdale, FL. http://www.wellcome.ac.uk/doc_wtd003208.html
6. Candidate-gene Association Resource (CARe) home page. National Heart, Lung, and Blood Institute. http://www.broad.mit.edu/gen_analysis/care/index.php/Main_Page. Updated July 2, 2007. Accessed September 21, 2007
7. Lander ES, Linton LM, Birren B, et al; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860-921 [published correction appears in Nature. 2001;412(6846):565 and Nature. 2001;411(6838):720. Szustakowki, J (corrected to Szustakowski, J)]
8. The NHLBI Limited Access Dataset Program. National Heart, Lung, and Blood Institute. http://www.nhlbi.nih.gov/resources/deca/default.htm. Accessed September 23, 2007
9. The ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. Am J Epidemiol. 1989;129(4):687-702
10. Fried LP, Borhani NO, Enright P, et al. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991;1(3):263-276
11. Ives DG, Fitzpatrick AL, Bild DE, et al. Surveillance and ascertainment of cardiovascular events: the Cardiovascular Health Study. Ann Epidemiol. 1995;5(4):278-285
12. Lin Z, Owen AB, Altman B. Genomic research and human subject privacy. Science. 2004;305(5681):183
13. Topol EJ, Murray SS, Frazer KA. The genomics gold rush. JAMA. 2007;298(2):218-221
14. Kaplan GA. How big is big enough for epidemiology? Epidemiology. 2007;18(1):18-20
15. Seminara D, Khoury MJ, O'Brien TR, et al. The emergence of networks in human genome epidemiology: challenges and opportunities. Epidemiology. 2007;18(1):1-8
16. Ness RB. “Big” science and the little guy. Epidemiology. 2007;18(1):9-12
