Author Affiliation: Department of Pathology, University of Vermont College of Medicine, Burlington.
As a result of rapid technological progress in single-nucleotide polymorphism (SNP) genotyping and the availability of appropriate large-scale epidemiological studies and clinical trials, an increasing number of genome-wide association and gene-centric genotyping studies of complex multigenic diseases, such as venous thrombosis and atherosclerosis, are appearing in the literature.1 - 2 Genome-wide association studies take advantage of the fact that SNPs occur in approximately one per thousand base pairs and are common in the population (frequency ≥1%). Thus, tools such as the recently available 1 million–SNP chip make it possible to study common genetic variance in coding and noncoding regions across the genome. In contrast, gene-centric genotyping usually focuses on genes likely to be informative and uses specific SNPs that mark functional changes in coding, are in nearby regulatory regions, or are representative of other SNPs residing on local haplotypes. “Tag-SNPs” representative of human haplotypes can be identified from the human HapMap.3
A common gene-centric SNP genotyping strategy is to select a small number of candidate genes based on plausible biological roles in the disease of interest. An example of this approach was the study by Smith et al4 that used 280 SNPs located in 24 venous thrombosis candidate genes and identified 5 SNPs associated with thrombosis risk, including 3 SNPs that were previously unreported.
An interesting variation on this approach is the study by Bezemer et al5 reported in this issue of JAMA. The authors cast a broad net, examining 19 682 SNPs in 10 887 genes and thereby including nearly half the known genes in the human genome. The authors selected SNPs based on their potential to affect gene function or expression and, on average, analyzed fewer than 2 SNPs per gene. The study populations comprised participants in 2 well-designed case-control studies of deep venous thrombosis, the Leiden Thrombophilia Study (LETS)6 and the Multiple Environmental and Genetic Assessment of Risk Factors for Venous Thrombosis study (MEGA).7 With the potential for so many statistical tests, false-positive results are a major issue with this study design.8 Using a variation on the study design of Shiffman et al,9 Bezemer et al5 combined gene discovery with replication and adjusted for the false discovery rate due to multiple testing.
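To make the multiple-testing concern concrete, the following is a minimal Python sketch. It assumes a nominal per-test threshold of 0.05 and uses the Benjamini-Hochberg step-up procedure as one common way to control the false discovery rate. The editorial does not say which procedure Bezemer et al used, and the function names and P values below are hypothetical, chosen only for illustration.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of nominally significant tests if every null hypothesis is true."""
    return n_tests * alpha


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q.

    Benjamini-Hochberg step-up: sort the P values, find the largest rank k with
    p_(k) <= (k / m) * q, and reject the k smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_passing_rank = rank
    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


# 19 682 SNPs tested at a nominal 0.05 threshold (assumed threshold).
print(expected_false_positives(19682))

# Invented P values, not data from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, q=0.05))
```

In this toy example, 5 of the 10 P values fall below 0.05, but only the 2 smallest survive the adjustment. That gap is the reason an unadjusted scan of nearly 20 000 SNPs needs both correction and replication.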
The 3 SNPs with the strongest association with venous thrombosis, by the authors' most conservative analysis, were in the genes for antithrombin (SERPINC1), the platelet collagen receptor (GP6), and a gene in the cytochrome P450 family 4 (CYP4V2). All 3 were common in the population studied, with allele frequencies ranging from 0.10 to 0.84 and relatively weak additive odds ratios ranging from 1.15 to 1.29. Antithrombin and the platelet collagen receptor are involved in hemostatic pathways and lend themselves to mechanistic hypothesis testing. The CYP4V2 gene is not an obvious candidate for thrombosis risk and either may be an indicator of a novel pathophysiologic pathway leading to thrombosis or may be genetically linked to a nearby causal gene as suggested below.
By relaxing their criteria for false discovery rate, the authors identified 4 additional genes possibly associated with venous thrombosis, including the coagulation factor IX gene. They also evaluated additional SNPs near CYP4V2 in a substudy of the LETS and MEGA-1 populations in which they identified 2 more genes in the coagulation pathway: prekallikrein and factor XI. Of these 6 additionally identified genes, 3 are involved in hemostatic pathways and all had weak additive odds ratios. High plasma concentrations of 2 of them, factors IX and XI, have been identified as risk factors for venous thrombosis.10 - 12 Because the evidence for the significance of these additional 6 risk alleles is considerably weaker than for those of the 3 alleles with the strongest association, these observations need further validation.
So what are the take-home messages from this work? From the pragmatic perspective of clinical practice, it is reasonable to ask, of what use are risk factors with weak odds ratios? The answer comes in the form of a well-known metric, the population-attributable risk percentage, which is the proportion of the outcome that can be attributed to the risk marker, assuming the risk marker is in the causal pathway. Even a small relative risk can be associated with a large population-attributable risk percentage if the risk marker occurs in a large proportion of the population. From that perspective, the important observations in the study by Bezemer et al5 are the high prevalence of the risk alleles and evidence of genetic dosage, with higher odds ratios for thrombosis in homozygotes vs heterozygotes. Using the example of homozygosity for GP6 (Table 2 in the article),5 the prevalence of homozygosity for the GP6 risk allele in the control group was 68%, with an odds ratio of 1.46, which translates to 46% increased odds of thrombosis in homozygotes. Calculation of the population-attributable risk percentage shows that approximately one-quarter of thrombotic events in this population would be explained by homozygosity for the risk allele. To put this into the context of clinical testing for venous thrombosis risk factors, the attributable risk associated with GP6 is similar to that for factor V Leiden (one of the most commonly ordered genetic tests) and much greater than for protein C, protein S, or antithrombin. Needless to say, newly identified risk factors such as GP6 must be validated in well-designed clinical studies to define their clinical utility.
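The one-quarter figure can be approximately reproduced with a short calculation. The sketch below assumes Levin's formula for the population-attributable risk percentage, takes the prevalence among controls as the population prevalence of the risk genotype, and uses the odds ratio as an approximation of the relative risk. The editorial does not state which formula was used, so these are assumptions, and the function name is hypothetical.

```python
def population_attributable_risk_percent(prevalence, relative_risk):
    """Levin's formula: PAR% = 100 * p(RR - 1) / (1 + p(RR - 1)).

    prevalence    -- proportion of the population carrying the risk marker (0 to 1)
    relative_risk -- relative risk; the odds ratio is used here as an approximation
    """
    excess = prevalence * (relative_risk - 1.0)
    return 100.0 * excess / (1.0 + excess)


# Values quoted in the editorial for homozygosity for the GP6 risk allele.
gp6_par = population_attributable_risk_percent(prevalence=0.68, relative_risk=1.46)
print(f"{gp6_par:.1f}%")  # prints "23.8%"
```

Under these assumptions the result is about 24%, in line with the one-quarter quoted above. The same function shows why prevalence matters: holding the odds ratio at 1.46 and lowering the prevalence to 5% gives a population-attributable risk of only about 2%.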
When discussing population-attributable risk percentage, it is important to note that proportions of disease attributable to various component causes in multigenic diseases like venous thrombosis13 - 16 may sum to more than 100% because risk factors are not independent. However, because two-thirds of the population is homozygous for the risk allele in GP6, there is ample opportunity for interactions among genetic and acquired risk factors. Of course, the goal of clinical practice is to define individual risk as opposed to attributable risk in populations. Thus, the promise of studies like those of Bezemer et al5 and Smith et al4 is that ultimately risk profiles with significant predictive value can be constructed to guide practice.
What future research directions do these results indicate? The wide net cast by these investigators had a rather coarse mesh, with fewer than 2 SNPs per gene. The fact that the investigators have identified a number of interesting risk factors, some with apparent biological plausibility, suggests that this is a fruitful approach and that adding more informative SNPs per gene would likely identify additional important risk factors for thrombosis. However, the success of this approach depends on the well-characterized phenotypes available in studies like LETS and MEGA. Put another way, the output of the powerful genotyping resources available to investigators is only as good as the input.17
Moreover, only 24% of the SNPs deployed in the study by Bezemer et al5 were targeted to regulatory regions in transcription-factor binding sites or untranslated regions of messenger RNA. The importance of the regulatory genome is emphasized by the recent Encyclopedia of DNA Elements (ENCODE) study18 that characterized in detail transcriptional activity in 1% of the human genome. Only about 2% of this 1% comprised protein-coding genes. One of the major findings of ENCODE is that large tracts of the non–protein-coding human genome, previously thought to be transcriptionally silent or “junk” DNA,19 are pervasively transcribed with non–protein-coding transcripts.
The significance of most of these non–protein-coding transcripts is unknown; however, it appears that there is dispersed regulation spread throughout the genome with many regulatory sites for specific genes located at great distances from the gene.20 An example of noncoding transcripts is the rapidly expanding family of small noncoding regulatory RNAs, which includes small interfering RNA (si-RNA, 20-25 nucleotides), micro RNA (mi-RNA, 20-25 nucleotides), Piwi Argonaute protein–associated RNA (pi-RNA, 25-30 nucleotides), and a group of longer noncoding RNAs (≥70 nucleotides).21 A recent genome-wide association study22 of coronary artery disease, replicated in 6 independent populations, identified risk alleles at chromosome 9p21 on a 58-Kb haplotype devoid of known genes but with evidence for noncoding transcription. The authors speculate that the variants may be involved in gene regulation involving noncoding RNA. Studies like this, in light of the ENCODE observations, suggest that future genotyping strategies may include a stronger focus on the intergenic as well as the intragenic genome.
Corresponding Author: Edwin G. Bovill, MD, Department of Pathology, University of Vermont College of Medicine, Burlington, VT 05482 (edwin.bovill@uvm.edu).
Financial Disclosures: None reported.
Editorials represent the opinions of the authors and JAMA and not those of the American Medical Association.