Special Communication

How to Interpret a Genome-wide Association Study

Thomas A. Pearson, MD, MPH, PhD; Teri A. Manolio, MD, PhD
JAMA. 2008;299(11):1335-1344. doi:10.1001/jama.299.11.1335.

Genome-wide association (GWA) studies use high-throughput genotyping technologies to assay hundreds of thousands of single-nucleotide polymorphisms (SNPs) and relate them to clinical conditions and measurable traits. Since 2005, nearly 100 loci for as many as 40 common diseases and traits have been identified and replicated in GWA studies, many in genes not previously suspected of having a role in the disease under study, and some in genomic regions containing no known genes. GWA studies are an important advance in discovering genetic variants influencing disease but also have important limitations, including their potential for false-positive and false-negative results and for biases related to selection of study participants and genotyping errors. Specific applications of GWA findings in prevention and treatment are actively being pursued, but these studies are clearly many steps removed from actual clinical use; at present, they mainly represent a valuable discovery tool for examining genomic function and clarifying pathophysiologic mechanisms. This article describes the design, interpretation, application, and limitations of GWA studies for clinicians and scientists for whom this evolving science may have great relevance.
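To make "relate them to clinical conditions" concrete, here is a minimal sketch of one association test for a single SNP, assuming a case-control design and an allelic 2×2 chi-square test. This is one common choice of test, not necessarily the analysis used in the studies the article reviews; the function name and the allele counts are hypothetical. A GWA study repeats a test like this for every genotyped SNP, which is why a single small P value is not, on its own, convincing evidence.

```python
import math


def allelic_chi2_test(case_alleles, control_alleles):
    """Allelic 2x2 chi-square test (1 df) for one biallelic SNP (hypothetical helper).

    Each argument is a pair (count of allele A, count of allele a) summed over
    individuals, so each diploid person contributes 2 alleles.
    Returns (chi-square statistic, P value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null of equal allele frequencies
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts for 500 cases and 500 controls (1000 alleles each).
chi2, p = allelic_chi2_test((600, 400), (500, 500))
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

Computing the P value through `math.erfc` keeps the sketch free of third-party dependencies; in practice a statistics library or dedicated GWA software would be used.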

Figures in this Article

Figure 1. Hypothetical Quantile-Quantile Plots in Genome-wide Association Studies

The Q-Q plot is used to assess the number and magnitude of observed associations between genotyped single-nucleotide polymorphisms (SNPs) and the disease or trait under study, compared with the association statistics expected under the null hypothesis of no association.³⁹ Observed association statistics (eg, χ² or t statistics), or the −log₁₀P values calculated from them, are ranked from smallest to largest on the y-axis and plotted against the distribution expected under the null hypothesis on the x-axis. Deviations from the identity line suggest either that the assumed distribution is incorrect or that the sample contains values arising in some other manner, such as from a true association.³⁹

A, Observed χ² statistics of all polymorphic SNPs (dark blue) in a hypothetical genome-wide association study of a complex disease vs the expected null distribution (black line). The sharp deviation above an expected χ² value of approximately 8 could be due to a strong association of the disease with SNPs in a heavily genotyped region, such as the major histocompatibility complex (MHC) on chromosome 6p21 in multiple sclerosis or rheumatoid arthritis.⁷⁰ Excluding SNPs from such a locus may leave a residual upward deviation (light blue), identifying more associated SNPs with higher observed χ² values (exceeding approximately 17) than expected under the null hypothesis.

B, Observed (dark purple) vs expected (black line) χ² statistics for a hypothetical genome-wide association study of a complex disease. Deviation from the expected distribution is observed above an expected χ² of approximately 5. Inflation of observed statistics due to relatedness and potential population structure can be estimated by the method of genomic control.⁴⁹ Correcting for this inflation by dividing each χ² statistic by the estimated inflation factor reduces the unadjusted statistics (dark purple) to the adjusted levels (light purple), which deviate only above an expected χ² of approximately 15. The region between expected χ² values of approximately 5 and 15 suggests broad differences in allele frequencies that are more likely due to population structure than to disease susceptibility genes.
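The two computations behind this figure, the expected null distribution on the x-axis and the genomic control adjustment in panel B, can be sketched as follows. This assumes 1-df χ² statistics, the common plotting positions (i − 0.5)/n for expected P values, and the usual genomic control estimate λ = median(observed χ²) / median of the 1-df null (about 0.455), floored at 1; the function names and the observed values are hypothetical, and the published analyses may differ in detail.

```python
import statistics

_NULL = statistics.NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.455
CHI2_1DF_MEDIAN = _NULL.inv_cdf(0.75) ** 2


def expected_null_chi2(n):
    """Expected 1-df chi-square values for n SNPs under the null, ascending.

    Each plotting position p_i = (i - 0.5) / n is converted to the chi-square
    value whose upper-tail probability is p_i (chi2 = z^2, z at 1 - p_i / 2).
    """
    values = [_NULL.inv_cdf(1 - ((i - 0.5) / n) / 2) ** 2 for i in range(1, n + 1)]
    return sorted(values)


def genomic_control(observed_chi2):
    """Estimate the inflation factor lambda and return adjusted statistics.

    lambda = median(observed) / median of the 1-df null; by convention it is
    not allowed to fall below 1, so statistics are never inflated.
    """
    lam = max(statistics.median(observed_chi2) / CHI2_1DF_MEDIAN, 1.0)
    return lam, [x / lam for x in observed_chi2]


def qq_pairs(observed_chi2):
    """(expected, observed) points for a Q-Q plot, both sorted ascending."""
    return list(zip(expected_null_chi2(len(observed_chi2)), sorted(observed_chi2)))


# Hypothetical observed statistics, for illustration only.
observed = [0.1, 0.3, 0.6, 0.9, 1.4, 2.2, 3.5, 25.0]
lam, adjusted = genomic_control(observed)
print(f"lambda = {lam:.2f}")
for exp_val, obs_val in qq_pairs(adjusted):
    print(f"{exp_val:6.2f} {obs_val:6.2f}")
```

Because λ is based on the median, a handful of truly associated SNPs in the upper tail barely moves it, while a genome-wide shift caused by population structure or relatedness does.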

Figure 2. Associations in the IL23R Gene Region Identified by a Genome-wide Association Study of Inflammatory Bowel Disease

Genome-wide association studies frequently identify associations with many highly correlated single-nucleotide polymorphisms (SNPs) in a chromosomal region, due in part to linkage disequilibrium among the SNPs. This can make it difficult to determine which SNP within a group is likely to be the causative or functional variant.

A, Genomic locations of 2 genes, the interleukin 23 receptor (IL23R) and the interleukin 12 receptor, beta 2 (IL12RB2), and a hypothetical protein, NM_001013674, between positions 62700000 and 67580000 of the short arm of chromosome 1 at region 1p31.

B, The −log₁₀P values for association with inflammatory bowel disease, plotted for each SNP genotyped in the region; SNPs reaching a prespecified −log₁₀P of 7 or greater are presumed to show association with disease. Several strong associations are seen in the region extending from just telomeric of position approximately 67400000 to just centromeric of position approximately 67450000.

C, Pairwise linkage disequilibrium estimates between SNPs (measured as r²) for the region; higher r² values are indicated by darker shading. The region contains 4 “triangles” or “blocks” of linkage disequilibrium: 2 on either side of position 67400000 in the IL23R gene, another in the hypothetical protein telomeric of IL23R, and a fourth in the IL12RB2 gene at the centromeric end of the region. The 2 IL23R linkage disequilibrium blocks each contain SNPs associated with inflammatory bowel disease, whereas the IL12RB2 block does not. Reproduced with permission from Duerr et al.⁵³
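Panel C plots pairwise r² between SNPs. A minimal sketch of r² for one pair of biallelic SNPs, assuming phased haplotypes are already available (studies often estimate r² from unphased genotypes instead), uses D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)p_B(1 − p_B)). The function name and haplotype data are hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic SNPs (hypothetical helper).

    haplotypes: list of phased (allele at SNP 1, allele at SNP 2) pairs, alleles
    coded 0/1. A and B denote allele 1 at SNP 1 and SNP 2, respectively.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # A-B haplotype
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Hypothetical haplotypes: SNP 2 almost always carries the same allele as SNP 1.
haps = [(1, 1)] * 45 + [(0, 0)] * 50 + [(1, 0)] * 3 + [(0, 1)] * 2
print(f"r^2 = {ld_r2(haps):.2f}")
```

When r² is close to 1, the 2 SNPs carry nearly the same information, so they yield nearly identical association statistics; this is why the association signal in panel B cannot by itself single out the functional variant within a block.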

Figure 3. Genome-wide Association Findings in Rheumatoid Arthritis

Genome-wide association studies do not assume a priori hypotheses about candidate genes or regions that might be associated with disease; rather, they test single-nucleotide polymorphisms (SNPs) throughout the genome for possible evidence of genetic susceptibility. Associations are plotted as −log₁₀P values for a genome-wide association study of 1522 cases with rheumatoid arthritis and 1850 controls across the 22 autosomes and the X chromosome; SNPs with P < 10⁻⁴ (lower horizontal red line) are shown as single data points. The predefined significance threshold, P = 5 × 10⁻⁸, is shown as a horizontal blue line. SNPs at PTPN22 on chromosome 1, the major histocompatibility complex (MHC) on chromosome 6, and the TRAF1-C5 locus on chromosome 9 exceed this threshold. Reproduced with permission from Plenge et al.⁴⁷
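The caption uses 2 cutoffs on the −log₁₀P scale: P < 10⁻⁴ for drawing individual points and the predefined P = 5 × 10⁻⁸ for genome-wide significance. The sketch below applies both to a dictionary of P values. The Bonferroni line reflects one widely cited rationale for 5 × 10⁻⁸ (0.05 divided by roughly 1 million independent tests), which is an assumption here rather than something stated in the caption; the SNP identifiers and P values are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # predefined significance threshold from the caption
DISPLAY_CUTOFF = 1e-4     # P below which SNPs are drawn as single points


def neg_log10(p):
    """Transform a P value to the -log10(P) scale used on the y-axis."""
    return -math.log10(p)


def classify_snps(pvalues):
    """Split {snp_id: P} into plotted points and genome-wide significant SNPs."""
    plotted = {snp: neg_log10(p) for snp, p in pvalues.items() if p < DISPLAY_CUTOFF}
    significant = sorted(snp for snp, p in pvalues.items() if p < GENOME_WIDE_ALPHA)
    return plotted, significant


# Assumed rationale: Bonferroni correction of 0.05 for about 1 million tests.
bonferroni = 0.05 / 1_000_000
print(f"Bonferroni: {bonferroni:.1e}, -log10 threshold: {neg_log10(GENOME_WIDE_ALPHA):.2f}")

# Hypothetical SNP identifiers and P values for illustration only.
toy = {"snp_1": 3e-12, "snp_2": 2e-6, "snp_3": 0.04}
plotted, significant = classify_snps(toy)
print(plotted, significant)
```

On the plot, the blue line therefore sits at −log₁₀P of about 7.3; a SNP such as the hypothetical `snp_2` would appear as a point above the red line yet fall short of genome-wide significance.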
