Genome-wide association (GWA) studies use high-throughput genotyping technologies to assay hundreds of thousands of single-nucleotide polymorphisms (SNPs) and relate them to clinical conditions and measurable traits. Since 2005, nearly 100 loci for as many as 40 common diseases and traits have been identified and replicated in GWA studies, many in genes not previously suspected of having a role in the disease under study, and some in genomic regions containing no known genes. GWA studies are an important advance in discovering genetic variants influencing disease but also have important limitations, including their potential for false-positive and false-negative results and for biases related to selection of study participants and genotyping errors. Although these studies are clearly many steps removed from actual clinical use, and specific applications of GWA findings in prevention and treatment are actively being pursued, at present these studies mainly represent a valuable discovery tool for examining genomic function and clarifying pathophysiologic mechanisms. This article describes the design, interpretation, application, and limitations of GWA studies for clinicians and scientists for whom this evolving science may have great relevance.
The Q-Q plot is used to assess the number and magnitude of observed associations between genotyped single-nucleotide polymorphisms (SNPs) and the disease or trait under study, compared with the association statistics expected under the null hypothesis of no association.39 Observed association statistics (eg, χ² or t statistics), or −log₁₀P values calculated from them, are ranked in order from smallest to largest on the y-axis and plotted against the distribution that would be expected under the null hypothesis of no association on the x-axis. Deviations from the identity line suggest either that the assumed distribution is incorrect or that the sample contains values arising in some other manner, as by a true association.39 A, Observed χ² statistics of all polymorphic SNPs (dark blue) in a hypothetical genome-wide association study of a complex disease vs the expected null distribution (black line). The sharp deviation above an expected χ² value of approximately 8 could be due to a strong association of the disease with SNPs in a heavily genotyped region such as the major histocompatibility complex (MHC) locus on chromosome 6p21 in multiple sclerosis or rheumatoid arthritis.70 Exclusion of SNPs from such a locus may leave a residual upward deviation (light blue) identifying more associated SNPs with higher observed χ² values (exceeding approximately 17) than expected under the null hypothesis. B, Observed (dark purple) vs expected (black line) χ² statistics for a hypothetical genome-wide association study of a complex disease. Deviation from the expected distribution is observed above an expected χ² of approximately 5. Inflation of observed statistics due to relatedness and potential population structure can be estimated by the method of genomic control.49 Correction for this inflation, by dividing each statistic by the estimated inflation factor, reduces the unadjusted χ² statistics (dark purple) to the adjusted levels (light purple), showing deviation only above an expected χ² of approximately 15. The region between expected χ² values of approximately 5 and approximately 15 is suggestive of broad differences in allele frequencies that are more likely due to population structure than disease susceptibility genes.
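The Q-Q construction and the genomic control correction described in this legend can be sketched in a few lines. This is a minimal illustration, not the method used to draw the figure. It assumes 1-degree-of-freedom χ² statistics, estimates the inflation factor as the ratio of the observed median to the null median (one common form of genomic control), and does not deflate statistics when that ratio falls below 1. The function names `expected_chi2_quantiles` and `genomic_control` are hypothetical.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.455). A 1-df chi-square
# variable is the square of a standard normal, so its median is the square
# of the normal 75th percentile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def expected_chi2_quantiles(n):
    """Expected 1-df chi-square values under the null for n ranked SNPs.

    These form the x-axis of the Q-Q plot. Plotting positions (i - 0.5) / n
    are an assumption; other conventions exist.
    """
    nd = NormalDist()
    return [nd.inv_cdf((1 + (i - 0.5) / n) / 2) ** 2 for i in range(1, n + 1)]


def genomic_control(observed_chi2):
    """Estimate the inflation factor and return (lambda, adjusted statistics).

    Assumption: lambda is the observed median divided by the null median,
    and values below 1 are set to 1 so that statistics are never inflated.
    """
    lam = median(observed_chi2) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [x / lam for x in observed_chi2]


if __name__ == "__main__":
    # Hypothetical data: null statistics uniformly inflated by a factor of 2.
    expected = expected_chi2_quantiles(1001)
    observed = sorted(2 * q for q in expected)
    lam, adjusted = genomic_control(observed)
    print(f"estimated inflation factor: {lam:.3f}")
    print(f"largest observed vs adjusted: {observed[-1]:.2f} vs {adjusted[-1]:.2f}")
```

Pairing the sorted observed values with `expected_chi2_quantiles(len(observed))` gives the points of the Q-Q plot; after division by the inflation factor, only deviations that are not shared across the whole distribution remain above the identity line.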
Genome-wide association studies frequently identify associations with many highly correlated single-nucleotide polymorphisms (SNPs) in a chromosomal region, due in part to linkage disequilibrium among the SNPs. This can make it difficult to determine which SNP within a group is likely to be the causative or functional variant. A, Genomic locations of 2 genes, the interleukin 23 receptor (IL23R) and the interleukin 12 receptor, beta-2 (IL12RB2), and a hypothetical protein, NM_001013674, between positions 62700000 and 67580000 of the short arm of chromosome 1 at region 1p31, are shown. B, The −log₁₀P values for association with inflammatory bowel disease are plotted for each SNP genotyped in the region; those reaching a prespecified −log₁₀P value of 7 or greater are presumed to show association with disease. Several strong associations, at −log₁₀P values or greater, are seen in the region just telomeric of position approximately 67400000 and extending just centromeric of position approximately 67450000. C, Pairwise linkage disequilibrium estimates between SNPs (measured as r²) are plotted for the region. Higher r² values are indicated by darker shading. The region contains 4 “triangles” or “blocks” of linkage disequilibrium: 2 on either side of position 67400000 in the IL23R gene, another in the hypothetical protein telomeric of IL23R, and a fourth in the IL12RB2 gene at the centromeric end of the region. The 2 IL23R linkage disequilibrium regions each contain SNPs associated with inflammatory bowel disease, while the IL12RB2 region does not. Reproduced with permission from Duerr et al.53
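As a rough illustration of the pairwise r² measure plotted in panel C, the sketch below computes r² for 2 biallelic SNPs from phased haplotypes. It assumes the phase is already known and that alleles are coded 0 or 1; in practice, haplotype frequencies are usually estimated from unphased genotypes. The function name `ld_r2` and the example data are hypothetical.

```python
def ld_r2(haplotypes):
    """Return r^2 between 2 biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b are
    the alleles (0 or 1) carried at the first and second SNP.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical haplotypes: the 2 SNPs mostly, but not always, travel together.
    sample = [(1, 1)] * 45 + [(0, 0)] * 45 + [(1, 0)] * 5 + [(0, 1)] * 5
    print(f"r^2 = {ld_r2(sample):.2f}")
```

Values near 1 mean that the 2 SNPs carry nearly the same information, which is why an association signal at one SNP in a block cannot by itself identify the functional variant.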
Genome-wide association studies do not rely on a priori hypotheses about candidate genes or regions that might be associated with disease; rather, they test single-nucleotide polymorphisms (SNPs) throughout the genome for possible evidence of genetic susceptibility. Associations are plotted as −log₁₀P values for a genome-wide association study in 1522 cases with rheumatoid arthritis and 1850 controls, showing single data points for SNPs with P < 10⁻⁴ (lower horizontal red line) for 22 autosomes and the X chromosome. The predefined level of significance, 5 × 10⁻⁸, is shown with a horizontal blue line. SNPs at PTPN22 on chromosome 1, the major histocompatibility complex (MHC) on chromosome 6, and the TRAF1-C5 locus on chromosome 9 exceed this threshold. Reproduced with permission from Plenge et al.47
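The 2 thresholds in this legend translate directly into positions on the −log₁₀P axis. The sketch below converts a P value to that scale and sorts it into the 3 bands implied by the plot. The function names, band labels, and example P values are hypothetical and are not taken from the study.

```python
import math

PLOT_THRESHOLD_P = 1e-4    # SNPs with smaller P values appear as data points
GENOME_WIDE_P = 5e-8       # predefined level of significance in the legend


def neg_log10_p(p):
    """Convert a P value to the -log10 scale used on the y-axis."""
    return -math.log10(p)


def classify(p):
    """Place a P value in one of 3 bands defined by the 2 thresholds."""
    if p < GENOME_WIDE_P:
        return "exceeds genome-wide threshold"
    if p < PLOT_THRESHOLD_P:
        return "plotted, below genome-wide threshold"
    return "not plotted"


if __name__ == "__main__":
    print(f"genome-wide line sits at -log10 P = {neg_log10_p(GENOME_WIDE_P):.2f}")
    for p in (3e-12, 2e-6, 0.03):   # made-up P values for illustration
        print(f"P = {p:g}: -log10 P = {neg_log10_p(p):.2f}, {classify(p)}")
```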