Trends in Genetics
Genome scans and candidate gene approaches in the study of common diseases and variable drug responses
The identification and evaluation of tagging SNPs
Until recently, most association studies could be described as candidate polymorphism studies [1] in which only one or a few SNPs within a gene of interest were assessed for association with a phenotype. With the arrival of denser SNP maps and cheaper high-throughput genotyping techniques, such studies have given way to true candidate gene studies in which patterns of LD throughout the gene region are determined to (i) select efficient sets of tSNPs in an attempt to represent all the common
Blocks of linkage disequilibrium?
Although inspired in part by the idea of blocks of LD, the tagging approach does not require clear blocks of LD. Regardless of how discrete the pattern of LD is, or what causes this pattern, it is the predictability of the state of tagged SNPs by the haplotypes defined by the tSNPs that determines power in a genetic association study. Thus some of the controversies surrounding the definition and the underlying causes of LD blocks [8] are not relevant to the use of tSNPs. Indeed, Zhang et al. [9]
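As a rough illustration of "predictability of the state of tagged SNPs by the haplotypes defined by the tSNPs", the sketch below computes a coefficient of determination for one tagged SNP. It groups phased haplotypes by their alleles at the tag SNPs and measures how much of the variance in the tagged allele those groups explain. This is only one plausible reading of a haplotype r² criterion; the function name `haplotype_r2`, the 0/1 allele coding, and the toy data are assumptions for illustration and are not taken from the authors' software.

```python
from collections import defaultdict


def haplotype_r2(haplotypes, tag_idx, target_idx):
    """Share of variance in the target SNP allele (coded 0/1) explained by
    grouping haplotypes according to their alleles at the tag SNPs.

    haplotypes: list of equal-length tuples of 0/1 alleles (phased data).
    tag_idx:    positions of the tag SNPs within each tuple.
    target_idx: position of the tagged SNP within each tuple.
    """
    n = len(haplotypes)
    target = [h[target_idx] for h in haplotypes]
    mean = sum(target) / n
    ss_total = sum((a - mean) ** 2 for a in target)
    if ss_total == 0:
        raise ValueError("target SNP is monomorphic in this sample")

    # Collect the target alleles carried on each distinct tag haplotype.
    groups = defaultdict(list)
    for h in haplotypes:
        groups[tuple(h[i] for i in tag_idx)].append(h[target_idx])

    # Variance left unexplained inside the tag-haplotype groups.
    ss_within = 0.0
    for alleles in groups.values():
        group_mean = sum(alleles) / len(alleles)
        ss_within += sum((a - group_mean) ** 2 for a in alleles)

    return 1.0 - ss_within / ss_total


# Hypothetical toy sample: six haplotypes over three SNPs.
toy = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 1, 1), (1, 0, 0)]
perfect_tag = haplotype_r2(toy, tag_idx=[1], target_idx=2)  # SNP 1 mirrors SNP 2
weak_tag = haplotype_r2(toy, tag_idx=[0], target_idx=2)     # SNP 0 only partly tracks SNP 2
```

Note that nothing in the calculation refers to block boundaries: the tag and tagged SNPs can lie anywhere in the region, which is consistent with the point that tagging does not require discrete LD blocks.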
Genome-wide tags
Exhaustive strategies for finding functional variants that attempt to itemize all the SNPs that might be of functional significance (e.g. all coding SNPs, or all SNPs in and near a gene, as in Ref. [12]) are difficult to implement on a genome-wide scale. These strategies depend on prior knowledge of the genomic location of all the important functional elements of the genome, which remains well out of reach. Hopefully, genome-wide LD scans will offer an efficient alternative, and two
Will the tSNPs represent non-identified variants?
The analyses above demonstrate that there is considerable redundancy among SNPs, and that the haplotype r² criterion efficiently capitalizes on this redundancy to identify a minimal set of tSNPs. But the question remains whether selected tSNPs will adequately represent SNPs that have not yet been identified. As has been noted previously, if the original dataset is not sufficiently dense then tSNPs are unlikely to represent as yet non-identified SNPs, even if they do represent the SNPs already
Do tSNPs perform well in a different sample from the same population?
Another concern with the selection of tSNPs is whether the original population sample is sufficiently large to define tSNPs that will work well in another sample from the same population. This is of particular concern when large or low LD regions are analyzed, and haplotype frequencies are consequently low. To address this question, we constructed two bootstrap resamples of individuals for regions 8a and 9a. For each region, we then selected tags for one of the resampled populations, and
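The resampling step described here can be sketched as follows. The example only shows how two bootstrap resamples of individuals might be drawn (sampling with replacement, same size as the original sample); what is then done with each resample is not reproduced, because the description above is cut off. The function name `bootstrap_resample`, the fixed seed, and the placeholder individual labels are all hypothetical.

```python
import random


def bootstrap_resample(individuals, rng):
    """Draw len(individuals) individuals with replacement."""
    n = len(individuals)
    return [individuals[rng.randrange(n)] for _ in range(n)]


# Placeholder labels standing in for genotyped individuals (assumed data).
individuals = [f"ind{i}" for i in range(10)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
resample_a = bootstrap_resample(individuals, rng)  # e.g. used to select tags
resample_b = bootstrap_resample(individuals, rng)  # a second, independent resample
```

Resampling whole individuals rather than single SNPs keeps each person's genotypes together, so the LD structure within the sample is preserved in every resample.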
Why so few tSNPs?
The difference between our estimate and that of Gabriel et al. appears to depend primarily on the inclusion of long-range associations in our identification of the tags (Gabriel et al. had already noted such associations would reduce the required number of tSNPs). For example, consider selecting tags within each of three consecutive sub-regions (denoted sr1, sr2, sr3) separately, or for the three regions together (denoted R; Figure 3). Denote the number of SNPs required to tag R as SR, and the
Even-spaced SNPs
To evaluate the even-spaced strategy, we first used the same number of SNPs as required to tag a region, but we distributed the SNPs evenly through the region without regard to LD patterns. For five of the regions from the data of Gabriel et al., we evaluated the weighted haplotype r² and the minimum haplotype r² (http://popgen.biol.ucl.ac.uk/software.html) for these ‘even-spaced’ SNPs in predicting the allelic states of the other SNPs in the studied regions (Table 3). In all cases, the tSNPs
Candidate genes versus whole-genome scans
Despite these encouraging points for genome scanning, we recommend that association studies should generally begin with candidate genes and regions (e.g. linkage supported). Ignoring genotyping costs, scanning the genome as a whole still entails a high statistical price. For example, tagging a pathway relevant to a condition of interest (e.g. renin-angiotensin pathway and hypertension) might require ∼200 tSNPs, in comparison with >100 000 for the whole genome. Assume that there is one causal
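To give a feel for the "statistical price" of testing more markers, the sketch below compares per-test significance thresholds for the two marker counts quoted above. It assumes a simple Bonferroni correction and a family-wise error rate of 0.05; both are illustrative choices, not the calculation used in the article, whose own worked example is cut off here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that holds the family-wise
    error rate at alpha across n_tests tests (Bonferroni)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate

pathway_threshold = bonferroni_threshold(ALPHA, 200)     # ~200 tSNPs for a pathway
genome_threshold = bonferroni_threshold(ALPHA, 100_000)  # lower bound of >100 000 tSNPs

# How many times stricter the genome-wide threshold is.
stringency_ratio = pathway_threshold / genome_threshold
```

Under these assumptions the per-test threshold falls from about 2.5 × 10⁻⁴ for the pathway to at most 5 × 10⁻⁷ for the whole genome, a 500-fold tightening. Because tSNPs can be correlated, Bonferroni is conservative, so this overstates the penalty somewhat.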
Selecting candidate genes
Although candidate gene lists will rarely, if ever, include all relevant genes, it is also true that some parts of the genome are better places to start looking than others. Candidate genes are particularly obvious in the case of variable drug response, for example, genes encoding metabolizing enzymes and transporters, or targets and associated pathways. Indeed, one striking feature of pharmacogenetics is how often relatively ‘obvious’ candidate genes turn out to carry important variants 1, 16,
Health warnings for association studies
It should be accepted that association studies are inherently prone to inaccuracy and even abuse. Some of the problems are well documented, such as reporting biases, population stratification, and the need to correct for multiple comparisons across SNPs or haplotypes [20, 21, 22].
Other problems seem less well appreciated, such as choosing significance thresholds for multiple tests performed on partially correlated phenotypic data and the difficulties of ultimately making diagnostic use of markers
Conclusions
Ultimately, the importance of genetic association studies will depend on the variants that predispose to disease and/or influence drug response. At present, very few risk-conferring alleles for common diseases are known. It is likely that haplotype mapping will identify some new common variants of at least modest effect in addition to the ten that have been securely identified [20, 26], although it remains possible that these will represent a small fraction of the predisposing alleles for common
References (29)
- et al. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. (2001)
- Haplotype block structure and its applications to association studies: power and study designs. Am. J. Hum. Genet. (2002)
- et al. Population genomics: linkage disequilibrium holds the key. Curr. Biol. (2001)
- Selection and evaluation of haplotype tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage disequilibrium gene mapping. Am. J. Hum. Genet. (2003)
- et al. Using haplotype blocks to map human complex trait loci. Trends Genet. (2003)
- et al. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. (1999)
- Testing for population subdivision and association in four case-control studies. Am. J. Hum. Genet. (2002)
- et al. Linkage disequilibrium and the mapping of complex human traits. Trends Genet. (2002)
- Pharmacogenetics in the laboratory and the clinic. New Engl. J. Med. (2003)
- Human genome hapmap launched with pledges of $100 million. Science (2002)
- Haplotype tagging for the identification of common disease genes. Nat. Genet.
- High-resolution haplotype structure in the human genome. Nat. Genet.
- Islands of linkage disequilibrium. Nat. Genet.
- Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet.