Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation

Cited by: 84
Authors
Horne, BD
Camp, NJ
Affiliations
[1] Univ Utah, Dept Med Informat, Genet Epidemiol Div, Salt Lake City, UT 84112 USA
[2] LDS Hosp, Cardiovasc Dept, Salt Lake City, UT USA
Keywords
group-tagging SNP (gtSNP); haplotype-tagging SNP (htSNP); linkage disequilibrium; haplotype block; association analysis;
DOI
10.1002/gepi.10292
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Candidate gene association studies often utilize one single nucleotide polymorphism (SNP) for analysis, with an initial report typically not being replicated by subsequent studies. The failure to replicate may result from incomplete or poor identification of disease-related variants or haplotypes, possibly due to naive SNP selection. A method for identification of linkage disequilibrium (LD) groups and selection of SNPs that capture sufficient intra-genic genetic diversity is described. We assume all SNPs with minor allele frequency above a pre-determined frequency have been identified. Principal component analysis (PCA) is applied to evaluate multivariate SNP correlations to infer groups of SNPs in LD (LD-groups) and to establish an optimal set of group-tagging SNPs (gtSNPs) that provide the most comprehensive coverage of intragenic diversity while minimizing the resources necessary to perform an informative association analysis. This PCA method differs from haplotype block (HB) and haplotype-tagging SNP (htSNP) methods, in that an LD-group of SNPs need not be a contiguous DNA fragment. Results of the PCA method compared well with existing htSNP methods while also providing advantages over those methods, including an indication of the optimal number of SNPs needed. Further, evaluation of the method over multiple replicates of simulated data indicated PCA to be a robust method for SNP selection. Our findings suggest that PCA may be a powerful tool for establishing an optimal SNP set that maximizes the amount of genetic variation captured for a candidate gene using a minimal number of SNPs. Genet Epidemiol 26:11-21, 2004. (C) 2003 Wiley-Liss, Inc.
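The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea (PCA on SNP correlations, then one tagging SNP per inferred LD-group), not the authors' exact procedure. It assumes genotypes coded as minor-allele counts (0, 1, 2), retains components whose eigenvalue exceeds 1 (a common heuristic chosen here for illustration), assigns each SNP to the component on which it loads most strongly, and takes the highest-loading SNP in each group as its gtSNP. The function names `select_gt_snps` and `simulate_genotypes` and the parameters `min_eigenvalue` and `flip` are hypothetical.

```python
import numpy as np


def select_gt_snps(genotypes, min_eigenvalue=1.0):
    """Group SNPs by principal component and pick one tagging SNP per group.

    genotypes: (n_individuals, n_snps) array of minor-allele counts (0, 1, 2);
    every SNP is assumed to be polymorphic in the sample.
    Returns (groups, tags): groups maps a component index to the SNP indices
    assigned to it; tags maps the same index to the chosen tagging SNP.
    """
    x = np.asarray(genotypes, dtype=float)
    corr = np.corrcoef(x, rowvar=False)        # SNP-by-SNP correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals > min_eigenvalue            # assumed retention rule
    if not keep.any():
        keep[0] = True                         # always keep at least one component
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])

    # Assign each SNP to the component with its largest absolute loading.
    assignment = np.argmax(np.abs(loadings), axis=1)
    groups, tags = {}, {}
    for comp in range(loadings.shape[1]):
        members = np.flatnonzero(assignment == comp)
        if members.size == 0:
            continue
        groups[comp] = members.tolist()
        best = np.argmax(np.abs(loadings[members, comp]))
        tags[comp] = int(members[best])
    return groups, tags


def simulate_genotypes(n, block_sizes, rng, flip=0.05):
    """Toy data: SNPs within a block are noisy copies of one shared allele."""
    cols = []
    for size in block_sizes:
        base = rng.random((n, 2)) < 0.3        # shared allele on two chromosomes
        for _ in range(size):
            noise = rng.random((n, 2)) < flip  # per-SNP allele flips
            cols.append(np.logical_xor(base, noise).sum(axis=1))
    return np.column_stack(cols)
```

Calling `select_gt_snps(simulate_genotypes(2000, [4, 3], np.random.default_rng(0)))` on such toy data should recover the two simulated groups with one tag each. Because SNPs are grouped by correlation rather than by position, a group need not be physically contiguous, which matches the distinction from haplotype-block methods drawn in the abstract.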
Pages: 11-21
Page count: 11