GERBIL: Genotype resolution and block identification using likelihood

Cited by: 100
Authors
Kimmel, G [1]
Shamir, R [1]
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
Keywords
haplotype; algorithm; phasing; single-nucleotide polymorphism; expectation maximization
DOI
10.1073/pnas.0404730102
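Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract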
The abundance of genotype data generated by individual and international efforts carries the promise of revolutionizing disease studies and the association of phenotypes with individual polymorphisms. A key challenge is providing an accurate resolution (phasing) of the genotypes into haplotypes. We present here results on a method for genotype phasing in the presence of recombination. Our analysis is based on a stochastic model for recombination-poor regions ("blocks"), in which haplotypes are generated from a small number of core haplotypes, allowing for mutations, rare recombinations, and errors. We formulate genotype resolution and block partitioning as a maximum-likelihood problem and solve it by an expectation-maximization algorithm. The algorithm was implemented in a software package called GERBIL (genotype resolution and block identification using likelihood), which is efficient and simple to use. We tested GERBIL on four large-scale sets of genotypes. It outperformed two state-of-the-art phasing algorithms. The PHASE algorithm was slightly more accurate than GERBIL when allowed to run with default parameters, but required two orders of magnitude more time. When using comparable running times, GERBIL was consistently more accurate. For data sets with hundreds of genotypes, the time required by PHASE becomes prohibitive. We conclude that GERBIL has a clear advantage for studies that include many hundreds of genotypes and, in particular, for large-scale disease studies.
Pages: 158-162
Number of pages: 5