Unrooted genealogical tree probabilities in the infinitely-many-sites model

Cited by: 76
Authors
GRIFFITHS, RC [1]
TAVARE, S [1]
Affiliation
[1] UNIV SO CALIF, DEPT MATH & BIOL SCI, LOS ANGELES, CA 90089
Funding
National Science Foundation (USA);
DOI
10.1016/0025-5564(94)00044-Z
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
The infinitely-many-sites process is often used to model the sequence variability observed in samples of DNA sequences. Despite its popularity, the sampling theory of the process is rather poorly understood. We describe the tree structure underlying the model and show how this may be used to compute the probability of a sample of sequences. We show how to produce the unrooted genealogy from a set of sites in which the ancestral labeling is unknown and from this the corresponding rooted genealogies. We derive recursions for the probability of the configuration of sequences (equivalently, of trees) in both the rooted and unrooted cases. We give a computational method based on Monte Carlo recursion that provides approximants to sampling probabilities for samples of any size. Among several applications, this algorithm may be used to find maximum likelihood estimators of the substitution rate, both when the ancestral labeling of sites is known and when it is unknown.
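As a small illustration of the precondition behind the unrooted genealogy, the sketch below applies the standard four-gamete test to binary site data: when the ancestral labeling is unknown, a set of sites fits a single tree without recurrent mutation only if no pair of sites shows all four combinations 00, 01, 10 and 11. This is not the authors' algorithm or recursion. The function name `is_compatible_unrooted` and the encoding of sequences as strings of '0' and '1' over segregating sites are assumptions made for this example.

```python
from itertools import combinations


def is_compatible_unrooted(sequences):
    """Four-gamete test on binary sequences (hypothetical helper).

    Returns True when no pair of sites exhibits all four of the
    combinations 00, 01, 10, 11 across the sample.
    """
    n_sites = len(sequences[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct two-site patterns seen in the sample.
        gametes = {(seq[i], seq[j]) for seq in sequences}
        if len(gametes) == 4:
            return False
    return True


# Assumed toy sample: five sequences scored at four segregating sites.
sample = ["0000", "1000", "1100", "0010", "0011"]
print(is_compatible_unrooted(sample))
```

Because the test only counts distinct patterns per pair of sites, it does not depend on which of the two bases at a site is labeled '0', which matches the setting in which the ancestral labeling is unknown.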
Pages: 77-98
Number of pages: 22