Aligning multiple genomic sequences with the threaded blockset aligner

Cited by: 1092
Authors
Blanchette, M
Kent, WJ
Riemer, C
Elnitski, L
Smit, AFA
Roskin, KM
Baertsch, R
Rosenbloom, K
Clawson, H
Green, ED
Haussler, D
Miller, W [1]
Affiliations
[1] Penn State Univ, Ctr Comparat Genom & Bioinformat, University Pk, PA 16802 USA
[2] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
[3] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[4] Inst Syst Biol, Seattle, WA 98103 USA
[5] NHGRI, Genome Technol Branch, NIH, Bethesda, MD 20892 USA
[6] NHGRI, NIH Intramural Sequencing Ctr, NIH, Bethesda, MD 20892 USA
DOI
10.1101/gr.1933104
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
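The projection property described in the abstract can be illustrated with a minimal sketch. It is not TBA's implementation: the `Row` layout, the `project` and `orthologous_pairs` helpers, and the toy blocks are all hypothetical, and the sketch assumes every row is on the forward strand, in line with the same-order, same-orientation assumption stated above. Projection here simply keeps the blocks that contain the chosen reference, lists the reference row first, and orders the blocks by reference position. Because no column is realigned, any two projections should report the same aligned position pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One sequence's row in an alignment block (hypothetical layout)."""
    species: str  # sequence name, e.g. "human"
    start: int    # 0-based start of the aligned segment in that sequence
    text: str     # aligned text; '-' marks a gap


def project(blocks, ref):
    """Keep blocks containing `ref`, put its row first, sort by its start."""
    projected = []
    for block in blocks:
        ref_rows = [r for r in block if r.species == ref]
        if not ref_rows:
            continue  # this block says nothing about the reference
        ref_row = ref_rows[0]
        others = [r for r in block if r is not ref_row]
        projected.append([ref_row] + others)
    projected.sort(key=lambda b: b[0].start)
    return projected


def orthologous_pairs(blocks, a, b):
    """Set of (position in a, position in b) that share an alignment column."""
    pairs = set()
    for block in blocks:
        rows = {r.species: r for r in block}
        if a not in rows or b not in rows:
            continue
        pos_a, pos_b = rows[a].start, rows[b].start
        for ch_a, ch_b in zip(rows[a].text, rows[b].text):
            if ch_a != "-" and ch_b != "-":
                pairs.add((pos_a, pos_b))
            if ch_a != "-":
                pos_a += 1
            if ch_b != "-":
                pos_b += 1
    return pairs


# Toy blockset (invented data): each position of each sequence is in one block.
blocks = [
    [Row("human", 0, "ACGT"), Row("mouse", 0, "AC-T"), Row("fugu", 0, "A-GT")],
    [Row("mouse", 3, "GG"), Row("fugu", 3, "GG")],
    [Row("human", 4, "TT"), Row("mouse", 5, "TT")],
]

human_view = project(blocks, "human")
mouse_view = project(blocks, "mouse")
same = (orthologous_pairs(human_view, "human", "mouse")
        == orthologous_pairs(mouse_view, "human", "mouse"))
print(len(human_view), len(mouse_view), same)
```

In this toy example the human projection drops the mouse-fugu block, which has no human row, while the mouse projection keeps all three blocks; the human-mouse position pairs are identical in both views.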
Pages: 708-715
Number of pages: 8