Segmental duplications: Organization and impact within the current Human Genome Project assembly

Cited by: 532
Authors
Bailey, JA
Yavor, AM
Massa, HF
Trask, BJ
Eichler, EE [1]
Affiliations
[1] Case Western Reserve Univ, Sch Med, Dept Genet, Cleveland, OH 44106 USA
[2] Case Western Reserve Univ, Sch Med, Ctr Human Genet, Cleveland, OH 44106 USA
[3] Univ Hosp Cleveland, Cleveland, OH 44106 USA
[4] Fred Hutchinson Canc Res Ctr, Seattle, WA 98109 USA
DOI
10.1101/gr.GR-1871R
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Segmental duplications play fundamental roles in both genomic disease and gene evolution. To understand their organization within the human genome, we have developed the computational tools and methods necessary to detect identity between long stretches of genomic sequence despite the presence of high-copy repeats and large insertion-deletions. Here we present our analysis of the most recent genome assembly (January 2001), in which we focus on the global organization of these segments and the role they play in the whole-genome assembly process. Initially, we considered only large recent duplication events that fell well below levels of draft sequencing error (alignments 90%-98% similar and ≥1 kb in length). Duplications (90%-98%; ≥1 kb) comprise 3.6% of all human sequence. These duplications show clustering and up to 10-fold enrichment within pericentromeric and subtelomeric regions. In terms of assembly, duplicated sequences were found to be over-represented in unordered and unassigned contigs, indicating that duplicated sequences are difficult to assign to their proper position. To assess coverage of these regions within the genome, we selected BACs containing interchromosomal duplications and characterized their duplication pattern by FISH. Only 47% (106/224) of chromosomes positive by FISH had a corresponding chromosomal position by BLAST comparison. We present data that indicate that this is attributable to misassembly, misassignment, and/or decreased sequencing coverage within duplicated regions. Surprisingly, if we consider putative duplications with >98% identity, we identify 10.6% (286 Mb) of the current assembly as paralogous. The majority of these alignments, we believe, represent unmerged overlaps within unique regions.
Taken together, the above data indicate that segmental duplications represent a significant impediment to accurate human genome assembly, requiring the development of specialized techniques to finish these exceptional regions of the genome. The identification and characterization of these highly duplicated regions represent an important step in the complete sequencing of a human reference genome.
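The thresholds and figures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' detection pipeline: the `Alignment` record, the function `is_recent_duplication`, and the sample alignments are hypothetical names and values introduced only for illustration. The sketch applies the stated filter (90%-98% identity, ≥1 kb) and checks the arithmetic behind the reported 47% FISH/BLAST concordance and the assembly size implied by "10.6% (286 Mb)".

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical pairwise alignment record (not the authors' data model)."""
    query: str
    subject: str
    length_bp: int
    percent_identity: float


def is_recent_duplication(aln, min_len_bp=1000, min_identity=90.0, max_identity=98.0):
    """Apply the abstract's initial criteria: 90%-98% similar and >= 1 kb long."""
    return (aln.length_bp >= min_len_bp
            and min_identity <= aln.percent_identity <= max_identity)


# Made-up example alignments, for illustration only.
alignments = [
    Alignment("ctgA", "ctgB", 12_500, 95.2),  # passes both thresholds
    Alignment("ctgA", "ctgC", 800, 96.0),     # too short (< 1 kb)
    Alignment("ctgD", "ctgE", 40_000, 99.6),  # > 98%: possible unmerged overlap
    Alignment("ctgF", "ctgG", 3_000, 88.0),   # below 90% identity
]
kept = [a for a in alignments if is_recent_duplication(a)]

# Arithmetic behind two figures quoted in the abstract.
fish_concordance_pct = 100 * 106 / 224   # chromosomes positive by FISH and found by BLAST
implied_assembly_mb = 286 / 0.106        # assembly size implied by "10.6% (286 Mb)"

print(len(kept), round(fish_concordance_pct, 1), round(implied_assembly_mb))
```

The upper identity bound is left as a parameter because the abstract reports that raising it above 98% sharply increases the apparent paralogous fraction, most of which the authors attribute to unmerged overlaps rather than true duplications.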
Pages: 1005-1017
Page count: 13