Fast and accurate short read alignment with Burrows-Wheeler transform

Cited by: 17486

Authors:

Li, Heng [1]

Durbin, Richard [1]

Affiliation:

[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England

Source:

BIOINFORMATICS | 2009 / Vol. 25 / Issue 14

Funding:

Wellcome Trust (UK);

Keywords:

GENOME; OLIGONUCLEOTIDES; SEQUENCES; PROGRAM; SPACE; DNA;

DOI:

10.1093/bioinformatics/btp324

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.

Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows-Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10-20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
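The backward search with BWT that the abstract mentions can be illustrated with a minimal sketch. This is not BWA's implementation: it handles exact matches only (no mismatches or gaps), builds the suffix array naively, and stores uncompressed occurrence tables. The function names `bwt_index` and `backward_search` and the toy reference `GOOGOL` are assumptions made for illustration.

```python
from collections import Counter


def bwt_index(ref):
    """Build a toy index over ref: suffix array, C table, and Occ table."""
    text = ref + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Naive suffix array: start positions of suffixes in sorted order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in text that are strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, C, occ


def backward_search(read, sa, C, occ):
    """Return sorted start positions of exact occurrences of read."""
    lo, hi = 0, len(sa)  # half-open suffix array interval [lo, hi)
    for ch in reversed(read):  # extend the match one base to the left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:  # empty interval: the read does not occur
            return []
    return sorted(sa[lo:hi])


sa, C, occ = bwt_index("GOOGOL")
print(backward_search("GO", sa, C, occ))
```

Each loop iteration narrows the suffix array interval using only the C and Occ tables, so the number of steps grows with the read length rather than with the reference length. A production aligner would, under this sketch's assumptions, replace the full tables with sampled, compressed structures and extend the search to tolerate mismatches and gaps.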

Pages: 1754-1760

Page count: 7

References (17 in total):

[1] Burrows M, 1994, BLOCK SORTING LOSSLE

[2] PASS: a program to align short sequences [J].

Campagna, Davide ;

Albiero, Alessandro ;

Bilardi, Alessandra ;

Caniato, Elisa ;

Forcato, Claudio ;

Manavski, Svetlin ;

Vitulo, Nicola ;

Valle, Giorgio .

BIOINFORMATICS, 2009, 25 (07) :967-968

[3] MOM: maximum oligonucleotide mapping [J].

Eaves, Hugh L. ;

Gao, Yuan .

BIOINFORMATICS, 2009, 25 (07) :969-970

[4] Opportunistic data structures with applications [J].

Ferragina, P ;

Manzini, G .

41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :390-398

[5] Grossi R., 2000, Proceedings of the Thirty Second Annual ACM Symposium on Theory of Computing, P397, DOI 10.1145/335305.335351

[6] A space and time efficient algorithm for constructing compressed suffix arrays [J].

Hon, Wing-Kai ;

Lam, Tak-Wah ;

Sadakane, Kunihiko ;

Sung, Wing-Kin ;

Yiu, Siu-Ming .

ALGORITHMICA, 2007, 48 (01) :23-36

[7] SeqMap: mapping massive amount of oligonucleotides to the genome [J].

Jiang, Hui ;

Wong, Wing Hung .

BIOINFORMATICS, 2008, 24 (20) :2395-2396

[8] ProbeMatch: rapid alignment of oligonucleotides to genome allowing both gaps and mismatches [J].

Kim, You Jung ;

Teletia, Nikhil ;

Ruotti, Victor ;

Maher, Christopher A. ;

Chinnaiyan, Arul M. ;

Stewart, Ron ;

Thomson, James A. ;

Patel, Jignesh M. .

BIOINFORMATICS, 2009, 25 (11) :1424-1425

[9] Compressed indexing and local alignment of DNA [J].

Lam, T. W. ;

Sung, W. K. ;

Tam, S. L. ;

Wong, C. K. ;

Yiu, S. M. .

BIOINFORMATICS, 2008, 24 (06) :791-797

[10] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome [J].

Langmead, Ben ;

Trapnell, Cole ;

Pop, Mihai ;

Salzberg, Steven L. .

GENOME BIOLOGY, 2009, 10 (03)
