The variant call format and VCFtools

Cited by: 15903
Authors
Danecek, Petr [1]
Auton, Adam [2]
Abecasis, Goncalo [3]
Albers, Cornelis A. [1]
Banks, Eric [4]
DePristo, Mark A. [4]
Handsaker, Robert E. [4]
Lunter, Gerton [2]
Marth, Gabor T. [5]
Sherry, Stephen T. [6]
McVean, Gilean [2, 7]
Durbin, Richard [1]
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Univ Oxford, Wellcome Trust Ctr Human Genet, Oxford OX3 7BN, England
[3] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[4] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02141 USA
[5] Boston Coll, Dept Biol, Boston, MA 02467 USA
[6] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[7] Univ Oxford, Dept Stat, Oxford OX1 3TG, England
Funding
National Institutes of Health (USA); Wellcome Trust (UK); Medical Research Council (UK)
DOI
10.1093/bioinformatics/btr330
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.
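As a rough illustration of what storing a variant "together with rich annotations" can look like in practice, the following Python sketch parses one tab-separated VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and checks whether it falls in a position range. It is not part of VCFtools, whose API is in Perl; the names `VcfRecord`, `parse_vcf_line` and `in_region` are hypothetical, and the sketch assumes an uncompressed line with no per-sample genotype handling.

```python
from dataclasses import dataclass


@dataclass
class VcfRecord:
    """One VCF data line, fixed columns only (hypothetical helper type)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: list
    qual: str
    filter: str
    info: dict


def parse_vcf_line(line):
    """Split a tab-separated VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    if info != ".":
        # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_dict[key] = value if sep else True
    # ALT may hold several comma-separated alleles.
    return VcfRecord(chrom, int(pos), vid, ref, alt.split(","), qual, flt, info_dict)


def in_region(rec, chrom, start, end):
    """Return True if the record lies in the 1-based inclusive range."""
    return rec.chrom == chrom and start <= rec.pos <= end


# Example line made up for illustration.
rec = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n")
print(rec.chrom, rec.pos, rec.ref, rec.alt, rec.info)
```

A linear scan with `in_region` only illustrates the idea of range retrieval; the indexed lookup mentioned in the abstract avoids reading the whole file and is not reproduced here.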
Pages: 2156-2158
Number of pages: 3