SIMAP-the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

Cited by: 16
Authors
Arnold, Roland [1 ]
Goldenberg, Florian [2 ]
Mewes, Hans-Werner [3 ]
Rattei, Thomas [2 ]
Affiliations
[1] Univ Toronto, Terrence Donnelly Ctr Cellular & Biomol Res, Kim Lab, Toronto, ON M5S 3E1, Canada
[2] Univ Vienna, Dept Microbiol & Ecosyst Sci, CUBE Div Computat Syst Biol, A-1090 Vienna, Austria
[3] Tech Univ Munich, Helmholtz Zentrum Munchen, Inst Bioinformat & Syst Biol, D-85764 Neuherberg, Germany
Keywords
VISUALIZATION; CONSTRUCTION; GENERATION; NETWORK; FAMILY; MATRIX; TOOL
DOI
10.1093/nar/gkt970
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ~70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads.
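As an illustration of what an all-against-all similarity matrix and a similarity network derived from it look like, the following minimal Python sketch scores every pair in a toy protein collection with a plain Smith-Waterman local alignment. It is not SIMAP's implementation: the match/mismatch/gap scores, the function names, and the example sequences are assumptions chosen for brevity, and SIMAP itself combines FASTA search heuristics with Smith-Waterman alignment and realistic substitution matrices.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_against_all(seqs):
    """Symmetric score matrix (dict of dicts) over a {id: sequence} collection."""
    ids = sorted(seqs)
    matrix = {i: {i: smith_waterman_score(seqs[i], seqs[i])} for i in ids}
    for i, j in combinations(ids, 2):
        score = smith_waterman_score(seqs[i], seqs[j])
        matrix[i][j] = matrix[j][i] = score
    return matrix


def similarity_edges(matrix, min_score):
    """Network edges (id1, id2, score) for distinct pairs scoring >= min_score."""
    return sorted(
        (i, j, s)
        for i in matrix
        for j, s in matrix[i].items()
        if i < j and s >= min_score
    )


# Hypothetical toy collection; the identifiers and sequences are made up.
toy = {"p1": "MKTAYIAK", "p2": "MKTAYLAK", "p3": "GGSGGS"}
toy_matrix = all_against_all(toy)
toy_edges = similarity_edges(toy_matrix, min_score=10)
```

Computing the matrix this way costs one alignment per sequence pair, which is the quadratic cost that a pre-calculated resource is meant to spare its users.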
Pages: D279-D284
Page count: 6
Related papers
34 entries in total
[1]   Acland A, 2013, NUCLEIC ACIDS RES, V41, pD8, DOI 10.1093/nar/gks1189
[2]   OMA 2011: orthology inference among 1000 complete genomes [J].
Altenhoff, Adrian M. ;
Schneider, Adrian ;
Gonnet, Gaston H. ;
Dessimoz, Christophe .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D289-D294
[3]   Protein database searches using compositionally adjusted substitution matrices [J].
Altschul, SF ;
Wootton, JC ;
Gertz, EM ;
Agarwala, R ;
Morgulis, A ;
Schäffer, AA ;
Yu, YK .
FEBS JOURNAL, 2005, 272 (20) :5101-5109
[4]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[5]   SIMAP: The similarity matrix of proteins [J].
Arnold, R ;
Rattei, T ;
Tischler, P ;
Truong, MD ;
Stümpflen, V ;
Mewes, W .
BIOINFORMATICS, 2005, 21 :42-46
[6]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[7]   Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies [J].
Atkinson, Holly J. ;
Morris, John H. ;
Ferrin, Thomas E. ;
Babbitt, Patricia C. .
PLOS ONE, 2009, 4 (02)
[8]   The RAST server: Rapid annotations using subsystems technology [J].
Aziz, Ramy K. ;
Bartels, Daniela ;
Best, Aaron A. ;
DeJongh, Matthew ;
Disz, Terrence ;
Edwards, Robert A. ;
Formsma, Kevin ;
Gerdes, Svetlana ;
Glass, Elizabeth M. ;
Kubal, Michael ;
Meyer, Folker ;
Olsen, Gary J. ;
Olson, Robert ;
Osterman, Andrei L. ;
Overbeek, Ross A. ;
McNeil, Leslie K. ;
Paarmann, Daniel ;
Paczian, Tobias ;
Parrello, Bruce ;
Pusch, Gordon D. ;
Reich, Claudia ;
Stevens, Rick ;
Vassieva, Olga ;
Vonstein, Veronika ;
Wilke, Andreas ;
Zagnitko, Olga .
BMC GENOMICS, 2008, 9 (1)
[9]   Pythoscape: a framework for generation of large protein similarity networks [J].
Barber, Alan E., II ;
Babbitt, Patricia C. .
BIOINFORMATICS, 2012, 28 (21) :2845-2846
[10]   Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research [J].
Conesa, A ;
Götz, S ;
García-Gómez, JM ;
Terol, J ;
Talón, M ;
Robles, M .
BIOINFORMATICS, 2005, 21 (18) :3674-3676