SIMAP – the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

Cited by: 16
Authors
Arnold, Roland [1]
Goldenberg, Florian [2]
Mewes, Hans-Werner [3]
Rattei, Thomas [2]
Affiliations
[1] Univ Toronto, Terrence Donnelly Ctr Cellular & Biomol Res, Kim Lab, Toronto, ON M5S 3E1, Canada
[2] Univ Vienna, Dept Microbiol & Ecosyst Sci, CUBE Div Computat Syst Biol, A-1090 Vienna, Austria
[3] Tech Univ Munich, Helmholtz Zentrum Munchen, Inst Bioinformat & Syst Biol, D-85764 Neuherberg, Germany
Keywords
VISUALIZATION; CONSTRUCTION; GENERATION; NETWORK; FAMILY; MATRIX; TOOL
DOI
10.1093/nar/gkt970
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]
Abstract
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ~70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith-Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow SIMAP to be connected with any other bioinformatic tool or resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing universe of sequenced proteins. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal, which also contains instructions and links for software access and flat file downloads.
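The abstract names the Smith-Waterman algorithm as one of the methods behind SIMAP's pre-calculated similarities. As an illustration only, the following is a minimal Python sketch of Smith-Waterman local alignment scoring. The function name `smith_waterman_score` and the scoring values (+2 match, -1 mismatch, -2 per gap position) are hypothetical choices and are not SIMAP's parameters. A real protein search would typically use an amino acid substitution matrix such as BLOSUM together with affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Hypothetical scoring scheme: constant match/mismatch scores and a
    linear gap penalty (not SIMAP's actual parameters).
    """
    # prev holds row i-1 of the dynamic-programming matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # column 0 of every row is also zero
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes Smith-Waterman a *local* alignment method.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```

Keeping only two rows of the matrix limits memory to O(len(b)) while still yielding the optimal local score; recovering the aligned residues themselves would additionally require a traceback over the full matrix or a divide-and-conquer strategy.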
Pages: D279-D284
Number of pages: 6