PairsDB atlas of protein sequence space

Cited by: 11
Authors
Heger, Andreas [1, 2]
Korpelainen, Eija [3]
Hupponen, Taavi [3]
Mattila, Kimmo [3]
Ollikainen, Vesa [3]
Holm, Liisa [1, 4]
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FIN-00014 Helsinki, Finland
[2] Univ Oxford, MRC, Funct Genet Unit, Oxford OX1 2JD, England
[3] CSC, Espoo, Finland
[4] Univ Helsinki, Dept Biol & Environm Sci, Div Genet, FIN-00014 Helsinki, Finland
Funding
UK Medical Research Council
DOI
10.1093/nar/gkm879
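Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010 [Biochemistry and molecular biology]; 081704 [Applied chemistry]
Abstract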
Sequence similarity/database searching is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and the BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually on each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria, for example all human proteins with solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data is stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data is a valuable platform for building various downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
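The abstract names the all-against-all similarity graph as a starting point for clustering protein families. As a rough illustration only, the sketch below groups sequences into connected components of such a graph (single-linkage clustering via union-find). It is not the PairsDB method and does not use the PairsDB schema: the `(query, hit, evalue)` tuple layout, the `max_evalue` cutoff, the function name and the sample identifiers are all assumptions made for this example.

```python
from collections import defaultdict


def cluster_by_similarity(pairs, max_evalue=1e-5):
    """Group sequence ids into connected components of a similarity graph.

    `pairs` is an iterable of (query_id, hit_id, evalue) tuples; this layout
    is hypothetical and only stands in for rows of pre-computed alignments.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, hit, evalue in pairs:
        # Looking up both ids first ensures that sequences with only weak
        # hits still appear in the output as singletons or in other clusters.
        root_q, root_h = find(query), find(hit)
        if evalue <= max_evalue and root_q != root_h:
            parent[root_h] = root_q  # merge the two components

    clusters = defaultdict(list)
    for seq_id in parent:
        clusters[find(seq_id)].append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical alignment rows; the ids and E-values are made up.
example_pairs = [
    ("P1", "P2", 1e-30),
    ("P2", "P3", 1e-12),
    ("P4", "P5", 1e-8),
    ("P3", "P4", 0.5),  # too weak to link the two groups
]
example_clusters = cluster_by_similarity(example_pairs)
```

With the sample rows above, `example_clusters` holds two groups, `["P1", "P2", "P3"]` and `["P4", "P5"]`. Single linkage is the simplest possible choice here; it tends to chain unrelated families together through multi-domain proteins, which is one reason real pipelines add further steps such as domain delineation.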
Pages: D276-D280
Page count: 5