PairsDB atlas of protein sequence space

Cited by: 11
Authors
Heger, Andreas [1, 2]
Korpelainen, Eija [3]
Hupponen, Taavi [3]
Mattila, Kimmo [3]
Ollikainen, Vesa [3]
Holm, Liisa [1, 4]
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FIN-00014 Helsinki, Finland
[2] Univ Oxford, MRC, Funct Genet Unit, Oxford OX1 2JD, England
[3] CSC, Espoo, Finland
[4] Univ Helsinki, Dept Biol & Environm Sci, Div Genet, FIN-00014 Helsinki, Finland
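Funding
UK Medical Research Council
DOI
10.1093/nar/gkm879
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract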
Sequence similarity/database searching is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually on each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria - for example, all human proteins with solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data is stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data is a valuable platform on which to build downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
Pages: D276-D280
Page count: 5