Highly accurate protein structure prediction for the human proteome

Cited by: 2009
Authors
Tunyasuvunakool, Kathryn [1 ,2 ]
Adler, Jonas [1 ]
Wu, Zachary [1 ]
Green, Tim [1 ]
Zielinski, Michal [1 ]
Zidek, Augustin [1 ]
Bridgland, Alex [1 ]
Cowie, Andrew [1 ]
Meyer, Clemens [1 ]
Laydon, Agata [1 ]
Velankar, Sameer [2 ]
Kleywegt, Gerard J. [2 ]
Bateman, Alex [2 ]
Evans, Richard [1 ]
Pritzel, Alexander [1 ]
Figurnov, Michael [1 ]
Ronneberger, Olaf [1 ]
Bates, Russ [1 ]
Kohl, Simon A. A. [1 ]
Potapenko, Anna [1 ]
Ballard, Andrew J. [1 ]
Romera-Paredes, Bernardino [1 ]
Nikolov, Stanislav [1 ]
Jain, Rishub [1 ]
Clancy, Ellen [1 ]
Reiman, David [1 ]
Petersen, Stig [1 ]
Senior, Andrew W. [1 ]
Kavukcuoglu, Koray [1 ]
Birney, Ewan [2 ]
Kohli, Pushmeet [1 ]
Jumper, John [1 ,2 ]
Hassabis, Demis [1 ,2 ]
Affiliations
[1] DeepMind, London, England
[2] European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, England
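Keywords
INTRINSIC DISORDER; SCORING FUNCTION; OPTIMIZATION; DISCOVERY; TOPOLOGY; DATABASE; MODELS
DOI
10.1038/s41586-021-03828-1
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract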
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure(1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold(2), at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
Pages: 590 / +
Page count: 19
Related papers
75 in total
[1]   Targeting diacylglycerol acyltransferase 2 for the treatment of nonalcoholic steatohepatitis [J].
Amin, Neeta B. ;
Carvajal-Gonzalez, Santos ;
Purkal, Julie ;
Zhu, Tong ;
Crowley, Collin ;
Perez, Sylvie ;
Chidsey, Kristin ;
Kim, Albert M. ;
Goodwin, Bryan .
SCIENCE TRANSLATIONAL MEDICINE, 2019, 11 (520)
[2]   The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures [J].
Andreeva, Antonina ;
Kulesha, Eugene ;
Gough, Julian ;
Murzin, Alexey G. .
NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) :D376-D382
[3]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[4]   Why do eukaryotic proteins contain more intrinsically disordered regions? [J].
Basile, Walter ;
Salvatore, Marco ;
Bassot, Claudio ;
Elofsson, Arne .
PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (07)
[5]   UniProt: the universal protein knowledgebase in 2021 [J].
Bateman, Alex ;
Martin, Maria-Jesus ;
Orchard, Sandra ;
Magrane, Michele ;
Agivetova, Rahat ;
Ahmad, Shadab ;
Alpi, Emanuele ;
Bowler-Barnett, Emily H. ;
Britto, Ramona ;
Bursteinas, Borisas ;
Bye-A-Jee, Hema ;
Coetzee, Ray ;
Cukura, Austra ;
Da Silva, Alan ;
Denny, Paul ;
Dogan, Tunca ;
Ebenezer, ThankGod ;
Fan, Jun ;
Castro, Leyla Garcia ;
Garmiri, Penelope ;
Georghiou, George ;
Gonzales, Leonardo ;
Hatton-Ellis, Emma ;
Hussein, Abdulrahman ;
Ignatchenko, Alexandr ;
Insana, Giuseppe ;
Ishtiaq, Rizwan ;
Jokinen, Petteri ;
Joshi, Vishal ;
Jyothi, Dushyanth ;
Lock, Antonia ;
Lopez, Rodrigo ;
Luciani, Aurelien ;
Luo, Jie ;
Lussi, Yvonne ;
Mac-Dougall, Alistair ;
Madeira, Fabio ;
Mahmoudy, Mahdi ;
Menchi, Manuela ;
Mishra, Alok ;
Moulang, Katie ;
Nightingale, Andrew ;
Oliveira, Carla Susana ;
Pundir, Sangya ;
Qi, Guoying ;
Raj, Shriya ;
Rice, Daniel ;
Lopez, Milagros Rodriguez ;
Saidi, Rabie ;
Sampson, Joseph .
NUCLEIC ACIDS RESEARCH, 2021, 49 (D1) :D480-D489
[6]   Improving homology modeling from low-sequence identity templates in Rosetta: A case study in GPCRs [J].
Bender, Brian Joseph ;
Marlow, Brennica ;
Meiler, Jens .
PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (10)
[7]   Finding Our Way in the Dark Proteome [J].
Bhowmick, Asmit ;
Brookes, David H. ;
Yost, Shane R. ;
Dyson, H. Jane ;
Forman-Kay, Julie D. ;
Gunter, Daniel ;
Head-Gordon, Martin ;
Hura, Gregory L. ;
Pande, Vijay S. ;
Wemmer, David E. ;
Wright, Peter E. ;
Head-Gordon, Teresa .
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2016, 138 (31) :9730-9742
[8]   Discovery of a Potent, Selective, and Orally Efficacious Pyrimidinooxazinyl Bicyclooctaneacetic Acid Diacylglycerol Acyltransferase-1 Inhibitor [J].
Birch, Alan M. ;
Birtles, Susan ;
Buckett, Linda K. ;
Kemmitt, Paul D. ;
Smith, Graham J. ;
Smith, Tim J. D. ;
Turnbull, Andrew V. ;
Wang, Steven J. Y. .
JOURNAL OF MEDICINAL CHEMISTRY, 2009, 52 (06) :1558-1568
[9]   Protein Data Bank: the single global archive for 3D macromolecular structure data [J].
Burley, Stephen K. ;
Berman, Helen M. ;
Bhikadiya, Charmi ;
Bi, Chunxiao ;
Chen, Li ;
Di Costanzo, Luigi ;
Christie, Cole ;
Duarte, Jose M. ;
Dutta, Shuchismita ;
Feng, Zukang ;
Ghosh, Sutapa ;
Goodsell, David S. ;
Green, Rachel Kramer ;
Guranovic, Vladimir ;
Guzenko, Dmytro ;
Hudson, Brian P. ;
Liang, Yuhe ;
Lowe, Robert ;
Peisach, Ezra ;
Periskova, Irina ;
Randle, Chris ;
Rose, Alexander ;
Sekharan, Monica ;
Shao, Chenghua ;
Tao, Yi-Ping ;
Valasatava, Yana ;
Voigt, Maria ;
Westbrook, John ;
Young, Jasmine ;
Zardecki, Christine ;
Zhuravleva, Marina ;
Kurisu, Genji ;
Nakamura, Haruki ;
Kengaku, Yumiko ;
Cho, Hasumi ;
Sato, Junko ;
Kim, Ju Yaen ;
Ikegawa, Yasuyo ;
Nakagawa, Atsushi ;
Yamashita, Reiko ;
Kudou, Takahiro ;
Bekker, Gert-Jan ;
Suzuki, Hirofumi ;
Iwata, Takeshi ;
Yokochi, Masashi ;
Kobayashi, Naohiro ;
Fujiwara, Toshimichi ;
Velankar, Sameer ;
Kleywegt, Gerard J. ;
Anyango, Stephen .
NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) :D520-D528
[10]   Structure-function analysis of diacylglycerol acyltransferase sequences from 70 organisms [J].
Cao H. .
BMC Research Notes, 4 (1)