
Article information

  • Title: Tertiary alphabet for the observable protein structural universe
  • Authors: Craig O. Mackenzie; Jianfu Zhou; Gevorg Grigoryan
  • Journal: Proceedings of the National Academy of Sciences
  • Print ISSN: 0027-8424
  • Electronic ISSN: 1091-6490
  • Publication year: 2016
  • Volume: 113
  • Issue: 47
  • Pages: E7438-E7447
  • DOI: 10.1073/pnas.1607178113
  • Language: English
  • Publisher: The National Academy of Sciences of the United States of America
  • Significance: Proteins fold into intricate 3D structures, determined by their amino acid sequences. Different proteins can fold into drastically different structures, and the space of all possible structures appears hopelessly complex. However, this is precisely the space that needs to be described to understand how sequence encodes structure. In this paper, we decompose the set of known protein structures into standard reusable building blocks, which we call tertiary structural motifs (TERMs). Strikingly, we find that only ~600 TERMs describe 50% of the known protein structural universe at sub-Angstrom resolution. Furthermore, we find that the natural utilization of TERMs gives us a means of uncovering sequence-structure relationships. These insights can be harnessed for protein structure prediction, protein design, and other applications.
  • Abstract: Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ~600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
  • Keywords: tertiary motif; structural degeneracy; protein structural universe; sequence-structure relationships; structural modularity