
Article information

  • Title: Extended many-item similarity indices for sets of nucleotide and protein sequences
  • Authors: Dávid Bajusz; Ramón Alain Miranda-Quintana; Anita Rácz
  • Journal: Computational and Structural Biotechnology Journal
  • Print ISSN: 2001-0370
  • Year of publication: 2021
  • Volume: 19
  • Pages: 3628-3639
  • DOI: 10.1016/j.csbj.2021.06.021
  • Publisher: Computational and Structural Biotechnology Journal
  • Abstract: Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows, and is usually accomplished by pairwise comparisons of sequences, utilizing simple (e.g., percent identity) or more intricate concepts (e.g., substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., direct calculation of the similarity of fingerprint sets), here we introduce novel symmetric similarity indices for analogous calculations on sets of character sequences with more than two (t) possible items (e.g., DNA/RNA sequences with t = 4, or protein sequences with t = 20). The features of these new indices are studied in detail with analysis of variance (ANOVA), and demonstrated with three case studies of protein/DNA sequences with varying degrees of similarity (or evolutionary proximity). The Python code for the extended many-item similarity indices is publicly available at: https://github.com/ramirandaq/tn_Comparisons.
  • Keywords: Multiple comparisons; DNA sequences; Protein sequences; Diversity analysis; Similarity indices; Consistency; ANOVA; Human protein kinases; Human SH2 domains; Cytochrome P450
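
The abstract contrasts the conventional approach, in which set similarity is built from many pairwise comparisons such as percent identity, with a direct set-level index. As a rough illustration of that pairwise baseline only, the sketch below averages percent identity over all pairs of pre-aligned, equal-length sequences. It is not the authors' extended many-item index and is not taken from their repository. The function names `percent_identity` and `mean_pairwise_identity` and the toy sequences are assumptions made for this example.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two pre-aligned sequences match.

    Hypothetical helper: assumes both sequences are already aligned
    to the same non-zero length (no gap handling is attempted).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average percent identity over all n*(n-1)/2 unordered pairs.

    This is the pairwise baseline described in the abstract, whose cost
    grows quadratically with the number of sequences.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Toy DNA example (t = 4 possible items per position); the data are made up.
toy_seqs = ["ACGT", "ACGA", "ACCA"]
print(round(mean_pairwise_identity(toy_seqs), 2))
```

For n sequences this baseline evaluates n(n-1)/2 pairs, which is the scaling that a direct set-level index of the kind the abstract describes is meant to avoid.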