Peptide Sequence Tag Extraction by Graph Convolution Neural Networks

Authors

  • XinYe Bian School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • DongMei Xie School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • DI Zhang School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • XiaoYu Xie School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Yuyue Feng School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Piyu Zhou Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  • Changjiu He School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Mingming Lv School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China
  • Haipeng Wang School of Computer Science and Technology, Shandong University of Technology, Zibo 255000, China

DOI:

https://doi.org/10.12694/scpe.v26i1.3722

Keywords:

proteomics, peptide sequence tag, graph convolutional neural network, de novo sequencing, tandem mass spectrometry

Abstract

The peptide sequence tag extraction method plays a vital role in tandem mass spectrometry-based protein identification engines. In practical applications, this approach faces two significant challenges. First, tag lengths are fixed: shorter tags lack sufficient specificity and recall an excessive number of non-target peptide sequences, while longer tags lose precision as tag length increases and may fail to recall the target peptide sequences. Second, the sensitivity and precision of tag extraction remain relatively low. To address these issues, TagEx, a variable-length peptide sequence tag extraction algorithm based on graph convolutional networks, is proposed. The method first trains a de novo peptide sequencing scoring model using graph convolutional networks. It then constructs a spectral peak connection graph from the mass spectrum and applies a depth-first search strategy to extract variable-length peptide sequence tags, with the trained graph convolutional network model scoring amino acid connections during extraction. Finally, tags are filtered by length and score to obtain the final candidate peptide sequence tag set. To evaluate its performance, TagEx was benchmarked against three representative tag extraction software tools: InsPecT, PepNovo+, and DirecTag. The experimental results show that, when the top 100 tags are retained, TagEx achieves higher sensitivity, coverage, and precision, with improvements of 0.62-2.32, 3.22-11.14, and 3.29-8.31 percentage points, respectively.
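
The extraction pipeline described in the abstract (peak connection graph, depth-first search, filtering by length and score) can be illustrated with a minimal Python sketch. This is not the TagEx implementation. The function names, the mass tolerance, the length limits, and the reduced residue table are all assumptions made for illustration. In particular, `score_edge` is a simple intensity-based placeholder standing in for the trained graph convolutional network scoring model, whose architecture is not given here.

```python
# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def build_graph(peaks, tol=0.02):
    """Connect peak i to peak j (i < j) when their m/z difference
    matches a residue mass within `tol`. `peaks` must be sorted by m/z."""
    graph = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            diff = peaks[j][0] - peaks[i][0]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    graph[i].append((j, aa))
    return graph


def score_edge(peaks, i, j, max_intensity):
    """Placeholder score (assumption): mean normalized intensity of the
    two peaks. In TagEx this role is played by a trained GCN model."""
    return (peaks[i][1] + peaks[j][1]) / (2.0 * max_intensity)


def extract_tags(peaks, tol=0.02, min_len=2, max_len=5, top_n=100):
    """Depth-first search over the peak graph, collecting every path whose
    length lies in [min_len, max_len], then keeping the top_n by score."""
    if not peaks:
        return []
    peaks = sorted(peaks)
    graph = build_graph(peaks, tol)
    max_intensity = max(intensity for _, intensity in peaks)
    found = []

    def dfs(node, letters, scores):
        if len(letters) >= min_len:
            found.append(("".join(letters), sum(scores) / len(scores)))
        if len(letters) == max_len:
            return
        for nxt, aa in graph[node]:
            edge = score_edge(peaks, node, nxt, max_intensity)
            dfs(nxt, letters + [aa], scores + [edge])

    for start in range(len(peaks)):
        dfs(start, [], [])

    # Rank by score, breaking ties in favor of longer tags.
    found.sort(key=lambda t: (-t[1], -len(t[0])))
    return found[:top_n]


# Toy spectrum: (m/z, intensity) pairs, with one unrelated noise peak.
peaks = [(200.0, 50.0), (257.02146, 80.0), (300.0, 10.0),
         (328.05857, 100.0), (415.09060, 60.0)]
for tag, score in extract_tags(peaks):
    print(tag, round(score, 3))
```

Because edges only run from lower to higher m/z, the graph is acyclic and the search always terminates. Allowing every path between the minimum and maximum length to be emitted is what makes the tags variable-length in this sketch; a real scorer would replace the intensity heuristic.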

Published

2025-01-05

Section

Special Issue - Efficient Scalable Computing based on IoT and Cloud Computing