Article Highlight | 19-Sep-2024

sORFPred: A method based on comprehensive features and ensemble learning to predict the sORFs in plant LncRNAs

Shanghai Jiao Tong University Journal Center

Long non-coding RNAs (lncRNAs) are important regulators of biological processes. Recent work has shown that some lncRNAs contain small open reading frames (sORFs) that can encode small peptides of no more than 100 amino acids. However, existing prediction methods are mostly applied to human and animal datasets and still suffer from weak feature representation. Accurate and credible prediction of sORFs with coding ability in plant lncRNAs is therefore needed.
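
To make the sORF definition concrete, here is a minimal Python sketch that scans a transcript for candidate sORFs. It is not part of sORFPred. The function name `find_sorfs`, the ATG-only start codon, the forward-strand-only scan, and the choice to report only the first start codon of each ORF are all assumptions made for illustration. Only the 100-amino-acid limit comes from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_PEPTIDE_LEN = 100  # amino acids; the only value taken from the article


def find_sorfs(seq, max_aa=MAX_PEPTIDE_LEN):
    """Return sorted (start, end, n_amino_acids) tuples for candidate sORFs.

    Illustrative assumptions: ORFs start at ATG, end at the first in-frame
    stop codon, and are searched on the forward strand only. Coordinates are
    0-based, `end` is exclusive and includes the stop codon. The peptide
    length counts the start methionine but not the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_aa = (j - i) // 3
                        if n_aa <= max_aa:
                            hits.append((i, j + 3, n_aa))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return sorted(hits)


if __name__ == "__main__":
    for start, end, n_aa in find_sorfs("CCAUGGCUUGAUAAAUGAAAUAG"):
        print(start, end, n_aa)
```

A scan like this only enumerates candidates. Deciding which candidates actually have coding ability is the prediction problem the article addresses.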

A team led by Jun Meng at Dalian University of Technology proposed a new method, sORFPred. The method has three components: a model named MCSEN, which combines multi-scale convolution with Squeeze-and-Excitation Networks to mine the distinct information embedded in sORFs; a set of sequence-based and physicochemical feature descriptors that are integrated and optimized; and a two-layer prediction classifier built on a Bayesian optimization algorithm and Extra Trees. sORFPred was evaluated on sORF datasets from three species and on a dataset of experimentally validated sORFs. The results indicate that sORFPred outperforms existing methods, achieving 97.28% accuracy, 97.06% precision, 97.52% recall, and a 97.29% F1-score on Arabidopsis thaliana, a significant improvement in prediction performance over various conventional shallow machine learning and deep learning models.
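
As a quick consistency check on the reported Arabidopsis thaliana figures, the F1-score is the harmonic mean of precision and recall. The short sketch below uses the standard F1 definition and only the two percentages quoted above. The helper name `f1_score` is chosen here for illustration and is not taken from the sORFPred code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Percentages reported for Arabidopsis thaliana in the article.
precision, recall = 97.06, 97.52
f1 = f1_score(precision, recall)
print(f"{f1:.2f}")  # prints 97.29
```

The result matches the reported 97.29% F1-score to two decimal places.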

To the best of our knowledge, this research is the first to predict sORFs with coding potential in plant lncRNAs using such comprehensive and detailed features together with an ensemble learning model based on Bayesian optimization. Compared with existing methods, it achieves better performance and generalization capability. sORFPred is expected to become a powerful method for large-scale sORF prediction. Predicting sORFs with coding ability in plant lncRNAs will lay the foundation for discovering lncRNA-encoded small peptides and provide an important reference for experimental validation. This in turn helps reveal the molecular mechanisms behind organism traits and disease resistance, and is of great value to agriculture, forestry, and other fields.
