Image: (a) Data acquisition; (b) pan-genomics analysis and the two-step feature-selection process; (c) the SARPLLM algorithm; (d) the QSMOTEN algorithm; (e) the Salmonella antimicrobial-resistance predictive platform.
Credit: Yujie You et al.
A recent study published in Engineering presents a novel approach to predicting antimicrobial resistance in Salmonella, a growing public health concern. The research, led by Le Zhang of Sichuan University, combines large language models (LLMs) with quantum computing to build a predictive platform.
Salmonella is a common foodborne pathogen. The overuse of antimicrobials, together with genetic mutations, has led to the rise of antimicrobial-resistant Salmonella strains, so accurate resistance prediction is crucial for effective treatment. However, traditional bacterial antimicrobial susceptibility tests (ASTs) are inefficient, and current predictive models built on whole-genome sequencing (WGS) data are prone to overfitting because the data are so high-dimensional.
To address these issues, the researchers proposed a two-step feature-selection process that applies a chi-square test followed by conditional mutual information maximization (CMIM) to identify key Salmonella resistance genes from a pan-genomics analysis. They then developed an LLM-based Salmonella antimicrobial-resistance predictive (SARPLLM) algorithm, built on the Qwen2 LLM and fine-tuned with low-rank adaptation (LoRA). The algorithm converts each Salmonella sample into a sentence that the LLM can process, enabling accurate resistance prediction.
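To make the two selection steps and the sample-to-sentence idea concrete, here is a minimal sketch in Python. It assumes binary gene presence/absence features and binary labels, ranks genes by the raw chi-square statistic rather than a p-value cutoff, and uses the standard greedy CMIM criterion. The function names, the cutoffs `n_chi` and `n_final`, and the sentence template are all hypothetical illustrations, not the authors' code or their actual prompt format.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value x label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    stat = 0.0
    for x in fx:
        for y in fy:
            expected = fx[x] * fy[y] / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    return stat


def mutual_info(x, y):
    """I(X;Y) in nats for two discrete sequences of equal length."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z=z)."""
    n = len(z)
    total = 0.0
    for value, count in Counter(z).items():
        idx = [i for i in range(n) if z[i] == value]
        total += count / n * mutual_info([x[i] for i in idx], [y[i] for i in idx])
    return total


def cmim_score(gene, selected, features, labels):
    """CMIM criterion: the least information the gene adds about the label given any selected gene."""
    if not selected:
        return mutual_info(features[gene], labels)
    return min(cond_mutual_info(features[gene], labels, features[s]) for s in selected)


def two_step_select(features, labels, n_chi, n_final):
    """features: dict of gene name -> list of 0/1 presence calls, one per sample."""
    # Step 1: the chi-square screen keeps the n_chi most label-associated genes.
    ranked = sorted(features, key=lambda g: chi_square(features[g], labels), reverse=True)
    candidates = ranked[:n_chi]
    # Step 2: greedy CMIM adds genes that stay informative given those already chosen.
    selected = []
    while candidates and len(selected) < n_final:
        best = max(candidates, key=lambda g: cmim_score(g, selected, features, labels))
        selected.append(best)
        candidates.remove(best)
    return selected


def sample_to_sentence(sample, selected):
    """Hypothetical prompt template turning one sample's selected genes into a sentence."""
    present = [g for g in selected if sample[g] == 1]
    absent = [g for g in selected if sample[g] == 0]
    return (f"This Salmonella isolate carries genes: {', '.join(present) or 'none'}; "
            f"lacks genes: {', '.join(absent) or 'none'}.")
```

The second step is what separates this from a plain ranking: a gene that duplicates one already chosen has zero conditional mutual information given that gene, so CMIM skips it, whereas a chi-square ranking alone would keep both.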
Another challenge is the imbalance between the numbers of antimicrobial-resistant and susceptible samples in Salmonella WGS data. To address it, the team developed the QSMOTEN algorithm. Building on SMOTEN, a variant of the synthetic minority oversampling technique (SMOTE) for nominal features, QSMOTEN uses quantum computing to encode feature names and values into quantum states and to compute the distance between samples. This reduces the time complexity of the distance computation from linear to logarithmic, making it better suited to high-dimensional WGS data.
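The article does not describe the QSMOTEN circuit, so the following is only a hedged illustration of the general idea. It classically simulates two textbook quantum primitives, amplitude encoding and the swap test, and plugs the resulting fidelity-based distance into a SMOTEN-style generator. The classical SMOTE-N formulation uses a Value Difference Metric instead. Several choices here are assumptions, not details from the paper: one-hot encoding of (feature name, value) pairs, the distance 1 - fidelity, k = 1 neighbors, and every function name. Amplitude encoding packs a d-dimensional vector into ceil(log2 d) qubits, which is one common source of logarithmic scaling claims. This simulation ignores state-preparation cost and gives no speedup on classical hardware.

```python
import math
from collections import Counter


def one_hot(sample, categories):
    """Encode nominal features as one slot per (feature name, value) pair."""
    return [1.0 if sample[name] == value else 0.0
            for name, values in categories for value in values]


def amplitude_encode(vector):
    """Pad to a power of two and L2-normalize; a d-dimensional vector fits in ceil(log2 d) qubits."""
    n_qubits = max(1, math.ceil(math.log2(len(vector))))
    padded = list(vector) + [0.0] * (2 ** n_qubits - len(vector))
    norm = math.sqrt(sum(v * v for v in padded))
    return [v / norm for v in padded], n_qubits


def swap_test_p0(a, b):
    """Swap test on |a>, |b>: the ancilla reads 0 with probability (1 + |<a|b>|^2) / 2."""
    overlap = sum(x * y for x, y in zip(a, b))
    return 0.5 + 0.5 * overlap ** 2


def swap_test_distance(u, v):
    """Distance = 1 - fidelity, with fidelity |<a|b>|^2 recovered from the simulated swap test."""
    a, _ = amplitude_encode(u)
    b, _ = amplitude_encode(v)
    return 1.0 - (2.0 * swap_test_p0(a, b) - 1.0)


def smoten_like(minority, categories, k=1):
    """One synthetic sample per minority sample: each feature takes the most
    common value among that sample's k nearest minority neighbors."""
    vecs = [one_hot(s, categories) for s in minority]
    synthetic = []
    for i in range(len(minority)):
        ranked = sorted((swap_test_distance(vecs[i], vecs[j]), j)
                        for j in range(len(minority)) if j != i)
        neighbors = [minority[j] for _, j in ranked[:k]]
        synthetic.append({name: Counter(nb[name] for nb in neighbors).most_common(1)[0][0]
                          for name, _ in categories})
    return synthetic
```

With one-hot encoding, the overlap of two encoded samples equals the fraction of features on which they agree. The swap test therefore recovers a squared agreement score, and the sampler only needs that score, not the raw vectors.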
The researchers also built a user-friendly online platform for predicting Salmonella antimicrobial resistance. It uses Django for the back-end service and ECharts for knowledge-graph visualization, and it has four modules: an antimicrobial-resistance predictive module, a pan-genomics analysis results module, a gene-sample-antimicrobial knowledge-graph module, and a data download module. Users can upload gene feature files for online resistance prediction, and the platform also offers data visualization and download functions.
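As a rough sketch of how a back end might feed the gene-sample-antimicrobial knowledge graph to ECharts, the Python function below builds an ECharts option for a force-directed `graph` series as a plain dict that can be serialized to JSON. The option keys (`type`, `layout`, `roam`, `categories`, `data`, `links`) are standard ECharts graph-series settings. The input triples, the edge meanings (sample carries gene, sample is resistant to antimicrobial), and the node names are hypothetical. Whether the platform's Django views work this way is not stated in the source.

```python
import json


def build_graph_option(triples):
    """Build an ECharts graph-series option from (gene, sample, antimicrobial) triples.

    Assumed edge semantics: sample -> gene means the sample carries the gene;
    sample -> antimicrobial means the sample is resistant to that drug.
    """
    categories = ["gene", "sample", "antimicrobial"]
    nodes, links = [], []
    seen_nodes, seen_links = set(), set()

    def add_node(name, category):
        # ECharts links refer to nodes by name, so each name is added once.
        if name not in seen_nodes:
            seen_nodes.add(name)
            nodes.append({"name": name, "category": categories.index(category)})

    def add_link(source, target):
        if (source, target) not in seen_links:
            seen_links.add((source, target))
            links.append({"source": source, "target": target})

    for gene, sample, drug in triples:
        add_node(gene, "gene")
        add_node(sample, "sample")
        add_node(drug, "antimicrobial")
        add_link(sample, gene)
        add_link(sample, drug)

    return {
        "legend": {"data": categories},
        "series": [{
            "type": "graph",
            "layout": "force",
            "roam": True,
            "categories": [{"name": c} for c in categories],
            "data": nodes,
            "links": links,
        }],
    }


option = build_graph_option([
    ("gene_1", "sample_1", "antimicrobial_A"),
    ("gene_2", "sample_1", "antimicrobial_A"),
])
# JSON a front end could pass to chart.setOption(...)
print(json.dumps(option, indent=2))
```

Building the option on the server keeps the front end thin. The browser only calls `setOption` on whatever JSON the back end returns, and duplicate edges are removed before they reach the chart.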
Experimental results show that SARPLLM outperforms the comparison models in antimicrobial-resistance prediction, achieving high F1-scores for multiple antimicrobials. QSMOTEN accurately computes the similarity between samples on both virtual (simulated) and physical quantum machines, demonstrating the potential of quantum computing to accelerate data augmentation.
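The F1-score is the harmonic mean of precision and recall. Unlike plain accuracy, it stays informative when resistant and susceptible samples are imbalanced. The sketch below assumes a per-antimicrobial binary task with "R" (resistant) as the positive class and "S" as the negative. The labels and the function name are hypothetical, and the averaging scheme the paper used across antimicrobials is not stated here.

```python
def f1_score(y_true, y_pred, positive="R"):
    """Binary F1 = 2TP / (2TP + FP + FN); returns 0.0 if no positives occur in either list."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Two true positives, one false positive, one false negative: F1 = 4 / 6.
score = f1_score(["R", "R", "S", "S", "R"], ["R", "S", "S", "R", "R"])
```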
Although the study is a significant step forward, the researchers acknowledge several limitations. Current LLMs struggle to fully capture the complex biological and genetic knowledge involved in Salmonella antimicrobial-resistance prediction, and LLM performance depends on the quality and quantity of the training data. Quantum computing technology is also still in its early stages. Future research will focus on integrating multi-source data and domain knowledge to improve the platform's accuracy, and on developing more stable quantum hardware.
The paper, “Developing a Predictive Platform for Salmonella Antimicrobial Resistance Based on a Large Language Model and Quantum Computing,” was authored by Yujie You, Kan Tan, Zekun Jiang, and Le Zhang. Full text of the open access paper: https://doi.org/10.1016/j.eng.2025.01.013. For more information about Engineering, follow us on X (https://twitter.com/EngineeringJrnl) and like us on Facebook (https://www.facebook.com/EngineeringJrnl).
Journal: Engineering
Article Title: Developing a Predictive Platform for Salmonella Antimicrobial Resistance Based on a Large Language Model and Quantum Computing
Article Publication Date: 28-Jan-2025