News Release

Algorithms designed to study language predict virus 'escape' mutations for SARS-CoV-2 and others

Peer-Reviewed Publication

American Association for the Advancement of Science (AAAS)

By bridging the conceptual divide between human language and viral evolution, researchers have developed a powerful new tool for predicting the mutations that allow viruses to "escape" human immunity or vaccines. Its use could avoid the need for the high-throughput experimental techniques currently employed to identify mutations that could allow a virus to escape recognition. "The authors have uncovered a parallel between the properties of a virus and its interpretation by the host immune system and the properties of sentences in natural language and its interpretation by a human," write Yoo-Ah Kim and Teresa Przytycka in a related Perspective.

Occasionally, viruses mutate in ways that allow them to evade the human immune system and cause infection, a phenomenon known as viral escape. This ability represents a major challenge in vaccine and antiviral development, particularly in the creation of a universal flu vaccine and effective therapies for HIV. What's more, viral escape has quickly become a pressing concern in the race to develop solutions for SARS-CoV-2 infection. While understanding the rules that govern the evolution of escape mutations could inform therapeutic design, current techniques for identifying potential escape mutations are limited.

Inspired by the linguistic concepts of grammar (or syntax) and meaning (or semantics), Brian Hie and colleagues applied natural language processing, a machine learning technique originally developed to train computers to understand human language from sequences of words, to predict the mutations that may lead to viral escape from sequences of amino acids. Similar to how word changes can preserve a sentence's grammar but alter its meaning, Hie et al. show how escape can be achieved by mutations that preserve the biological "syntax" that governs viral infectivity yet alter a virus's "semantics" so that it is no longer recognized by neutralizing antibodies.

According to the results, separate language models developed for influenza A, HIV-1 and SARS-CoV-2 proteins accurately predicted causal escape mutations and identified structural regions with high escape potential. The models achieved these results without being trained on known escape mutations, using raw sequence data alone. For SARS-CoV-2, the authors find that the escape potential within the Spike protein (by which the virus infects a cell) is significantly enriched in two domains and depleted in another.
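
The "syntax plus semantics" idea described above can be illustrated with a small sketch. It assumes that a language model has already produced two numbers for each candidate mutation: a grammaticality score (how plausible the mutant sequence is) and a semantic-change score (how far the mutant moves from the original sequence in the model's embedding space). The release does not state how the two are combined; the rank-sum rule, the `beta` weight, the function names and the toy values below are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    mutation: str           # hypothetical label, e.g. "A2T"
    grammaticality: float   # assumed model output: higher = more plausible sequence
    semantic_change: float  # assumed model output: higher = larger shift in meaning


def ranks(values: List[float]) -> List[int]:
    """Return 1-based ranks; the largest value receives the highest rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def rank_escape_candidates(candidates: List[Candidate], beta: float = 1.0) -> List[str]:
    """Order mutations so that those scoring high on BOTH axes come first.

    Assumption: the combined score is rank(semantic_change) + beta * rank(grammaticality).
    """
    g = ranks([c.grammaticality for c in candidates])
    s = ranks([c.semantic_change for c in candidates])
    scored = [(s_i + beta * g_i, c.mutation) for c, g_i, s_i in zip(candidates, g, s)]
    scored.sort(key=lambda pair: -pair[0])  # highest combined score first
    return [mutation for _, mutation in scored]


# Toy, made-up scores purely for illustration.
toy = [
    Candidate("A2T", grammaticality=0.30, semantic_change=0.90),  # plausible and different
    Candidate("A2W", grammaticality=0.01, semantic_change=0.95),  # different but implausible
    Candidate("K5R", grammaticality=0.40, semantic_change=0.05),  # plausible but unchanged
    Candidate("G7D", grammaticality=0.20, semantic_change=0.60),
]
ordered = rank_escape_candidates(toy)
print(ordered[0])  # the candidate that best balances both properties
```

In this toy example, the mutation that is both plausible and semantically different is ranked first, ahead of one that changes meaning but breaks the "grammar" and one that keeps the grammar but barely changes meaning.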

###

