News Release

Breakthrough research shows potential for generative AI to accelerate drug discovery and the development of new antivirals

This breakthrough has the potential to get drugs to people faster in the next crisis and bring treatments for urgent, life-threatening illnesses within reach

Peer-Reviewed Publication

Diamond Light Source


Image: Surface representation of the SARS-CoV-2 Mpro protein with fragment hits from the XChem platform bound in the active site (green)

Credit: Copyright of Diamond Light Source Ltd

In a new study, researchers from IBM, Oxford University and Diamond Light Source show that IBM’s generative AI model, CogMol, can generate antiviral molecules for multiple viral protein targets, including those of SARS-CoV-2, which can accelerate the drug discovery process. The results are laid out in a new paper in Science Advances; at the time of the paper’s submission, the antiviral properties of eleven molecules had been successfully validated by Oxford researchers. This breakthrough has the potential to get drugs to people faster in the next crisis and bring treatments for urgent, life-threatening illnesses within reach.

Early in the pandemic, a group of computer scientists at IBM wanted to explore whether generative AI could be used to design never-before-seen molecules to block SARS-CoV-2, the virus that causes Covid-19. David Stuart, Head of the Division of Structural Biology in the Department of Clinical Medicine at the University of Oxford and Life Sciences Director at Diamond Light Source, the UK’s national synchrotron, is an authority on HIV, SARS and Ebola, among other viruses. He explains that he was initially sceptical: “The idea that you could take a protein sequence and, with AI, pluck out of thin air chemicals that would bind to a 3D site on the virus seemed very unlikely.”

However, he and Martin Walsh, also an expert structural biologist and Life Sciences Deputy Director at Diamond, joined up with the IBM team. Over the course of three years, in collaboration with Enamine Ltd., a chemical supplier in Ukraine, and other researchers at Oxford, they demonstrated that generative AI could ‘pluck viable starting points for antivirals out of thin air’.

Because the generative model was also a foundation model, pre-trained on massive amounts of raw data, it was versatile enough to create new inhibitors for multiple protein targets without extra training or any knowledge of their 3D structures.

The Stuart and Walsh groups had already begun working on two essential SARS-CoV-2 proteins, namely the spike protein and the main protease. Using these targets, the team hit on four potential Covid-19 antivirals in a fraction of the time it would have taken using conventional methods. The work then exploited Diamond’s high-throughput macromolecular crystallography beamlines to visualise how a subset of the AI-generated compounds bound to the main protease. Their work is showcased in their new paper in Science Advances, and IBM has released a web-based interface in IBM Cloud for interacting with the model and with chemical foundation models like it.

The team stated that the validated molecules have many more hurdles to clear, including clinical trials, before companies could potentially turn them into drugs. But even if the AI-generated “hits” never materialise into actual drugs, the work provides confirmation that generative AI has an important role to play in the future of drug development, especially in a time of crisis.  

“It took time to develop and validate these methods, but now that we have a working pipeline in place, we can generate results much faster,” said study co-senior author, Payel Das, a researcher at IBM Research. “When the next virus emerges, generative AI could be pivotal in the search for new treatments.”  

“Generating initial compounds that bind with high affinity to a drug target of interest accelerates the structure-based drug discovery pipeline and underpins our efforts to be better prepared for future pandemics,” said Martin Walsh, co-senior author at Diamond.

The researchers built their model, Controlled Generation of Molecules (or CogMol), on a generative AI architecture known as variational autoencoders, or VAEs. VAEs encode raw data into a compressed representation, and then decode, or translate, it back into a statistical variation on the original sample.  Their model was trained on a large dataset of molecules represented as strings of text, along with general information about proteins and their binding properties. But they deliberately left out information about SARS-CoV-2’s 3D structure or molecules known to bind to it. Their goal was to give their generative foundation model a broad base of knowledge so that it could be more easily deployed for molecular design tasks it has never seen before.  

Their goal was to find drug-like molecules that would bind to two Covid protein targets: the spike, which transmits the virus to the host cell, and the main protease, which helps to spread it. Though the 3D structures of both proteins had been determined by that time, the IBM researchers chose to use only their amino acid sequences, which are encoded by the viral genome. By limiting themselves in this way, they hoped that the model could learn to generate molecules without knowing the shape of their target.

The researchers input  only the amino acid sequence for each protein target into CogMol, which generated 875,000 candidate molecules in three days. To narrow the pool, the researchers ran the candidates through a retrosynthesis platform, IBM RXN for Chemistry, to understand what ingredients would be needed to synthesise the compounds. Based on the platform’s predicted recipes, they selected 100 molecules for each target. Chemists at Enamine further pared the list to four molecules for each target, selecting those deemed easiest to manufacture.  
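The narrowing described above can be laid out as simple arithmetic. The short Python sketch below is illustrative only: it assumes the 875,000 figure is the combined pool for both protein targets, and the stage labels and the `retention` helper are names chosen for this example, not part of the study.

```python
# Illustrative arithmetic only; stage labels are chosen for this sketch.
# Assumption: 875,000 is the combined candidate pool for both protein targets.
TARGETS = 2

funnel = [
    ("generated by CogMol", 875_000),
    ("selected after retrosynthesis prediction", 100 * TARGETS),
    ("chosen by Enamine chemists for synthesis", 4 * TARGETS),
]


def retention(stages):
    """Return (stage name, count, fraction kept from the previous stage)."""
    rows = []
    previous = None
    for name, count in stages:
        kept = 1.0 if previous is None else count / previous
        rows.append((name, count, kept))
        previous = count
    return rows


for name, count, kept in retention(funnel):
    print(f"{count:>9,}  {name} ({kept:.4%} of previous stage)")
```

Under that assumption, only a tiny fraction of the generated candidates reached the retrosynthesis shortlist, and 4% of the shortlist was synthesised.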

After synthesising the eight novel molecules, Enamine shipped them to Oxford, where their ability to disrupt the functions of the two protein targets was tested in the labs of Prof Chris Schofield and Prof Gavin Screaton. Diamond’s intense X-ray beams, which are 10 billion times brighter than the Sun, were used to visualise how the compounds interacted with the proteins to inactivate their function. The novel compounds were further tested in target inhibition and live virus neutralisation tests. Two of the validated antivirals target the main protease; the other two not only target the spike protein but also proved capable of neutralising all six major Covid variants. “You get a map that shows exactly where things bind, and bang! you’ve got a confirmation,” said Stuart.

CogMol is one of several chemical foundation models that IBM has since developed. The largest, MoLFormer-XL, was trained on a database of more than 1.1 billion molecules and is currently being used by Moderna to design mRNA medicines. “We created valid starting points for accelerated development of antivirals using a generative foundation model that knew relatively little about its protein targets,” said the study’s co-senior author, Jason Crain, a researcher at IBM Research and professor at Oxford. “I’m hopeful that these methods will allow us to create antivirals and other urgently needed compounds much faster and more inexpensively in the future.” 

Though the researchers focused on validating antivirals for Covid, they argue that these methods can be extended to existing viruses that continue to mutate, like the flu, or viruses that have yet to surface. “If you want to be prepared for the next pandemic, you want drugs that act on different sites of the protein,” concluded Stuart. “It becomes much harder for the virus to escape.” 

ENDS 

Science Advances; 21 Jun 2023, Vol 9, Issue 25  DOI: 10.1126/sciadv.adg7865  “Accelerating drug target inhibitor discovery with a deep generative foundation model” 

Authors: Vijil Chenthamarakshan, Samuel C. Hoffman, C. David Owen, Petra Lukacik, Claire Strain-Damerell, Daren Fearon, Tika R. Malla, Anthony Tumber, Christopher J. Schofield, Helen M.E. Duyvesteyn, Wanwisa Dejnirattisai, Loic Carrique, Thomas S. Walter, Gavin R. Screaton, Tetiana Matviiuk, Aleksandra Mojsilovic, Jason Crain, Martin A. Walsh, David I. Stuart, and Payel Das

For more information: please contact Diamond Communications: Lorna Campbell +44 7836 625999 or Isabelle Boscaro-Clarke +44 1235 778130   Diamond Light Source: www.diamond.ac.uk  Twitter: @DiamondLightSou    

Editors’ notes/additional info:

Drug resistance and preparing for future threats  

Developing new drugs is notoriously slow, often taking a decade or more. During the pandemic, an unprecedented collaboration between researchers in academia and industry worldwide brought new treatments to the market in record time. But an important factor that contributed to their success was the drugs themselves; most had already been approved for other uses and could be quickly repurposed for Covid-19.  

In the future, new drugs may be required to tackle new viruses. Viruses mutate, and as they change shape, the drugs designed to block them become less effective. Some of the anti-Covid therapies developed early in the pandemic no longer work, said Stuart, and it’s likely that as SARS-CoV-2 continues to mutate, it will become resistant to others. 

Generative AI could provide an answer, with its ability to create molecules entirely new to nature. Two of the AI-generated Covid antivirals the researchers found bind to the virus’s spike protein in a distinctly new way. If developed into drugs, they could potentially complement some of today’s Covid antivirals in the same way that HIV today is treated with a cocktail of drugs targeting different receptors. 

How traditional drug discovery works 

Typically, the drug discovery process starts by identifying a biological target, like a protein, that plays a key role in disease. Medicinal chemists then search for compounds that can bind to the target and disrupt its activity. 

The hunt often begins with high-throughput screening, which involves filtering vast libraries of small, drug-like molecules for promising candidates deemed likely to bind to the target. Once hits are identified, the molecules are refined into more drug-like “leads” by making them more soluble and stable and removing any toxic components.

Fewer than one in 100 compounds make it to the “hit” stage, and even fewer progress further. The odds are better with a newer technique known as fragment-based screening, which focuses on finding molecular pieces that are likely to bind to the target. Diamond has pioneered the application of X-ray fragment-based screening at its XChem platform, which harnesses the high-throughput capabilities of Diamond’s MX beamlines and associated state-of-the-art sample preparation laboratories. Using this technique, so-called ‘hits’, where a fragment binds to the target protein, can be quickly found and elaborated into potential lead compounds as starting points for drug discovery, and ultimately built into full-sized, drug-like molecules. Several anti-Covid compounds found this way are currently in pre-clinical trials, Stuart said.

In the study, the researchers showed that the hit rate can be raised to as high as 50% by combining generative AI with retrosynthesis prediction, a way of automatically working out the chemical ingredients and reactions needed to manufacture a given molecule and so estimate its production cost. Moreover, these hits start from more complex compounds that bind with higher affinity, which speeds up the drug discovery process.
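As a rough check on that figure, the numbers quoted earlier in this release (eight molecules synthesised, four validated) give the same 50% rate. The Python sketch below uses only those figures; the variable names and the `hit_rate` helper are illustrative, and the "one in 100" ceiling for conventional screening is taken from the paragraph above rather than from the paper.

```python
# Figures taken from this release; names are illustrative.
synthesised = 8   # four molecules for each of the two protein targets
validated = 4     # two main-protease hits plus two spike hits


def hit_rate(hits, tested):
    """Fraction of tested compounds confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


ai_rate = hit_rate(validated, synthesised)
screening_ceiling = 1 / 100  # "fewer than one in 100" for conventional screening

print(f"AI-guided hit rate: {ai_rate:.0%}")
print(f"Ratio to the conventional ceiling: {ai_rate / screening_ceiling:.0f}x")
```

Because the conventional rate is stated only as an upper bound, the ratio printed here should be read as a lower bound on the improvement.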

Diamond Light Source provides industrial and academic user communities with access to state-of-the-art analytical tools to enable world-changing science. Shaped like a huge ring, it works like a giant microscope, accelerating electrons to near light speed to produce light 10 billion times brighter than the Sun, which is then directed into 33 laboratories known as ‘beamlines’. In addition to these, Diamond offers access to several integrated laboratories including the world-class Electron Bio-imaging Centre (eBIC) and the Electron Physical Science Imaging Centre (ePSIC).

Diamond serves as an agent of change, addressing 21st century challenges such as disease, clean energy, food security and more. Since operations started, more than 16,000 researchers from both academia and industry have used Diamond to conduct experiments, with the support of approximately 760 world-class staff. Almost 12,000 scientific articles have been published by our users and scientists.    

Funded by the UK Government through the Science and Technology Facilities Council (STFC), and by the Wellcome Trust, Diamond is one of the most advanced scientific facilities in the world, and its pioneering capabilities are helping to keep the UK at the forefront of scientific research.    

Diamond was set up as an independent not-for-profit company through a joint venture between the UKRI’s Science and Technology Facilities Council and one of the world’s largest biomedical charities, the Wellcome Trust, which own 86% and 14% of the shareholding respectively.

 

