15th October 2020, Hong Kong: Published today in the journal GigaScience is a new open source, cloud-based tool called IDseq that makes it possible to rapidly detect, identify, and track emerging pathogens such as SARS-CoV-2. Because the tool can identify pathogens before a complete genome sequence is available, it can be used both for current infectious disease outbreaks and for emerging ones, which could substantially aid efforts to prevent future pandemics.
The coronavirus pandemic has demonstrated the importance of global infectious disease monitoring. Finding the cause of an infectious disease outbreak is challenging, especially if it stems from a previously unknown pathogen. IDseq, an open source, cloud-based metagenomic analysis platform, identifies both novel and known disease-causing pathogens in a given sample, whether it comes from a human, an animal, or a parasite, and provides an actionable report of what is happening on the ground in labs and clinics anywhere in the world.
"IDseq can be thought of as an early warning radar for emerging or novel infectious agents," said Joe DeRisi, PhD, Co-President of the Chan Zuckerberg Biohub, who contributed to the identification of the SARS coronavirus in 2003 and whose research lab at the University of California, San Francisco initiated the IDseq tool. IDseq is designed to enable the global health community to leverage the ever-decreasing cost of sequencing to track and identify infectious disease in essentially any sample. "At the beginning of the coronavirus pandemic, researchers in Cambodia used IDseq to help confirm and sequence the whole genome of the country's first case of COVID-19 in a matter of days, and in California, we're providing critical SARS-CoV-2 genomic data to public health officials to inform contact tracing and intervention strategies."
In the study published in GigaScience, the scientists use several approaches to show that IDseq can reliably identify emerging pathogens; as a proof of principle, they analysed a nasal swab from a COVID-19 patient in Cambodia. A partnership between the Chan Zuckerberg Biohub, the Chan Zuckerberg Initiative (CZI), and the Bill and Melinda Gates Foundation enabled these researchers to sequence and confirm the country's first case of COVID-19 in a matter of days, rather than the weeks it could typically take. The results demonstrate that IDseq can detect the presence of an emerging pathogen before a full reference genome exists. IDseq also now contains a new workflow for building SARS-CoV-2 consensus genomes.
"Metagenomic sequencing (mNGS) is an incredibly useful tool for pathogen detection because of its highly sensitive and hypothesis-free nature," said Katrina Kalantar, Computational Biologist at CZI. "We've seen labs that are using IDseq for existing mNGS studies rapidly pivot their focus to more targeted sequencing of SARS-CoV-2, which has helped researchers better understand coronavirus transmission patterns."
In Cambodia, researchers uploaded the genome sequence to the pathogen data repository GISAID (Global Initiative on Sharing All Influenza Data) and to Nextstrain, so that scientists anywhere can view the full genome sequence of this SARS-CoV-2 isolate and study it in the broader context of SARS-CoV-2 sequences uploaded globally. Researchers at the Cambodian National Center for Parasitology, Entomology and Malaria Control (CNM) and the National Institute of Allergy and Infectious Diseases (NIAID) partnered with the Institut Pasteur Cambodia to complete this research. This team is one of several around the world receiving molecular biology and bioinformatics training from the infectious disease team at the Biohub; free access, training, and compute on the IDseq platform from CZI; and the equipment and supplies needed to begin work in their own countries through the Grand Challenges Explorations Grants.
Unlike tests that are specific to a known agent, such as the SARS-CoV-2 PCR test, mNGS is a universal method that can detect novel disease-causing pathogens. This is especially useful when researchers do not know what is causing an infection or which pathogens are circulating in a particular area. An mNGS experiment starts by amplifying and sequencing the genetic material in a patient's sample, including any traces from pathogens, which yields millions of short sequence fragments, or reads. This enormous dataset must then be analysed and interpreted using bioinformatic techniques. The aim is to assign individual reads from the clinical sample to specific pathogens by comparing them against sequence databases.
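To make the read-assignment idea concrete, here is a deliberately simplified sketch in Python. It assumes a tiny in-memory "database" of two made-up reference sequences (the names virus_A and virus_B are hypothetical) and classifies each read by counting shared k-mers, that is, substrings of length k. Exact k-mer matching is one technique used by some metagenomic classifiers; it is chosen here only for brevity and is not IDseq's own algorithm.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference taxa whose sequence contains it."""
    index: dict[str, set[str]] = {}
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int) -> str:
    """Assign a read to the taxon sharing the most k-mers with it.

    Returns "unassigned" if no k-mer matches and "ambiguous" if the
    top two taxa are tied.
    """
    hits: Counter = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            hits[taxon] += 1
    if not hits:
        return "unassigned"
    (best, n), *rest = hits.most_common()
    if rest and rest[0][1] == n:
        return "ambiguous"
    return best


# Hypothetical toy references, not real genomes.
references = {"virus_A": "ATGCGTACGTTAGC", "virus_B": "TTGACCGGTAACGT"}
index = build_index(references, k=4)

for read in ["GCGTACGT", "GACCGGTA", "CCCCCCCC", "ACGT"]:
    print(read, classify_read(read, index, k=4))
# GCGTACGT virus_A
# GACCGGTA virus_B
# CCCCCCCC unassigned
# ACGT ambiguous
```

The "ambiguous" case matters in practice: a short read that occurs in several related genomes cannot be pinned to one organism, which is why real classifiers often report such reads at a higher taxonomic level instead.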
Analysing the massive amount of data from a typical mNGS experiment often requires a battery of specialized bioinformatic tools, as well as specialized expertise and, frequently, expensive commercially licensed software, which makes mNGS a hard-to-access method. The new user-friendly IDseq software is open source and freely available to the global health community, lowering the barrier to entry for metagenomics. Researchers can reuse and build upon the code, which runs as a cloud-based service with a web application designed for collaboration and data sharing. The pipeline takes raw sequencing data as input and then proceeds through filtering, quality control, alignment, and reporting and visualization steps.
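The stages named above (filtering, quality control, alignment, reporting) can be sketched as a chain of small Python functions. This is an illustrative toy under stated assumptions, not IDseq's code: the thresholds (minimum length 6, mean Phred quality 20) are arbitrary, "quality control" is reduced to removing exact duplicate reads, "alignment" is stood in for by exact substring matching (real aligners tolerate mismatches), and the reads-per-million normalisation is one common illustrative choice for comparing samples of different sequencing depth. All function names and reference sequences are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    seq: str
    quals: list[int]  # one Phred quality score per base (assumed same length as seq)


def quality_filter(reads, min_len=6, min_mean_q=20):
    """Filtering: drop reads that are too short or have low average base quality."""
    return [r for r in reads
            if len(r.seq) >= min_len and sum(r.quals) / len(r.quals) >= min_mean_q]


def deduplicate(reads):
    """Quality control (simplified): keep only the first copy of each identical sequence."""
    seen, unique = set(), []
    for r in reads:
        if r.seq not in seen:
            seen.add(r.seq)
            unique.append(r)
    return unique


def align(read, references):
    """Alignment stand-in: exact substring match; returns a taxon only if the hit is unique."""
    matches = [name for name, genome in references.items() if read.seq in genome]
    return matches[0] if len(matches) == 1 else None


def report(reads, references, total_input):
    """Reporting: reads per taxon, plus that count per million input reads."""
    counts = Counter(t for t in (align(r, references) for r in reads) if t is not None)
    return {t: (n, n * 1_000_000 / total_input) for t, n in counts.items()}


def run_pipeline(reads, references):
    kept = deduplicate(quality_filter(reads))
    return report(kept, references, total_input=len(reads))


# Hypothetical toy data, not real genomes or sequencing output.
references = {"virus_A": "ATGCGTACGTTAGC", "virus_B": "TTGACCGGTAACGT"}
reads = [
    Read("GCGTACGT", [30] * 8),  # passes filters, matches virus_A
    Read("GCGTACGT", [30] * 8),  # exact duplicate, removed in QC
    Read("GACCGGTA", [35] * 8),  # passes filters, matches virus_B
    Read("TTAGC", [40] * 5),     # too short, filtered out
    Read("CGTTAG", [10] * 6),    # mean quality 10 < 20, filtered out
    Read("CCCCCC", [30] * 6),    # passes filters but matches no reference
]
print(run_pipeline(reads, references))
```

With this toy input, each virus ends up with one read out of six input reads, or roughly 166,667 reads per million. Keeping each stage as a separate function mirrors the idea of a pipeline in which any step can be swapped for a more realistic tool without touching the others.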
For more information, visit IDseq.net.
Further Reading
Kalantar KL et al., IDseq - An Open Source Cloud-based Pipeline and Analysis Service for Metagenomic Pathogen Detection and Monitoring. GigaScience. 2020;9(10):giaa111. doi:10.1093/gigascience/giaa111
https://academic.oup.com/gigascience/article-lookup/doi/10.1093/gigascience/giaa111
Preprint available at https://www.biorxiv.org/content/10.1101/2020.04.07.030551v3
Contacts: Scott Edmunds, Editor in Chief GigaScience, BGI Hong Kong Email: scott@gigasciencejournal.com
Leah Duran, Communications Manager, CZI Email: lduran@chanzuckerberg.com
Sharing on social media? Find GigaScience online on Twitter @GigaScience and on Facebook https://www.facebook.com/GigaScience/, and keep up to date with our blog http://gigasciencejournal.com/blog/.
About GigaScience
GigaScience is co-published by GigaScience Press and Oxford University Press. Winner of the 2018 PROSE award for Innovation in Journal Publishing (Multidisciplinary), the journal covers research that uses or produces 'big data' from the full spectrum of the biological and biomedical sciences. It also serves as a forum for discussing the difficulties of and unique needs for handling large-scale data from all areas of the life and medical sciences. The journal has a completely novel publication format, one that integrates manuscript publication with complete data hosting and incorporation of analysis tools. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to GigaScience that all supporting data and source code be made available in the GigaScience database, GigaDB, as well as in publicly available repositories. GigaScience will provide users access to associated online tools and workflows, and has integrated a data analysis platform, maximizing the potential utility and re-use of data.
###
About the Chan Zuckerberg Initiative
Founded by Dr. Priscilla Chan and Mark Zuckerberg in 2015, the Chan Zuckerberg Initiative (CZI) is a new kind of philanthropy that's leveraging technology to help solve some of the world's toughest challenges -- from eradicating disease, to improving education, to reforming the criminal justice system. Across three core Initiative focus areas of Science, Education, and Justice & Opportunity, we're pairing engineering with grant-making, impact investing, and policy and advocacy work to help build an inclusive, just, and healthy future for everyone. For more information, please visit http://www.chanzuckerberg.com.
About the Chan Zuckerberg Biohub
The Chan Zuckerberg Biohub is a nonprofit research organization setting the standard for collaborative science, where leaders in science and technology come together to drive discovery and support the bold vision to cure, prevent, or manage disease in our children's lifetime. The CZ Biohub seeks to understand the fundamental mechanisms underlying disease and to develop new technologies that will lead to actionable diagnostics and effective therapies. The CZ Biohub is a regional research endeavor with international reach, where the Bay Area's leading institutions -- the University of California, Berkeley, Stanford University and the University of California, San Francisco -- join forces with the CZ Biohub's innovative internal team to catalyze impact, benefitting people and partnerships around the world. To learn more, visit CZBiohub.org.
Journal
GigaScience