News Release

Plant genomic resources at National Genomics Data Center: Assisting in data-driven breeding applications

Peer-Reviewed Publication

Beijing Zhongke Journal Publishing Co. Ltd.

Database resources of plants in NGDC-CNCB

Overview of database resources and application tools for plants in the CNCB-NGDC.

Credit: Beijing Zhongke Journal Publishing Co. Ltd.

This review was led by Professor Shuhui Song and Zhang Zhang (National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China). The article systematically summarizes the plant genomics database resources of the CNCB-NGDC in three categories: central repositories dedicated to archiving, presenting, and sharing plant omics data; knowledgebases focused on variants or gene-based functional insights; and species-specific multi-omics database resources. It also gives a brief overview of the online application tools at the CNCB-NGDC.

Repositories that collect and archive multi-omics data are of great significance for the long-term preservation and open sharing of data. Since 2016, CNCB-NGDC has gradually established several core archiving repositories, including Genome Sequence Archive (GSA), Genome Warehouse (GWH), GenBase, Genome Variation Map (GVM), and Open Plant Image Archive (OPIA). These repositories support worldwide data submission, archiving, preservation, and sharing. As of August 2023, GSA has archived nearly 4,500 TB of raw sequencing data from 1,850 plant species; GWH has hosted 10,594 genome assemblies from 1,423 plant species; GenBase has assembled 1,085 protein sequences and 1,024 nucleotide sequences related to plants; GVM has collected genetic variation data of 34,643 samples from 30 plant species; and OPIA has collected 566,255 plant images from 11 plant species. Based on the archived data resources, NGDC has further conducted in-depth analyses at various omics levels and established a series of information repositories, such as GVM for nuclear genome sequence variations, CGIR for chloroplast genome variations, PlantPan for plant pan-genomes, GEN for gene expression profiles, and MethBank for DNA methylation maps.

Knowledge of variation or gene function is useful in understanding the molecular mechanisms of complex phenotypic traits. Through high-quality literature curation and systematic organization, NGDC has built a series of knowledgebases on plant genes and genetic variations. Among them, the GWAS Atlas includes 269,138 genomic variation-phenotype associations for 10 plant species; LSD provides information on genes, mutants, phenotypes, and references related to leaf senescence; PED integrates 98 RNA editing factors and 20,836 editing events from 1,621 plant species; and ICG summarizes 1,216 experimentally validated high-quality internal reference genes from 278 plant species, linked to 660 corresponding experimental scenarios.

In addition, NGDC has established several integrated resources specific to staple or economic crops. IC4R is a multi-omics database for rice, providing high-quality annotation information for 56,221 protein-coding genes, 6,259 non-coding RNAs, and 4,373 circular RNAs; SorGSD provides 39,547,621 genomic variations from 289 sorghum varieties, as well as phenotypic characteristics and panicle images of critical sorghum lines; SoyOmics is a multidimensional omics resource for soybean, including genomic and pan-genomic sequences of 27 soybean varieties, approximately 38 million genomic variations, gene expression profiles of multiple tissues, and 115 phenotypic data; TCOD integrates genomic sequences, functional genes, genomic variations, gene expression, and germplasm information of 15 tropical crops, including cassava, rubber tree, coffee, cocoa, and others.

Finally, the article briefly introduces the data search engine BIG-Search and the Bioinformatics Toolkits (BIT). BIG-Search is a distributed and scalable full-text search engine covering a large number of biological resources, providing one-stop, cross-database search services for the global research community. BIT is a platform that integrates a wide variety of tools for sequence alignment, compositional analysis, RNA expression, epigenome analysis, haplotype network construction, and data visualization.

See the article:

Plant genomic resources at National Genomics Data Center: assisting in data-driven breeding applications

https://link.springer.com/article/10.1007/s42994-023-00134-4

