Article Highlight | 7-Apr-2025

TOAnnoPriDB: an integrative database for trans-omic annotation and prioritization of non-coding variants across human genome

Science China Press

In recent years, large-scale sequencing projects have identified numerous variants in the human genome. For instance, the NyuWa genome resource (Cell Reports, 2021), led by Professor Tao Xu and Professor Shunmin He, identified 71.06 million SNPs and 8.19 million InDels, over 98% of which are non-coding variants. Because bioinformatic annotation and functional studies of such variants are difficult, their role in human diseases remains largely unexplored, and understanding the relationship between non-coding variants and disease is a significant and challenging research area. Multi-omics techniques have enriched the annotation of genomic variation, but the large volume and diverse formats of these data hinder their accessibility and application.

Building on the NyuWa genome resource, TOAnnoPriDB provides comprehensive annotations for non-coding variants, emphasizing their functional effects and associations with human diseases. Integrating data from 147 public resources, TOAnnoPriDB covers approximately 98% of the human genome's non-coding regions (Figure 1). The database includes 46 classes of trans-omics annotations, spanning variant frequency, functional predictions, disease associations, regulatory elements, QTLs, molecular interactions, gene expression, and small peptides.

The NyuWa Genome Project has systematically characterized genomic variations, including SNPs/InDels (Cell Reports, 2021), mobile element insertions (Nucleic Acids Research, 2022), short tandem repeat polymorphisms (Nature Communications, 2023), and variable number tandem repeat polymorphisms (Cell Genomics, 2024). Recent studies have also examined positive selection signals (Science Bulletin, 2023) and adaptive selection of non-coding regulatory elements (Molecular Biology and Evolution, 2024) in the human genome.

TOAnnoPriDB prioritizes variants using a framework that incorporates variant frequency, functional prediction scores (e.g., ncER, ReMM, GWAVA), regulatory elements (e.g., TFBS, DNase peaks, histone modifications), molecular interactions, translational potential, and variant-gene-disease associations. Variants are ranked from Level-1 to Level-6; higher levels are supported by multiple evidence types, which helps guide further research.
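
To make the idea of evidence-based prioritization concrete, here is a minimal sketch that orders variants by how many of the evidence categories named above support them. It is an illustration under stated assumptions, not TOAnnoPriDB's actual method: the `Variant` class, the category labels, and the counting rule are hypothetical, and the sketch does not reproduce the Level-1 to Level-6 definitions, which are given in the article rather than here.

```python
from dataclasses import dataclass, field

# Evidence categories named in the text. The label strings themselves are
# hypothetical and are not taken from the TOAnnoPriDB schema.
EVIDENCE_TYPES = frozenset({
    "variant_frequency",
    "functional_prediction",
    "regulatory_element",
    "molecular_interaction",
    "translational_potential",
    "variant_gene_disease",
})


@dataclass
class Variant:
    """Hypothetical container: a variant ID plus its supporting evidence labels."""
    variant_id: str
    evidence: set = field(default_factory=set)


def evidence_count(variant):
    """Count how many recognized evidence categories support the variant."""
    return len(variant.evidence & EVIDENCE_TYPES)


def rank_by_evidence(variants):
    """Order variants from most to fewest supporting categories (assumed rule).

    Ties keep their input order because Python's sort is stable.
    """
    return sorted(variants, key=evidence_count, reverse=True)


if __name__ == "__main__":
    candidates = [
        Variant("var_a", {"variant_frequency"}),
        Variant("var_b", {"variant_frequency", "regulatory_element",
                          "molecular_interaction"}),
    ]
    for v in rank_by_evidence(candidates):
        print(v.variant_id, evidence_count(v))
```

A simple count treats every category as equally informative; a real scheme would likely weight or combine evidence types differently, so the ordering here should be read only as a toy example.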

Additionally, TOAnnoPriDB provides a user-friendly web interface (http://bigdata.ibp.ac.cn/TOAnnoPriDB) for searching and analyzing variants (Figure 2). JBrowse 2 is integrated for annotation visualization, including allele frequencies, gene expression, and molecular interactions.

In summary, TOAnnoPriDB integrates trans-omics data, disease information, and population-specific variant frequencies. It offers a robust framework for prioritizing variants and a convenient interface for searching and analysis. TOAnnoPriDB is a powerful tool for variant screening and research, providing valuable insights into the role of non-coding variants in human diseases.

See the article:

https://doi.org/10.1016/j.scib.2024.12.030
