Genomes sequenced from diverse human populations reveal a broad array of previously unrecognized structural variants, researchers report. The findings provide fundamental insights into the structure, variation and mutation of the human genome, and they offer a framework for generating reference genomes that represent the diversity of our species more accurately. Many human genomes have been assembled and reported using short-read sequencing technology. This approach, however, is limited by its inability to sequence long stretches of DNA. As a result, it is difficult to use these data to identify structural variation within the highly complex and repetitive human genome and to compare it among individuals and populations. Recent advances in long-read sequencing technologies have significantly increased the sensitivity for resolving structural variants (SVs), which are inversions, deletions, duplications and insertions larger than 50 base pairs (bp) in length. Previous large-scale SV discovery efforts have been largely inferential and biased in their detection of novel SVs. Peter Ebert and colleagues present a new method that identifies a wide variety of genetic variation directly, by comparing diverse haplotype-resolved human genomes. Using long-read and strand-specific sequencing technologies together, Ebert et al. assembled 64 haplotype-resolved human genomes from 32 individuals. They identified nearly 108,000 SVs, 68% of which were previously undetected by short-read sequencing, along with 278 SV hotspots. According to the authors, the new resource allows SVs to be genotyped much more accurately, yielding new insights into SV genetic diversity that may affect gene function across human populations.
###
Journal
Science