
Landmark comparative genomics study highlights the importance of 'junk' DNA in higher eukaryotes

Peer-Reviewed Publication

Cold Spring Harbor Laboratory

SANTA CRUZ, Calif., Fri., July 15, 2005 – A ground-breaking comparative genomics study appears online today in the journal Genome Research. Led by Adam Siepel, a graduate student in Dr. David Haussler's laboratory at the University of California, Santa Cruz, the study presents the most comprehensive comparison to date of conserved DNA sequences in the genomes of vertebrates, insects, worms, and yeast.

One of the study's major findings is that as organism complexity increases, so too does the proportion of conserved bases found in non-protein-coding (or "junk") DNA sequences. This underscores the importance of gene regulation in more complex species.

The manuscript also reports exciting biological findings regarding highly conserved DNA elements and the development of a new computational tool for comparing several whole-genome sequences. It was authored by multiple investigators from leading research institutions, including Penn State University (University Park, PA), Washington University School of Medicine (St. Louis, MO), Baylor College of Medicine (Houston, TX), and the University of California, Santa Cruz.

One of the most powerful approaches for pinpointing biologically relevant elements in genomic DNA is to identify sequences that are similar across multiple species. Such approaches are particularly useful for analyzing non-protein-coding sequences – sometimes called "junk" DNA. Although "junk" DNA is poorly understood, the increasing availability of whole-genome sequences is rapidly enhancing the ability of scientists to ascertain the biological significance of these non-protein-coding regions.

"Looking for functional elements in mammalian and other vertebrate genomes is like looking for needles in a haystack," explained Siepel. "By focusing on conserved elements, you get a much smaller haystack. It's not guaranteed to have every needle in it, and not everything in it is a needle, but you're much more likely to find a needle if you look in this smaller haystack than if you look in the big one."

Siepel's team aligned whole-genome sequences for four groups of eukaryotic species (vertebrates, insects, worms, and yeast). The vertebrates included human, mouse, rat, chicken, and pufferfish, and the insects included three species of fruit fly and one species of mosquito. Two worm species and seven yeast species rounded out the set.

To help ease the gargantuan task of identifying conserved elements in multiple alignments of whole-genome sequences, the researchers developed a new computational tool called phastCons. In contrast to traditional tools that compute conservation levels based on sequence similarity at each nucleotide position, phastCons allows for multiple substitutions per site, accounts for unequal rates of substitutions for different nucleotides, and considers the phylogenetic relationships of the species involved.
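For readers who want a concrete picture of the "traditional" approach mentioned above, the following is a minimal sketch of a per-position similarity score on a toy alignment. It is not phastCons and does not model multiple substitutions, unequal substitution rates, or phylogeny. The function name `column_identity` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def column_identity(alignment):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common base (gaps never count as matches)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        bases = [b for b in column if b != "-"]
        if not bases:
            # A column made only of gaps carries no similarity signal.
            scores.append(0.0)
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy four-species alignment (made-up sequences, not real data).
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAAC",
    "ACG-GC",
]
scores = column_identity(toy_alignment)
```

A score like this treats every species as equally informative and every mismatch as equally costly, which is the limitation the release says phastCons was designed to address.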

After applying phastCons to multiple alignments of each of the four groups of eukaryotic species, the researchers estimated that only 3-8% of the human genome was conserved in the other vertebrate species. By contrast, the more compact genomes of insects were more highly conserved (37-53%), as were those of worms (18-37%) and yeast (47-68%).

The scientists also observed that the proportion of conserved sequences located outside of protein-coding regions tended to increase with genome length and with the species' general biological complexity.

Most strikingly, the researchers discovered that two-thirds or more of the conserved DNA sequences in vertebrate and insect species were located outside the exons of protein-coding genes, while non-protein-coding sequences accounted for only about 40% and 15% of the conserved elements in the genomes of worms and yeast, respectively.

"The conserved noncoding story seems to be fairly similar in vertebrates and insects, but looks quite different in worms and yeast," explained Siepel. "These findings support the hypothesis that increased biological complexity in vertebrates and insects derives more from elaborate forms of regulation than from a larger number of protein-coding genes." He noted that the results for the worm group should be interpreted cautiously because the analysis was based on the genomes of only two quite divergent worm species.

"We still understand remarkably little about the function and evolutionary origin of these elements," Haussler added. But the locations of the conserved elements will provide the scientists with some key clues to the potential functions of these sequences.

Some of the strongest sequence conservation in vertebrates was observed in the 3' untranslated regions (3'UTRs) of genes, which indicates that post-transcriptional regulation may be a widespread and important phenomenon in more complex species. The scientists found positive associations between highly conserved elements (HCEs) in known genes and RNA editing, as well as between HCEs and microRNA targets.

Interestingly, the researchers discovered that many HCEs in vertebrates may encode functional RNAs. The HCEs in introns and intergenic regions in vertebrates were significantly enriched for statistical evidence of local RNA secondary structure, which indicates that many may function as RNA genes.

"There really does seem to be a lot more going on at the RNA level than people would have guessed a few years ago," commented Siepel.

HCEs were also associated with "gene deserts" – long regions of the genome that are devoid of protein-coding genes. This indicates that some of the conserved elements may function as long-range transcriptional regulatory elements.

For genomic scientists, the current study is a major contribution to the field. Not only will the new bioinformatics tool phastCons help researchers identify evolutionarily conserved DNA elements, but the conserved elements reported in the study are also available as conservation tracks in the widely used UCSC Genome Browser. "With phastCons and with the conservation tracks in the browser," says Siepel, "we're trying to make it as easy as possible for researchers to home in on functionally important DNA sequences."

###

