News Release

Scientists discover way to streamline analysis of maize genome

Combination of two techniques can help identify 'gene islands' in the key crop

Peer-Reviewed Publication

The Institute for Genomic Research

Rockville, MD – Like tiny islands in a vast sea, the gene clusters in maize are separated by wide – and extremely difficult to decipher – expanses of highly-repetitive DNA. This complex structure has greatly complicated efforts to sequence the genome of maize, which is one of the world's most important crops.

In an effort to streamline the way that researchers identify and sequence the DNA in those gene-rich islands, scientists at The Institute for Genomic Research and collaborators have discovered that two different approaches to identifying the non-repetitive regions of the genome together provide a complementary and cost-effective alternative to sequencing the entire genomes of complex plants.

In a paper published in the December 19th issue of the journal Science, the researchers found that two independent gene-enrichment techniques – methylation filtering and High-C0t selection – target somewhat distinct but overlapping regions of the genome and therefore could be used together to help identify nearly all of the genes in maize as well as their genomic structures.

This finding is significant because the maize genome, which includes about 2.5 billion base pairs of DNA, is about 20 times larger than the first plant genome to be deciphered, Arabidopsis thaliana, and nearly six times larger than the rice genome. The maize genome is so large because approximately 80% of it consists of families of nearly identical repetitive sequences. The gene-containing sequences are concentrated in the remaining 20% of the genome.

The challenge for genomic researchers is to explore the gene-rich islands without having to negotiate through the sea of highly-repetitive DNA surrounding them. In the Science study, researchers reported on two "filtration" techniques that separate the gene-rich regions from the gene-poor ones, providing about a four-fold reduction in the amount of sequencing necessary to find all of the maize genes.

"A combination of these techniques may be an excellent method for sequencing maize as well as other large and complex plant genomes at a cost far lower than current approaches," says Cathy A. Whitelaw, the TIGR researcher who led the maize analysis project and is the first author of the Science paper.

The major collaborators for the study were the Donald Danforth Plant Science Center in St. Louis, MO; the University of Georgia's genetics department in Athens, GA; and Orion Genomics, in St. Louis. The project was sponsored by the National Science Foundation's Plant Genome Research Program.

"The success of this project highlights the importance of virtual center projects in bringing together the expertise required to tackle large complex problems in genomics," says Jane Silverthorne, who leads the NSF's plant genome program.

TIGR Investigator John Quackenbush, the paper's senior author, says, "Maize is the single largest food crop in the United States, so developing strategies to decode its complex genome is a high priority. More importantly, the techniques that we have developed will be useful in the analysis of many other crops such as soybean whose genomes are also highly repetitive."

The two filtration techniques – methylation filtering and High-C0t selection – are not new, but this was the first time that they were tested together on a major scale, in this case with a combined total of about 93 million DNA base pairs from the maize genome.

The methylation filtering technique excludes hyper-methylated DNA sequences (a characteristic of highly-repetitive DNA) by means of bacterial restriction systems that cleave those areas of the genome. The technique was first developed by scientists at Cold Spring Harbor Laboratory.

The High-C0t selection technique, developed by researchers at the University of Georgia's genetics department, excludes highly-repetitive DNA sequences by using a different method that separates DNA segments into "low-copy" (High C0t) and "high-copy" (Low C0t) sequences, which correspond roughly to gene-rich and gene-poor sections of the genome, respectively.

When researchers analyzed the composition of Simple Sequence Repeats – short, repetitive segments of two, three or four DNA bases – recovered from the two techniques, they were able to show that the filtration methods targeted different regions of the maize genome.

An analysis of "genetic markers" – sequences related to the maize genetic map – reinforced that conclusion and further indicated that these methods do not have significant chromosomal biases, as newly-sequenced regions are evenly distributed across the 10 maize chromosomes.

"While both of these methods increase the rate of gene identification from maize genomic sequence, our analysis implies that they have biases; this suggests that both methods are required to ensure comprehensive coverage of the maize gene space," says W. Brad Barbazuk, Ph.D., senior bioinformatics specialist at the Danforth Center.

TIGR's President, Claire M. Fraser, calls the maize study an important step in tackling the genomes of complex plants: "Not only has this project given us a preview of the structure of the maize genome, it also has helped us find a rapid and cost-effective alternative to sequencing the entire genome."

###

The Institute for Genomic Research (TIGR) is a not-for-profit research institute based in Rockville, Maryland. TIGR, which sequenced the first complete genome of a free-living organism in 1995, has been at the forefront of the genomic revolution since the institute was founded in 1992. TIGR conducts research involving the structural, functional, and comparative analysis of genomes and gene products in viruses, bacteria, archaea, and eukaryotes.

TIGR Media Contact:
Robert Koenig, TIGR Public Affairs Manager
301-838-5880
rkoenig@tigr.org

TIGR Scientific Contact:
John Quackenbush, Ph.D.
johnq@tigr.org

