News Release

Scientists report first complete genome sequence of a plant

Peer-Reviewed Publication

Cold Spring Harbor Laboratory

An international effort to sequence the entire genome of the plant species Arabidopsis thaliana is now complete. This first-ever complete genome sequence from a plant has many implications for biology, medicine, agriculture, and the environment because it will enable detailed studies of the entire genetic structure of plants to be carried out. Such studies will yield a great deal of new information about the gene products that are involved in many aspects of plant growth and development, and how these gene products carry out their functions.

Despite its status as a diminutive relative of the mustard plant, Arabidopsis thaliana is a powerful tool in plant molecular biology and genetics. The short generation time and relatively compact genome of Arabidopsis (a flowering plant) make it an ideal model system for understanding numerous features of plant biology, including ones that are of great pharmaceutical or agricultural value.

The sequencing studies, reported in the December 14, 2000, issue of the journal Nature, provide new information about chromosome structure, evolution, and gene organization in plants. Among the many new genes discovered were several involved in disease resistance and intracellular signaling, as well as homologs of a number of human disease genes. Perhaps the most surprising result of these studies, and related studies published last year (see below), is the extent to which vast chromosomal regions have been duplicated in the Arabidopsis genome. In fact, the new study indicates that the evolution of Arabidopsis involved a whole-genome duplication, followed by gene loss and additional, extensive local gene duplications.

The Arabidopsis genome was found to contain 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of the fruit fly Drosophila and the soil nematode worm C. elegans—two other multicellular organisms whose genomes have been completely sequenced. However, Arabidopsis has many plant-specific families of proteins (e.g. transcription factors) and lacks several kinds of proteins common to vertebrates, Drosophila, and C. elegans (e.g. the signalling pathway proteins Wingless/Wnt, Hedgehog, Notch/lin12, JAK/STAT, TGF-beta/SMADs).

"We are several years ahead of schedule," says Cold Spring Harbor Laboratory scientist W. Richard McCombie, referring to the progress that the international Arabidopsis Genome Initiative has made toward its goal of completing the sequencing project. "Throughout this endeavor, all of the groups involved have worked hard to share information, and that has made all the difference."

The Arabidopsis genome contains approximately 125 million base pairs of DNA (125 Mb) distributed among five chromosomes. One year ago, a U.S. consortium led by McCombie reported the DNA sequence of chromosome 4 in collaboration with The European Union Arabidopsis Genome Sequencing Consortium led by Michael Bevan of the John Innes Centre (Norwich, UK). Cold Spring Harbor Laboratory scientist Robert Martienssen was instrumental in organizing the international sequencing effort at its outset in 1996, and played a major role in interpreting the chromosome 4 results (see section entitled "Plant Biology at Cold Spring Harbor Laboratory" below).

"The completion of the Arabidopsis genome sequence has profound implications for human health as well as plant biology and agriculture," says Martienssen. In addition to McCombie and Martienssen, Ellson Chen of Perkin Elmer Biosystems based in Foster City, California, and Richard Wilson of the Washington University Medical School Genome Sequencing Center in St. Louis, Missouri, were principal investigators in the U.S. consortium that reported the chromosome 4 results last year.

A team of scientists at The Institute for Genomic Research (TIGR) in Rockville, Maryland, led by J. Craig Venter, determined the DNA sequence of Arabidopsis chromosome 2, which was reported in Nature last year together with the chromosome 4 results. The complete sequences of chromosome 2 (19 Mb) and chromosome 4 (17 Mb) represented roughly one-third of the plant's genome.

Today, the Arabidopsis Genome Initiative announces that it has completed the DNA sequence of the remaining chromosomes, which represent two-thirds of the entire genome. The principal teams in the new report are:

Chromosome 1
TIGR; Stanford Genome Technology Center; Plant Sciences Institute, University of Pennsylvania; Plant Gene Expression Center, UC Berkeley
Chromosome 3
European Union Arabidopsis Genome Sequencing Consortium; TIGR; Kazusa DNA Research Institute
Chromosome 5
Kazusa DNA Research Institute; The Cold Spring Harbor and Washington University in St. Louis Sequencing Consortium; European Union Arabidopsis Genome Sequencing Consortium

The major supporters of the U.S. sequencing effort were the National Science Foundation, the U.S. Department of Agriculture, and the U.S. Department of Energy.

The potential function of approximately 70% of the 25,498 genes of Arabidopsis can be predicted based on their similarity to other genes of known function in Arabidopsis or other organisms. However, the functions of the remaining 30% of Arabidopsis genes are unknown, and only 9% of Arabidopsis genes have been characterized experimentally. Future studies of the Arabidopsis genome and the proteins it encodes (particularly those with no known function) will be greatly facilitated by combining the new DNA sequence information with a multitude of existing genetic and molecular biological strategies and resources that are available to Arabidopsis researchers (for example, see the "gene trap" transposable element strategy described in the section entitled "Plant Biology at Cold Spring Harbor Laboratory" below).

McCombie says that the pace of the Arabidopsis sequencing project was accelerated by a first-of-its-kind effort to use high-throughput "whole-genome random BAC fingerprint analysis" to map a large eukaryotic genome in its entirety and provide an ordered set of DNA clones for sequencing (BAC, bacterial artificial chromosome). This analysis of the Arabidopsis genome was completed by Wilson, Marco Marra, and their colleagues at the Washington University Medical School Genome Sequencing Center with assistance from McCombie, Martienssen, and Larry Parnell of Cold Spring Harbor Laboratory. The random BAC fingerprinting technique has rapidly become the method of choice for mapping and sequencing the comparatively large genomes of other eukaryotic organisms, including humans. The human genome contains an estimated 3.2 billion base pairs of DNA, roughly 25 times more than Arabidopsis.

###

For streaming video about this story, visit:
http://www.cshl.org/meetings
http://www.nsf.gov/od/lpa/news/press/00/pr0094.htm
B-roll is also available

For more information about the Arabidopsis Genome Initiative, visit:
http://www.arabidopsis.org/agi.html

**************

Cold Spring Harbor Laboratory
Cold Spring Harbor Laboratory is a private, non-profit basic research and educational institution. Under the leadership of Dr. Bruce Stillman, a member of the National Academy of Sciences and a Fellow of the Royal Society (London), some 260 scientists conduct research in cancer, neurobiology, and plant genetics. Its other areas of research expertise include molecular and cellular biology, genetics, structural biology, and bioinformatics.

Plant Biology at Cold Spring Harbor Laboratory
Plant biology has a long and rich history at Cold Spring Harbor Laboratory. In 1908, George Shull found that by cross-pollinating corn plants, he could consistently produce higher-yielding progeny. His theory of "hybrid vigor," based on research Shull performed at CSHL, has become widely known and has found many applications in agriculture and genetics. Barbara McClintock studied maize (corn) genetics here for fifty years, beginning in 1942. At Cold Spring Harbor Laboratory, McClintock discovered "controlling elements," which she found can switch other genes on and off as a consequence of their movement within the genome. In 1983, McClintock was awarded the Nobel Prize for her discoveries concerning controlling elements, later known as transposable elements, transposons, or "jumping genes."

In 1992, CSHL plant biologist Rob Martienssen and his colleagues published a "gene-trap" system for Arabidopsis based on McClintock’s transposable elements. Using this system, Martienssen created several thousand mutant strains of Arabidopsis, an invaluable resource for the study of gene expression during plant development. As a result of their collaborative effort to sequence and characterize these transposon insertions in the Arabidopsis genome, Martienssen and another CSHL scientist, W. Richard McCombie, were founding members of the Arabidopsis Genome Initiative—a global consortium established in 1996 to sequence the entire genome of Arabidopsis.

In 1999, Martienssen and McCombie reported the first complete DNA sequence of a plant chromosome — chromosome 4 — from Arabidopsis, in collaboration with The European Union Arabidopsis Genome Sequencing Consortium, led by Michael Bevan of the John Innes Centre. Simultaneously, scientists at TIGR completed the sequence of Arabidopsis chromosome 2.

The Plant Genomics Center at CSHL’s new Genome Research Center will support continuing studies of plant genome structure and function by Martienssen, McCombie and their CSHL colleagues. Information from these studies will be made widely available to individuals, scientists and industries with agricultural and environmental interests via databases maintained at the Center. This project is supported by the National Science Foundation.

For more information, visit the Laboratory's website, www.cshl.org, or call the Department of Public Affairs at 516-367-8455. Some of the material in this news release is identical to information included in a previous news release dated December 15, 1999 ("Scientists Report First Complete Sequence of Plant Chromosomes"), issued when the complete sequences of Arabidopsis chromosomes 2 and 4 were published in the December 16, 1999, issue of Nature. A copy of that news release is available on request.

