The study, which will be posted on the Nature Genetics website (http://www.nature.com/ng/) April 7 in advance of its publication in the journal, describes an effort to locate and precisely identify all of the approximately 19,000 genes that have been predicted to exist in the genome of C. elegans. The success of the technique in the worm, whose catalogue of genes is relatively small and well-mapped, is a strong indication that it can be applied to the human genome as well, the authors say.
"The completion of a 'rough draft' of the map of the human genome a couple of years ago was an important milestone in our understanding of how cells work," says the study's senior author, Marc Vidal, PhD, of Dana-Farber. "But the fact is, our current picture of the 'parts list' of the human genome is rather fuzzy. Computer programs have been used to predict the position and structure of genes. However, we don't know exactly where most genes begin and end, and there are literally thousands of gaps in our picture of how the building blocks of genes are arranged. Even the frequent claim that there are about 30,000 genes within the human genome is only an estimate."
Gaining a more accurate picture of the genome is crucial to future advances against cancer and other diseases. One goal of cancer research, for example, is to identify the complete set of proteins (the chief working molecules of the cell) that are involved, directly or indirectly, in the disease. To understand what such cancer-related proteins do and how they do it, scientists need to be able to generate pure samples of them. And because genes contain the instructions for producing proteins, it is vital that researchers know exactly what those instructions say, where they are, and how to obtain them.
The current, somewhat sketchy map of the human genome was created with computer programs that predict where genes are likely to be found on a cell's chromosomes, not necessarily where genes actually are, Vidal observes. "It's the difference between a soft-focus image of an object and a crisper image that shows the outline in detail."
Moreover, the current map gives an incomplete view of genes' internal organization. Genes are often portrayed as beads on a string stretched along a chromosome. In fact, most genes are themselves made up of alternating segments of DNA: exons, which contain the coded information for producing proteins, are interrupted by stretches called introns. For a great many genes, scientists do not know exactly where these sections begin and end.
"Of the 30,000 genes believed to be in the human genome, only about 5,000 have been well defined," remarks Vidal, who is also an assistant professor of genetics at Harvard Medical School. "The structures of the other 25,000 have yet to actually be confirmed. Plus, there are long stretches of the chromosomes that remain terra incognita – that may contain genes yet to be discovered."
In their study, Vidal and his colleagues checked the accuracy of the map of the C. elegans genome using a technique of their own design. (Conventional techniques tend to miss less-active genes, and they don't offer a useful way of converting genetic information into proteins.)
The strategy involves blocks of genetic material called open reading frames, or ORFs. These consist of the protein-coding portions of a gene's exons, joined end to end after the introns have been removed, or spliced out. "ORFs are the actual blueprints for proteins," Vidal explains. "Introns don't seem to participate in the actual make-up of proteins."
Like all organisms, C. elegans puts its ORFs to use continually as part of normal cell life. When a cell makes a protein, the recipe for it (the ORF) is copied from the gene into RNA, which carries it to the cell's protein-making machinery. By capturing the RNA within C. elegans cells and converting it into complementary DNA (known as cDNA), investigators were able to gather the creature's full set of ORF instructions. The cDNA segments, each representing one of the animal's genes, were then compared with the sections of chromosome thought to contain those genes.
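The comparison step can be pictured with a toy model. The Python sketch below is purely illustrative and is not the study's pipeline: it assumes each gene model is simply a list of exon coordinates on a genomic sequence, and the sequence, coordinates, and function names (`splice`, `model_matches`, `mismatch_fraction`) are hypothetical. It joins exons into an ORF-like sequence and counts a computer prediction as wrong if any exon boundary differs from the structure supported by cDNA evidence.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: an exon is a half-open [start, end) interval
# on the genomic sequence.
Exon = Tuple[int, int]


def splice(genome: str, exons: List[Exon]) -> str:
    """Join the exon sequences in genomic order, dropping the introns between them."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def model_matches(predicted: List[Exon], observed: List[Exon]) -> bool:
    """Strict test: the prediction matches only if every exon boundary agrees."""
    return sorted(predicted) == sorted(observed)


def mismatch_fraction(pairs: Iterable[Tuple[List[Exon], List[Exon]]]) -> float:
    """Fraction of (predicted, observed) gene models that do not match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    mismatches = sum(1 for predicted, observed in pairs
                     if not model_matches(predicted, observed))
    return mismatches / len(pairs)


# Invented toy gene: uppercase = exons, lowercase = introns (a display convention only).
genome = ("ATGAAA"   # exon 1: positions 0-6
          "gtaagt"   # intron: 6-12
          "CCCTTT"   # exon 2: 12-18
          "gtaagt"   # intron: 18-24
          "GGGTAA")  # exon 3: 24-30

observed = [(0, 6), (12, 18), (24, 30)]   # structure supported by the (toy) cDNA
predicted = [(0, 6), (24, 30)]            # a prediction that misses exon 2

print(splice(genome, observed))   # ATGAAACCCTTTGGGTAA
print(splice(genome, predicted))  # ATGAAAGGGTAA
print(mismatch_fraction([(observed, observed), (predicted, observed)]))  # 0.5
```

Requiring every boundary to agree is a deliberately strict criterion, in the spirit of counting predictions that "did not completely match" the isolated genes; a real comparison would align cDNA sequence to the genome rather than compare ready-made coordinate lists.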
Investigators used this technique to examine all 19,000 predicted worm genes. They found that in more than half the cases – 56 percent – the predicted genes did not completely match the actual genes isolated in their study.
"This demonstrates that even in C. elegans – whose genome is better understood than humans' – the genome map needs a great deal of correcting and refining," Vidal says. "We're still a long way from having a perfect picture of the parts list encoded by the worm genome, let alone the human one."
The technique developed at Dana-Farber has a further benefit for scientists in the emerging field of proteomics, which deals with the complete set of proteins produced by cells. By isolating each of the ORFs in the worm's genome, reproducing them millions of times, and inserting them into tiny structures that convert their molecular information into proteins, it is possible to collect large, pure samples of the proteins for study.
For genes that are active only at low levels, Vidal says, it has been nearly impossible to collect enough of the corresponding protein to study. Because the new method produces each protein from its cloned ORF rather than depending on how active the gene is in the cell, it makes it possible to collect comparable amounts of all the proteins produced by a cell.
"The success of this technique with C. elegans suggests that it can be equally successful with the genomes of other creatures, including humans," says Vidal. "It brings us closer to a completely accurate map of the human genome. And it lays the groundwork for another map – the human proteome, the list of all proteins produced by human cells."
Contributing to the study were researchers at the Unité de Recherche en Biologie Moléculaire, in Belgium; Research Genetics, in Huntsville, Ala.; Life Technologies, of Rockville, Md.; Protedyne Corp., of Windsor, Conn.; McGill University; the Public Health Research Institute, in Newark, N.J.; Yale University; Albert Einstein College of Medicine; and Genome Therapeutics, of Waltham, Mass. The work was supported by grants from the National Cancer Institute, the National Human Genome Research Institute, the National Institute of General Medical Sciences, and the Merck Genome Research Institute.
Dana-Farber Cancer Institute is a principal teaching affiliate of the Harvard Medical School and is among the leading cancer research and care centers in the United States. It is a founding member of the Dana-Farber/Harvard Cancer Center (DF/HCC), designated a comprehensive cancer center by the National Cancer Institute.
Journal
Nature Genetics