Public Release: 

A Genetic Skeleton Key

University of Southern California

There is a new method for finding a human gene if an analogous gene from any other life form is already known.

While other techniques already exist to find cross-species gene analogs, this method is far more accurate. The new method -- devised through the collaborative efforts of U.S. and Russian researchers working at USC -- is likely to find ready applications in biotechnology, evolutionary biology and medical research.

The researchers, two in Russia and one in the United States, describe the method in the Aug. 20 issue of the Proceedings of the National Academy of Sciences.

"Hunting for human genes is a massive, painstaking undertaking that typically takes years and costs tens or even hundreds of millions of dollars," said co-author Pavel A. Pevzner, a professor of mathematics and computer science. With this method, we can find a human gene if an analogous gene from another species has been identified. The species doesn't matter: mouse, chicken, frog. Anything alive can serve as a template to find human genes."

Many cancer-causing genes already identified in mice and other laboratory animals are thought to have analogs that cause cancer in humans. This animal research can now be translated far more quickly into human gene sequences, and ultimately, it is hoped, into treatments and cures.

Pevzner and his Russian collaborators -- Mikhail S. Gelfand, of the Institute of Protein Research at the Russian Academy, and Andrey A. Mironov, of the Laboratory of Mathematical Methods at the Russian National Center for Biotechnology -- have devised a method that is able to overcome formidable obstacles.

In very simple life forms, such as bacteria, genes are written into the organism's hereditary material as continuous strings of information, recording genetic information in the four-letter base-pair AGCT alphabet of DNA.

In man and other multi-celled organisms, the situation is much less straightforward, even though exactly the same alphabet of base-pair "letters" is used. A human gene, consisting of a message roughly 2,000 letters long, is typically broken into submessages called "exons."

These exons are shuffled, seemingly at random, into a section of chromosomal DNA as many as a million letters long.

A typical human or mammalian gene can have 10 exons or more.

Recently, scientists reported a gene written in 54 separate exons. Another, linked to breast cancer, has 27.

"This situation is comparable to a magazine article that begins on page one, continues on page 13, then takes up again on pages 43, 51, 53, 59, 70, 74, 80 and 91, with pages of advertising and other articles appearing in between," Pevzner said. "We don't understand why these jumps occur or what purpose they serve. Thankfully, like a maga-zine, the exons stay in order. They don't jump backward. You always read in the same direction."

The jumps are inconsistent from species to species. An "article" in an insect edition of the genetic magazine will be printed differently than the same article appearing in a worm edition. "The pagination will be completely different," Pevzner explained, "and it will not be consistent: The information that appears on a single page in the human edition may be broken up into two in the wheat version, or vice versa."

Pevzner noted yet another complication. "The genes themselves, while related, are quite different. The mouse-edition gene is written in mouse language; the human-edition gene in human language. It's a little like German and English, which are related languages: many words are identical or similar, but many others are not. Nevertheless, to find the analogous genes, we must be able to recognize these differently spelled words written on different pages as the same message."

Even there, the complications do not end. If it were just a matter of picking out a known "magazine story," whether in mouse-DNA language or human-DNA language, from intervening material that was obviously advertising, the problem would be far less difficult. Perversely, the "advertising" can mimic the message. Long sequences of "junk DNA," as it's sometimes called, may be identical to parts of the message but not be part of the gene. Such sequences are meant to be skipped when the message is read.

Earlier methods for deciding what is advertising and what is story depended on statistics. "To continue the magazine analogy, it is something like going through back issues of the magazine and finding that human-gene 'stories' are less likely to contain phrases such as 'For Sale,' telephone numbers, and the dollar sign," Gelfand explained.

While better than random reconstruction, these statistical methods are inaccurate at best.

The method developed by Pevzner and his colleagues zeros in on the proper pages that are potentially part of the "story" -- all pages that seem to have sequences that are part of the message.
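To make the idea concrete, here is a toy sketch in Python of picking, from a set of candidate "pages," the ordered combination whose joined text best matches a known gene from another species. It is not the authors' algorithm: it simply tries every combination by brute force and scores each with ordinary edit distance, which would be far too slow for real DNA. The sequences, coordinates, and function names are all made up for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def best_chain(genome, candidates, target):
    """Brute force: try every ordered, non-overlapping subset of candidates."""
    blocks = sorted(candidates)
    best = (edit_distance("", target), ())
    for k in range(1, len(blocks) + 1):
        for chain in combinations(blocks, k):
            if any(chain[i][1] > chain[i + 1][0] for i in range(k - 1)):
                continue  # overlapping blocks cannot both be exons
            spliced = "".join(genome[s:e] for s, e in chain)
            best = min(best, (edit_distance(spliced, target), chain))
    return best


# Hypothetical data: two real exons and one decoy block that mimics the message.
genome = "CC" + "ATGGCC" + "GTAAGT" + "TTGA" + "GTCAG" + "TTGACA" + "GG"
candidates = [(2, 8), (14, 18), (23, 29)]
# A related gene from another species, one letter different from the human message.
target = "ATGGCTTTGACA"

distance, chain = best_chain(genome, candidates, target)
print(distance, chain)
```

In this toy case the decoy block at positions 14 to 18 is rejected, because including it makes the joined text less similar to the template than the two true exons alone.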

The method's accuracy is always good, the scientists reported, and often remarkable -- 99 or 100 percent.

The Proceedings paper contains a listing of trials of the method on nearly 100 different genes, 47 of them from mammals (mostly mice), 45 from other organisms, including bacteria. For mammals, 40 of 47 reconstructions were perfect -- 100 percent accurate. In six of the remaining cases, where the method did not give a perfect prediction, it came close, accurately predicting 94 to 97 percent.
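The figures above can be checked with a little arithmetic. The short Python tally below uses only the counts quoted in this release; the variable names are illustrative.

```python
mammal_genes = 47
other_genes = 45
total_genes = mammal_genes + other_genes  # the "nearly 100" genes tested

perfect = 40        # reconstructions that were 100 percent accurate
near_perfect = 6    # reconstructions that were 94 to 97 percent accurate
remaining = mammal_genes - perfect - near_perfect  # cases that fell short

perfect_share = 100 * perfect / mammal_genes

print(f"genes tested: {total_genes}")
print(f"perfect mammalian reconstructions: {perfect_share:.1f}%")
print(f"mammalian cases neither perfect nor near-perfect: {remaining}")
```

In other words, roughly 85 percent of the mammalian reconstructions were perfect, and only one mammalian case was neither perfect nor close to it.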

Even in the lone case in which the method seemed to fall down -- predicting with 75 percent accuracy on the basis of mouse data -- the failure was interesting. In this case, chicken data for the same gene were also available to use for predictions, and the prediction of the human gene from the chicken data was 100 percent accurate. "This is surprising, given that we think of humans as more closely related to mice than to chickens," Pevzner notes.

Even when the starting point of the reconstruction was target material from organisms evolutionarily extremely different from humans -- bacteria, yeasts and others -- 25 of the reconstructions were 100 percent accurate.

"We believe our method will prove extremely useful to researchers, not just in biotechnology, but also evolutionary biology," Pevzner says. "It will enable biologists to trace, with exceptional precision, exact degrees of difference between gene organization in different species. And it will help to establish evolutionary relationships between species."

The research was supported by grants from the U.S. Department of Energy, the Russian Fund for Fundamental Research, the Russian Human Genome Program, and the National Science Foundation's Young Investigator Program.

###

