News Release

Digital Speech Analysis Tests Sobriety

Peer-Reviewed Publication

Georgia Institute of Technology

Slurred speech is often a sure sign that someone's been drinking.

Now, a Georgia Institute of Technology researcher is working with colleagues from Indiana University to digitally quantify this telltale sign, which could lead to a simple, non-invasive way to test a person's sobriety.

"This is basically an effect of fine motor control," said Kathleen E. Cummings, a lecturer in Georgia Tech's School of Electrical and Computer Engineering. "We're looking at specifically what happens during speech production at your vocal cords, how steadily you can produce the excitation (air from your lungs) going through your vocal cords."

Preliminary results show that intoxicated speech is marked by jumpy changes in pitch and energy production and unsteady opening and closing of the vocal cords.
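The "jumpy changes" described above can be illustrated with a short sketch. The release does not say which measures Cummings used, so the code below is an assumption: it applies a common cycle-to-cycle perturbation measure (often called jitter when applied to pitch periods and shimmer when applied to amplitudes). The function name and all sample values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute change between consecutive cycles, divided by the mean value.

    Applied to pitch periods this is commonly called jitter; applied to
    per-cycle peak amplitudes it is commonly called shimmer. This is an
    illustrative measure, not necessarily the one used in the study.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical pitch periods in milliseconds (invented for illustration).
steady_periods = [8.0, 8.1, 8.0, 7.9, 8.0]
jumpy_periods = [8.0, 9.0, 7.2, 8.8, 7.0]

# Hypothetical per-cycle peak amplitudes (arbitrary units).
steady_amplitudes = [1.00, 1.02, 0.99, 1.01, 1.00]
jumpy_amplitudes = [1.00, 1.30, 0.80, 1.25, 0.70]

jitter_steady = relative_perturbation(steady_periods)
jitter_jumpy = relative_perturbation(jumpy_periods)
shimmer_steady = relative_perturbation(steady_amplitudes)
shimmer_jumpy = relative_perturbation(jumpy_amplitudes)

print(f"jitter:  steady={jitter_steady:.4f}  jumpy={jitter_jumpy:.4f}")
print(f"shimmer: steady={shimmer_steady:.4f}  jumpy={shimmer_jumpy:.4f}")
```

In this sketch, a less steady voice produces larger values on both measures. Whether real intoxicated speech separates this cleanly is exactly what the research is testing.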

Cummings discussed her work May 16 at the 131st annual meeting of the Acoustical Society of America in Indianapolis. She is working with Dr. David B. Pisoni and Dr. Steven B. Chin of Indiana University, as part of their ongoing study of ways to measure how alcohol consumption affects speech. The current project is sponsored by the Alcoholic Beverage Medical Research Foundation.

Pisoni, director of Indiana's Speech Research Laboratory and a professor of psychology, is considered a leader in the study of acoustical analysis, synthesis and perception of speech. Chin is a psychology postdoctoral student specializing in linguistics.

The two researchers approached Cummings after hearing about her thesis work at Georgia Tech, published in 1992, on how speech changes when produced under emotional stress or with linguistic effects such as talking quickly or slowly, loudly or softly.

"Given her robust results in the differentiation of styles of stressed speech, we thought that this type of analysis might show characteristic changes in speech produced under alcohol," Chin said.

For her thesis work, Cummings used digitized speech collected from several people speaking in 11 of the most common non-normal styles of speech. She then spent several years analyzing the signals produced by the sounds, looking specifically at the glottal excitation waveform.

During speech production, air passes from the lungs through the glottis, the opening between the vocal cords, and is then shaped into sounds by parts of the vocal tract, such as the teeth, tongue and lips. If the glottis stays open, the result is unvoiced sounds like "p" and "t." If it opens and closes periodically, voiced sounds such as "b," "z" and vowels are produced.
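The difference between voiced and unvoiced sounds shows up in a recording as the presence or absence of a repeating pattern. The sketch below is only an illustration of that idea, not the analysis used in the study: it uses a sine wave as a stand-in for a voiced sound and random noise as a stand-in for an unvoiced one, and checks each for periodicity with a normalized autocorrelation. The function name, frame length, lag range and threshold are all assumptions.

```python
import math
import random


def autocorr_peak(frame, min_lag, max_lag):
    """Largest normalized autocorrelation of the frame over the given lag range.

    Values near 1 suggest a strongly repeating (voiced-like) signal;
    values near 0 suggest no repetition (unvoiced-like).
    """
    energy = sum(x * x for x in frame)
    if energy == 0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


FRAME_LEN = 400            # samples per frame (assumed)
PERIOD = 80                # samples per cycle of the stand-in voiced sound
MIN_LAG, MAX_LAG = 40, 160
THRESHOLD = 0.5            # hypothetical decision threshold

# Stand-in for a voiced sound: a perfectly periodic sine wave.
voiced_like = [math.sin(2 * math.pi * n / PERIOD) for n in range(FRAME_LEN)]

# Stand-in for an unvoiced sound: random noise with a fixed seed.
rng = random.Random(0)
unvoiced_like = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]

voiced_score = autocorr_peak(voiced_like, MIN_LAG, MAX_LAG)
unvoiced_score = autocorr_peak(unvoiced_like, MIN_LAG, MAX_LAG)

print("voiced-like frame is periodic:", voiced_score > THRESHOLD)
print("unvoiced-like frame is periodic:", unvoiced_score > THRESHOLD)
```

Real speech is far less tidy than a sine wave, so a practical system would need more careful processing than this.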

The glottal excitation waveform consists of the puffs of air produced by the opening and closing of the glottis during voiced speech. Cummings concentrated on voiced sounds in order to study this waveform, which is known to carry the subtler aspects of natural speech, such as emotion and style.

She found differences between normal speech and speech produced under emotional stress that were distinct enough to tell the two apart with an accuracy rate of more than 90 percent.

For her current research, Cummings said, "the idea is, can we do the same thing with sober versus intoxicated speech? If we have a sample of somebody's speech from an accident or at a particular time, can we analyze it and say, 'Yes, this person is intoxicated,' if we compare this to his normal, sober speech sample?"

To find out, Pisoni and Chin sent Cummings samples of sober and intoxicated speech from four different people, gathered at Indiana University. They include different types of speech, such as monosyllabic words, tongue twisters, isolated sentences and passages of connected sentences.

Samples were taken when participants were sober, moderately intoxicated (.05 percent blood alcohol level) and highly intoxicated (.10 percent blood alcohol level or higher, considered legally drunk in most states).

Past perceptual research on this database has shown that a person listening to the samples can reliably discriminate between sober and intoxicated speech. Acoustic analysis also has shown that intoxicated speech is slower, features longer sentences and is marked by mispronunciations, such as slurred sounds and transposed letters and words.

In the current study, Cummings is finding that alcohol has a major effect on the excitation parameters that reflect the steadiness with which a person produces speech.

Samples from four speakers may not sound like enough for a comprehensive study, but Cummings said they form a sufficient database to make generalizations.

"If you see consistently the same trend between sober and intoxicated speech for four different speakers, that's actually a lot," she said.

Also, Cummings plans to continue her research on the other five speakers in the Indiana database.

Although much work is left to be done, Cummings said translating her research into a practical public safety device could be relatively easy. Law enforcement officials could record someone's speech at an accident or traffic stop, then analyze it later against a sample taken at a different time.

"If I can come up with a small set of parameters that differentiate sober and intoxicated speech, which I think I can do, it's actually not a hard task," she said. "There are some really simple distance measures that involve very few calculations."

The analysis would be done by computer, based on a mathematical formula that would yield a percentage probability as to whether the speaker was intoxicated.
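The release does not give the formula or the parameters, which Cummings has not yet finalized. As a rough sketch of the kind of calculation described, the code below compares a new sample with the same person's sober baseline using a scaled Euclidean distance, then maps that distance to a percentage with a logistic curve. Every name, feature, scale and constant here is hypothetical; in practice they would have to be fitted to real data.

```python
import math


def scaled_distance(sample, baseline, scales):
    """Euclidean distance between two feature vectors, each feature divided
    by a typical spread so that no single feature dominates."""
    return math.sqrt(
        sum(((s - b) / k) ** 2 for s, b, k in zip(sample, baseline, scales))
    )


def intoxication_percentage(distance, midpoint=2.0, steepness=1.5):
    """Map a distance to a 0-100 percent score with a logistic curve.

    midpoint and steepness are placeholder values, not fitted ones.
    """
    return 100.0 / (1.0 + math.exp(-steepness * (distance - midpoint)))


# Hypothetical two-feature vectors: (pitch unsteadiness, energy unsteadiness).
sober_baseline = [0.0125, 0.030]
later_sample = [0.0500, 0.090]
feature_scales = [0.0100, 0.020]   # assumed typical sober variation

distance = scaled_distance(later_sample, sober_baseline, feature_scales)
score = intoxication_percentage(distance)

print(f"distance from sober baseline: {distance:.2f}")
print(f"estimated probability of intoxication: {score:.0f} percent")
```

As Cummings notes, a calculation of this kind involves very few operations. The hard part is finding a small set of parameters that reliably separates sober from intoxicated speech.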

The only stumbling blocks could be recording quality and legal issues, such as a person's refusal to give two samples for comparison.

"Now, if you ever got really, really lucky, and you found something that you only ever saw in intoxicated speech ... then you would be able to just do it on the fly," Cummings said. "But I haven't seen anything like that yet."

More importantly, researchers have to compare their results against other factors that alter the way a person speaks, such as speech impediments, injuries, diseases or even common colds. Ataxic dysarthria, for example, is a neurological condition that causes a person to sound intoxicated.

With more than a year's worth of work behind her and at least that much to go, Cummings hopes to soon isolate a distinct set of parameters that define intoxicated speech with at least 90 percent accuracy. Regardless, she and her colleagues hope their research adds to the basic knowledge and understanding of how speech is produced.

###

RESEARCH NEWS AND PUBLICATIONS OFFICE
430 Tenth St. N.W., Suite N-112
Georgia Institute of Technology
Atlanta, Georgia 30318

MEDIA RELATIONS CONTACTS:
John Toon (404-894-6986);
Internet: john.toon@edi.gatech.edu;
FAX: (404-894-6983)

TECHNICAL CONTACTS:
Kathleen E. Cummings (404-894-3335); Internet: kate@ee.gatech.edu

WRITER: Amanda Crowell

###
