"We're using technological tools to get better data on how the vocal tract moves during speech," said study author Dani Byrd, an associate professor of linguistics and director of the USC Phonetics Laboratory in the USC College of Letters, Arts and Sciences.
"Magnetic resonance imaging allows us to look at movies of the entire vocal tract in action, something no one's been able to see in real time before now," Byrd said.
The team reported the successful development and use of real-time MRI to create high-resolution movies of the vocal system in the April issue of the Journal of the Acoustical Society of America.
By helping to clarify ways that humans produce normal speech, the new technique may help people learn a foreign language, teach machines to speak more naturally and possibly suggest therapy for those with speech problems due to stroke.
The advance comes as a result of an interdisciplinary collaboration led by Byrd and electrical engineer Shrikanth Narayanan, an associate professor in the USC Viterbi School of Engineering who focuses his research at the interface of speech, engineering and computer science.
The team also drew on the talents of MRI systems researcher Krishna Nayak, an assistant professor of electrical engineering and medicine; Sungbok Lee, a research scientist in linguistics and electrical engineering; and Abhinav Sethy of electrical engineering.
MRI has been used in speech research for more than a decade, said Byrd, who focuses her research on the production, perception and physical properties of speech sounds.
Up to now, MRI has primarily recorded still images of the dynamic vocal tract, data that have been useful but limited in what they can reveal about the timing of speech.
But, as anyone who has ever tried a tongue-twisting phrase like "Peter Piper picked a peck of pickled peppers" knows, speaking is a moving art – an elegant and complex orchestration of vocal parts to produce sounds, words and sentences a listener can understand.
Narayanan and Nayak led the development of new analytical software that takes raw data from the MRI and reconstructs it into a moving image at 20 to 24 frames per second – just fast enough to capture the rapid changes in lips, tongue, jaw and the airway that together produce specific vowels, consonants and intonations of speech.
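The paper does not describe the team's software, but the basic idea of turning a continuous stream of raw scanner samples into movie frames at a fixed rate can be sketched as below. The function name, the sample format and the 22 fps figure are illustrative assumptions (the article only gives the 20-24 fps range); a real reconstruction would additionally transform each frame's raw k-space data into an image.

```python
# Hypothetical sketch, not the team's actual software: bin a time-stamped
# stream of raw MRI samples into movie frames at a fixed frame rate.

def bin_into_frames(samples, fps=22):
    """Group (timestamp_seconds, value) samples into per-frame lists.

    Each frame collects the raw samples acquired during its 1/fps window.
    A real-time MRI pipeline would then reconstruct an image from each
    frame's samples to produce the movie.
    """
    frame_len = 1.0 / fps
    frames = {}
    for t, value in samples:
        idx = int(t // frame_len)          # which 1/fps window this sample falls in
        frames.setdefault(idx, []).append(value)
    return [frames[i] for i in sorted(frames)]  # frames in time order

# Example: one second of data acquired at 220 samples per second,
# with each sample stamped at the midpoint of its acquisition interval.
stream = [((i + 0.5) / 220.0, i) for i in range(220)]
movie = bin_into_frames(stream, fps=22)
print(len(movie))  # 22 frames, each holding 10 raw samples
```

At 20-24 fps, each frame spans only 40-50 milliseconds, which is why this rate is just fast enough to capture the rapid articulator movements the article describes.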
Real-time MRI allows Byrd to see and confirm the degree of sound overlap in spoken language, a characteristic of human speech she helped reveal in earlier work.
"There are no spaces between words in speech," Byrd said. "People overlap sounds within syllables. With MRI, you can actually see two sounds being made at the same time."
The researchers speculate that visual cues produced by MRI movies could help foreign language students learn to speak unfamiliar sounds, such as the "th" sound in English. "These images offer a view of how to pronounce sounds," said Byrd, referring to her own experience with a difficult-to-pronounce sound (it most closely resembles "r" in English) in the Tamil language of Southern India, which Narayanan speaks fluently.
The team recorded Narayanan saying the sound in the MRI. Studying the images, Byrd saw how to say the sound correctly – "the tongue tip is high in the palate making a cupped shape" – and finally got it right.
Other potential applications include help for people affected by congenital malformations that may make motor control, and thus the articulation of specific sounds, difficult.