News Release

Computerized map responds to speech and gestures

Peer-Reviewed Publication

Penn State

University Park, Pa. --- Penn State researchers have developed a prototype system to help visitors locate campus parking lots and buildings by talking with a computer-controlled map that responds not only to the spoken word but also to natural hand gestures.

Project leader Dr. Rajeev Sharma, assistant professor of computer science and engineering, says, "There still is a lot of work to be done, but we have a pretty fair, ground-level demonstration model of a system in which a person can interact with a computer by using the most natural human mode of communication: talking while gesturing.

"Besides the current application, the system could potentially be adapted to help tourists locate the sights in large cities, shoppers to find stores in malls, visitors to find patients in hospitals or even for roles in crisis management, mission planning and briefing," he adds.

In a recent demonstration, Sanshzar Kettebekov, a doctoral student, stood about 5 feet away from a map of the Penn State campus projected on a 4-foot-by-3-foot screen. "Scroll," he said gently into the cordless microphone attached to his T-shirt, and the map moved. "Stop," Kettebekov directed, and the map did. He waved his hand in the air and a little red hand appeared on the screen. As Kettebekov continued to gesture with his hand, the on-screen hand followed it, like a cursor obeying a mouse. When the red hand settled on one of the buildings, Kettebekov said, "Show me the nearest parking lot," and a bright blue line immediately appeared and connected the building to the closest lot.

The system is based on off-the-shelf equipment. The computer is a standard PC workstation equipped with a video camera, the system's "eye" on the gesturing human. A commercially available speech recognition package currently takes care of the conversation. However, the Penn State researchers developed new gesture recognition software and used footage of TV weather broadcasters narrating the weather to "train" it.
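
To make the pipeline concrete, the sketch below shows one way a speech/gesture loop of this kind could be wired together. It is not the Penn State code: the MapView class, the campus coordinates and the simulated inputs are invented for illustration, and in a real system the recognized command and the cursor position would come from the speech recognizer and the camera-based hand tracker rather than being hard-coded.

    import math

    # Invented campus coordinates (map pixels); stand-ins for real map data.
    BUILDINGS = {"Old Main": (120, 300), "Beaver Stadium": (640, 80)}
    PARKING_LOTS = {"Lot A": (100, 350), "Lot B": (600, 120), "Lot C": (300, 200)}

    class MapView:
        """Stub display that prints actions instead of drawing on the projected map."""
        def start_scrolling(self): print("map scrolling...")
        def stop_scrolling(self): print("map stopped")
        def draw_route(self, start, end): print(f"route drawn from {start} to {end}")

    def building_under_cursor(cursor):
        """Return the building whose marker is closest to the on-screen hand."""
        return min(BUILDINGS, key=lambda b: math.dist(cursor, BUILDINGS[b]))

    def nearest_parking_lot(point):
        """Return the parking lot closest to a point on the map."""
        return min(PARKING_LOTS, key=lambda lot: math.dist(point, PARKING_LOTS[lot]))

    def handle_turn(command, cursor, view):
        """Fuse one recognized utterance with the current gesture cursor position."""
        if command == "scroll":
            view.start_scrolling()
        elif command == "stop":
            view.stop_scrolling()
        elif command == "show me the nearest parking lot":
            building = building_under_cursor(cursor)
            lot = nearest_parking_lot(BUILDINGS[building])
            view.draw_route(BUILDINGS[building], PARKING_LOTS[lot])

    if __name__ == "__main__":
        view = MapView()
        # Simulated inputs standing in for the speech recognizer and hand tracker.
        handle_turn("scroll", (0, 0), view)
        handle_turn("stop", (0, 0), view)
        handle_turn("show me the nearest parking lot", (125, 305), view)

The division of labor mirrors the demonstration: the spoken command supplies the intent, while the pointing gesture supplies the spatial referent the command is about.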

The new Penn State gesture recognition software is based on a technique called Hidden Markov Models (HMMs), a time-varying pattern recognition method. HMMs had been used in gesture recognition systems before, but only for predefined gestures, such as those of sign language. The new Penn State approach, based on weathercaster movements, enables the computer to recognize and "understand" a rich store of natural gestures that occur in combination with speech.
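
As an illustration of the HMM idea (not the Penn State implementation; every number below is invented), the following sketch scores a sequence of quantized hand movements against two small discrete HMMs, one per gesture class, using the standard forward algorithm, and labels the sequence with the higher-scoring class.

    import numpy as np

    def forward_log_likelihood(obs, start_p, trans_p, emit_p):
        """Scaled HMM forward algorithm: log P(observation sequence | model)."""
        alpha = start_p * emit_p[:, obs[0]]      # initialize with the first symbol
        scale = alpha.sum()
        log_like = np.log(scale)
        alpha /= scale
        for symbol in obs[1:]:
            alpha = (alpha @ trans_p) * emit_p[:, symbol]  # propagate, then emit
            scale = alpha.sum()                            # rescale to avoid underflow
            log_like += np.log(scale)
            alpha /= scale
        return log_like

    # Toy gesture models over 3 observation symbols
    # (0 = hand still, 1 = moving toward a target, 2 = moving away from it).
    MODELS = {
        "point": dict(
            start_p=np.array([0.9, 0.1]),
            trans_p=np.array([[0.7, 0.3],     # state 0: stroke toward the target
                              [0.1, 0.9]]),   # state 1: hold over the target
            emit_p=np.array([[0.1, 0.8, 0.1],
                             [0.8, 0.1, 0.1]]),
        ),
        "wave": dict(
            start_p=np.array([0.5, 0.5]),
            trans_p=np.array([[0.4, 0.6],     # states alternate back and forth
                              [0.6, 0.4]]),
            emit_p=np.array([[0.1, 0.8, 0.1],
                             [0.1, 0.1, 0.8]]),
        ),
    }

    observed = [1, 1, 1, 0, 0, 0, 0]  # a stroke followed by a hold
    scores = {name: forward_log_likelihood(observed, **m) for name, m in MODELS.items()}
    print(max(scores, key=scores.get), scores)  # the "point" model scores higher

In a trained system the model parameters would be estimated from labeled video, such as the weathercaster footage mentioned above, rather than set by hand.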

At this point, although the system recognizes quite a few human gestures and spoken words, it doesn't like small talk. You can't tell it, "Well, I'd like to go to the Creamery for an ice cream cone first and then stop off at Old Main before parking at Beaver Stadium." At least not yet.

Yuhui Zhou, a master's degree candidate with a background in linguistics, is working on dialog design and feedback systems that will enable the computer to extract the most salient information from a human conversation stream. Jiongyu Cai, a doctoral candidate, is working on extracting the salient gestures from the random hand waving that most people use while talking. Kettebekov is trying to understand the combination of speech and gestures so that he can develop software that enables the computer to interpret gestures in the context of speech.

The research team is also paying attention to the fact that people from different cultures gesture differently but, at present, plans call for the map to respond only to English.

Sharma says, "Computer users have been slaves to the mouse and the keyboard too long. The equipment has, so far, limited the potential for human interaction with computers. Incorporating gesture, which computer vision makes possible, allows us to imagine all kinds of potential applications. For example, I can imagine a computer you wear on your head, like a virtual reality helmet, that could help you repair your PC by telling you what to do and then 'watching' as you do it. Or a wearable computerized surgical aid that could help direct a surgeon to the precise location of a tumor.

"For now, our group will be working on trying to enable the computer to more effectively talk back to the user. We'd like to model the human/computer dialog so that the display could interactively influence the user input enabling the computer to play a more active role in the natural speech/gesture interface," he adds.

The research group has detailed the new system in a paper, "Toward Interpretation of Natural Speech/Gesture: Spatial Planning on a Virtual Map," published in the Proceedings of the Army Research Laboratory Annual Symposium on Advanced Display, held in February 1999 in Adelphi, Md. The work on gesture recognition is detailed in Indrajit Poddar's master's thesis, completed in May, entitled "Continuous Recognition of Deictic Gestures for Multimodal Interfaces."

The research was supported, in part, by grants from the National Science Foundation and the Army Research Laboratory.

###

Note: Dr. Sharma can be reached at 814-863-0147 or by e-mail at rsharma@cse.psu.edu. His website is at http://www.cse.psu.edu/~rsharma

