Microsoft's Azure AI Speech platform achieved “significant improvements” in recognizing non-standard English speech thanks to recordings and transcripts from University of Illinois Urbana-Champaign Speech Accessibility Project participants. Its accuracy gains range from 18% to 60%, depending on the speaker’s disability.
The changes are currently rolling out on Microsoft's cloud endpoint for third-party customers.
Until now, most voice recognition technology has been trained on recordings and transcriptions from audiobooks. But an audiobook narrator and an individual with aphasia after a stroke sound very different.
When the Speech Accessibility Project began, the largest existing database of atypical speech included 16 people with cerebral palsy. That database was also created at the U. of I., by Mark Hasegawa-Johnson, a professor of electrical engineering and the Speech Accessibility Project's leader. The project currently includes about 1,500 participants, and Microsoft is a member of the coalition that funds the work.
“Accessibility is a core value for Microsoft,” said Aadhrik Kuila, a product manager at Microsoft working to integrate the Speech Accessibility Project data into Azure’s Speech service. “These improvements are a testament to our commitment to building technologies that empower everyone, including people with non-standard speech. This collaboration not only enhances accessibility but also sets a benchmark for how industry and academia can work together to drive meaningful societal impact.”
The Speech Accessibility Project records people with diverse speech patterns to improve voice recognition technology. The project is currently recruiting U.S., Canadian and Puerto Rican adults with amyotrophic lateral sclerosis, cerebral palsy, Down syndrome, Parkinson’s disease and those who have had a stroke.
So how do the project’s recordings help improve speech recognition tools?
Think of the engineers training an artificial intelligence model as math teachers who have a pool of math problems (in this case, a training set made up of voice recordings). The engineers teach the computer how to solve the math problems by providing the answers (exact transcriptions of what the recordings say). They also set aside several math problems for a test at the end of the unit. The test set's problems are similar, but new, so the engineers can see what the model learned.
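To make the analogy concrete, here is a minimal Python sketch of how a held-out test set is scored: the model's transcripts are compared word by word against the human "answer key" using word error rate, the standard grading metric for speech recognizers. The example utterances and model outputs below are hypothetical placeholders, not project data or Microsoft's pipeline.

```python
# A minimal sketch of the "test at the end of the unit": score a recognizer's
# transcripts against human transcriptions using word error rate (WER).
# The utterances and model outputs here are invented for illustration.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for word-level Levenshtein distance.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# Held-out "test" items the model never saw during training:
# (human transcription, model output) pairs.
test_set = [
    ("turn on the kitchen lights", "turn on the kitchen light"),
    ("call my daughter after lunch", "call my doctor after lunch"),
]

average_wer = sum(word_error_rate(ref, hyp) for ref, hyp in test_set) / len(test_set)
print(f"Average WER on held-out test set: {average_wer:.2%}")
```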
“We also compare these results with our current production model to quantify the gains,” Kuila said. “Importantly, we run our standard test sets focused on typical speech to ensure that incorporating (project) data doesn’t cause regressions.”
Microsoft is committed to enhancing the accessibility of AI systems by integrating disability-representative data into the development process.
“This iterative process allows us to fine-tune training parameters to strike the best balance,” Kuila said, “improving performance for non-standard speech while maintaining or slightly enhancing accuracy for typical speech.”
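As a rough illustration of that balancing act, the sketch below compares a hypothetical fine-tuned model against a production baseline on two test sets, one for non-standard speech and one for typical speech, and flags any regression. The model names and error rates are invented for illustration; they are not Microsoft's figures.

```python
# A minimal sketch of the regression check described above: after fine-tuning,
# compare word error rates (WER) against the production baseline on both a
# non-standard-speech test set and a typical-speech test set.
# All model names and WER numbers are hypothetical placeholders.

# Hypothetical evaluation results: test set -> {model: WER}
results = {
    "non_standard_speech": {"production_baseline": 0.32, "fine_tuned": 0.18},
    "typical_speech":      {"production_baseline": 0.08, "fine_tuned": 0.079},
}

for test_set, wer in results.items():
    baseline, candidate = wer["production_baseline"], wer["fine_tuned"]
    relative_gain = (baseline - candidate) / baseline  # > 0 means improvement
    status = "regression!" if candidate > baseline else "ok"
    print(f"{test_set}: baseline WER {baseline:.1%} -> candidate WER "
          f"{candidate:.1%} ({relative_gain:+.0%} relative, {status})")
```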
Hasegawa-Johnson said he’s thrilled that Microsoft is already seeing improvements.
“It’s the first result we’ve heard of a company running against production data and seeing significant improvements,” he said. “It’s exciting to see that a year and a half into the project, we’re having an impact.”
Other coalition members include Amazon, Apple, Google and Meta. The project's data is shared first with coalition members and then made available to any company, university or researcher that agrees to the project's data use agreement.