image: The strong linearity of this plot indicates excellent agreement between manual scoring and fully automated scoring, showcasing the potential of the proposed system.
Credit: Michael McGuire from Doshisha University, Japan
In today’s increasingly interconnected world, language learning has become essential for education, business, and cultural exchange. However, accurately measuring proficiency in language learners is a complex matter. One particularly valuable approach involves asking learners to listen to sentences and then repeat them back as accurately as possible. Known as elicited imitation (EI), this method reveals much more than mere memory and mimicking abilities. When sentences exceed our working memory capacity—typically beyond 8 to 10 syllables—successful repetition requires learners to quickly process and reconstruct the language using their internalized knowledge of its patterns and structures. Thus, EI can provide a window into one’s true language proficiency.
While EI has been refined over decades, its widespread adoption has been limited by a significant practical challenge: the need for trained human evaluators to listen to and score each response. This time-consuming process makes large-scale assessments difficult, despite EI's well-documented effectiveness. Even though EI-based tests gained significant traction in the early 2000s thanks to the development of a standardized scoring approach, EI's resource demands have continued to restrict its use in educational settings where it could be beneficial.
Against this backdrop, a research team comprising Associate Professor Michael McGuire from the Department of English, Faculty of Letters, Doshisha University, Japan, and Dr. Jenifer Larson-Hall from The University of Kitakyushu has pioneered an automated approach for EI assessment. Their latest paper, made available online on March 11, 2025, and published in Volume 4, Issue 1 of Research Methods in Applied Linguistics on April 1, 2025, demonstrates how artificial intelligence can help revolutionize this process.
The researchers proposed a two-part solution to automate EI testing: first, using OpenAI’s Whisper, an automatic speech recognition (ASR) system, to transcribe learners’ spoken responses into text; and second, applying a computational metric called ‘Word Error Rate (WER)’ to evaluate how closely these responses matched the original prompts. Whisper ASR was selected because of its high accuracy with non-native speech and tolerance for background noise, while WER offered a precise way to measure deviations from target sentences, identifying substitutions, insertions, and deletions at the word level.
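To make the metric concrete, here is a minimal sketch of how WER can be computed as a word-level Levenshtein edit distance, counting substitutions, insertions, and deletions against the prompt and dividing by the prompt's word count. The example sentences are illustrative, not items from the study, and the paper itself may use a dedicated library rather than this hand-rolled version.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions)
    divided by the number of words in the reference prompt."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Dynamic-programming table: d[i][j] is the minimum number of
    # word-level edits needed to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution / match
            )
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sits") and one deletion ("the"),
# against a 6-word prompt: WER = 2/6.
score = wer("the cat sat on the mat", "the cat sits on mat")
```

A perfect repetition yields a WER of 0.0, and higher values indicate greater deviation from the prompt, which is what makes the metric usable as an automated EI score.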
The team theorized that, together, these technologies could potentially replace human raters while maintaining assessment quality. To test this approach, they recruited 30 Japanese university students with varying levels of English proficiency, who completed a 30-item EI test. The resulting 900 speech samples were transcribed both by human raters and by the Whisper ASR system. Remarkably, they found that Whisper's transcriptions closely matched those of the human raters, demonstrating exceptional accuracy. This strong alignment persisted even though the testing conditions were not ideal and included some instances of background noise, highlighting the system's robustness in real-world educational environments.
Most importantly, the automated scoring aligned extremely well with traditional human evaluation methods, showing a near-perfect correlation. “Our study demonstrates the feasibility and reliability of fully automated speaking tests for English language learners, which can be conducted at scale for low cost, making the assessment of speaking proficiency much more accessible and reliable for widespread use,” explains Mr. McGuire.
The implications extend beyond merely saving teachers’ time and resources. Automated assessment could enable more frequent evaluation of students’ speaking skills, providing the timely feedback crucial for language development. It could also facilitate larger-scale studies that have previously been impractical due to the demands of manual scoring. The research team is already working towards making these goals a reality, as Mr. McGuire states: “We are currently developing a web-based EI testing platform, which we plan to fully automate using Whisper ASR and WER so that tests can be taken on a smartphone and scoring can be done online in real time. We hope to make this fully automated platform available to other researchers and educators in the near future.”
Looking ahead, the researchers also plan to expand their approach to develop multiple standardized test forms with equivalent difficulty levels, enabling more reliable tracking of language development over time. They also envision creating curriculum-specific tests focused on particular vocabulary and grammar features, and potentially even adaptive tests that could pinpoint oral proficiency more efficiently than current methods.
As artificial intelligence continues to advance, this research represents a significant step toward making reliable language assessment more accessible and scalable.
About Associate Professor Michael McGuire from Doshisha University, Japan
Mr. Michael McGuire is an Associate Professor in the Department of English, Faculty of Letters, at Doshisha University. He is currently pursuing a PhD in Second Language Acquisition at The University of Kitakyushu. His research interests center on foreign language education, data-driven learning, listening perception, corpus tools, spoken fluency, and formulaic language. He has published nine major research papers on these topics. He is currently a member of the American Association for Applied Linguistics, Japan English Language Education Society, Kansai English Language Education Society, Japan Association of College English Teachers, and Japan Association for Language Teaching (JALT), among others. He is the Publicity Chair of the JALT Vocabulary SIG.
Funding information
Funding for this research was provided by Doshisha University, Kyoto, Japan.
Media contact:
Organization for Research Initiatives & Development
Doshisha University
Kyotanabe, Kyoto 610-0394, JAPAN
E-mail: jt-ura@mail.doshisha.ac.jp
Journal
Research Methods in Applied Linguistics
Method of Research
Experimental study
Subject of Research
People
Article Title
Assessing Whisper automatic speech recognition and WER scoring for elicited imitation: Steps toward automation
Article Publication Date
1-Apr-2025
COI Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.