Ishikawa, Japan - Smartphones, which almost everyone carries, and smart speakers, of which 3.7 million are installed in Japanese households, might one day save your life. Beyond their everyday features, these devices can read emergency messages aloud to tell us about the current state of an earthquake and how to evacuate. However, we might miss such crucial information when listening conditions are poor. Speech intelligibility is dramatically degraded by noise, such as conversations and vacuum cleaners, and by reverberation, as in poorly designed auditoriums or subways.
On the other hand, on a typical day, have you ever wondered why you enjoy watching movies in a theater more than in your living room? A bigger screen and a better sound system? Yes, of course, but there is one more factor: well-designed room acoustics.
In the field of architectural acoustics, the speech intelligibility and sound quality of a sound field can be described by measuring the speech transmission index (STI) and room-acoustic parameters such as reverberation time (T60), early decay time (EDT), and clarity index (C80). It is also known that the measured acoustic parameters and the STI vary with changes in the environment, such as the number of people present, new furniture, or new decorations. Hence, techniques for estimating these room-acoustic parameters from a speech signal alone, without special instruments or setups, have been studied. However, assessing various parameters for different types of sound spaces in near real time has remained uninvestigated.
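To make these parameters concrete, the following minimal sketch computes T60, EDT, C80, D50, and Ts from a known room impulse response using textbook definitions (Schroeder backward integration and early-to-late energy ratios). It is not the blind method described in this release. The test impulse response is synthetic, and the function names, the 16 kHz sampling rate, and the 0.5 s decay time are illustrative assumptions. T60 is obtained here from a line fit between -5 dB and -35 dB, and EDT from a fit between 0 dB and -10 dB.

```python
import math
import random


def synthetic_rir(t60, fs, duration, seed=0):
    """Hypothetical test RIR: Gaussian noise decaying by 60 dB per t60 seconds."""
    rng = random.Random(seed)
    n = int(duration * fs)
    # exp(-6.9078 * t / t60) falls by 60 dB in amplitude at t == t60 (6.9078 = 3 * ln 10).
    return [rng.gauss(0.0, 1.0) * math.exp(-6.9078 * (i / fs) / t60) for i in range(n)]


def energy_decay_db(rir):
    """Schroeder backward integration: remaining energy at each sample, in dB re. total."""
    edc = [0.0] * len(rir)
    running = 0.0
    for i in range(len(rir) - 1, -1, -1):
        running += rir[i] * rir[i]
        edc[i] = running
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def decay_time(edc_db, fs, hi_db, lo_db):
    """Least-squares line through the decay curve between hi_db and lo_db,
    extrapolated to a 60 dB drop."""
    pts = [(i / fs, d) for i, d in enumerate(edc_db) if lo_db <= d <= hi_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second, negative
    return -60.0 / slope


def room_parameters(rir, fs):
    """Return T60, EDT, Ts (seconds), C80 (dB), and D50 (ratio) for one RIR."""
    energy = [x * x for x in rir]
    total = sum(energy)
    early80 = sum(energy[:int(0.080 * fs)])  # energy in the first 80 ms
    early50 = sum(energy[:int(0.050 * fs)])  # energy in the first 50 ms
    edc_db = energy_decay_db(rir)
    return {
        "T60": decay_time(edc_db, fs, -5.0, -35.0),
        "EDT": decay_time(edc_db, fs, 0.0, -10.0),
        "C80": 10.0 * math.log10(early80 / (total - early80)),
        "D50": early50 / total,
        "Ts": sum((i / fs) * e for i, e in enumerate(energy)) / total,
    }


fs = 16000  # assumed sampling rate in Hz
params = room_parameters(synthetic_rir(t60=0.5, fs=fs, duration=1.0), fs)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

Because the synthetic response is a single noise realization, the estimated T60 and EDT only approximate the 0.5 s used to generate it.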
In a new study published in Applied Acoustics, a team of scientists from the Japan Advanced Institute of Science and Technology (JAIST) has developed a method for blindly estimating five room-acoustic parameters and the STI simultaneously from a few seconds of speech. Professor Masashi Unoki, the team leader, outlines their approach: “We assumed that speech transmitted in an enclosure is distorted by reverberation and noise, associated with the concept of the modulation transfer function, or MTF. The MTF can explain the characteristics of a transmission channel or room acoustics from the modulation ratio between the input and output signals. Based on this assumption, we focused on extracting this relationship from only the output signal into the previously proposed room-impulse-response (RIR) model, namely, the extended RIR model.”
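The MTF concept can be illustrated with the classical closed-form expression for an idealized room whose impulse response decays exponentially in the presence of stationary noise. The sketch below uses that textbook formula, not the extended RIR model from the study, and the `simplified_sti` function is a deliberately reduced, single-band, unweighted version of the STI calculation (the standardized STI works on seven octave bands with weighting and further corrections). All function names are illustrative assumptions.

```python
import math

# The 14 modulation frequencies (Hz) conventionally used for the STI.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, t60, snr_db=math.inf):
    """Modulation transfer at f_mod (Hz) for exponential reverberation plus noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def simplified_sti(t60, snr_db=math.inf):
    """Single-band, unweighted STI-style index in [0, 1] (a simplification)."""
    indices = []
    for f in MOD_FREQS:
        # Keep m strictly inside (0, 1) so the logarithm below is defined.
        m = min(max(mtf(f, t60, snr_db), 1e-12), 1.0 - 1e-12)
        apparent_snr = 10.0 * math.log10(m / (1.0 - m))
        apparent_snr = min(max(apparent_snr, -15.0), 15.0)  # clip to +/- 15 dB
        indices.append((apparent_snr + 15.0) / 30.0)
    return sum(indices) / len(indices)


for t60 in (0.3, 1.0, 2.0):
    print(f"T60 = {t60:.1f} s: m(4 Hz) = {mtf(4.0, t60):.3f}, "
          f"simplified STI = {simplified_sti(t60):.3f}")
```

Longer reverberation or a lower signal-to-noise ratio reduces the modulation transfer, which in turn lowers the index.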
In simulations, the observed signals were synthesized by convolving speech signals uttered by five male and five female speakers with 43 realistic RIRs measured in different spaces and configurations. The proposed method then estimates the room-acoustic parameters T60, EDT, C80, D50 (definition), and Ts (center time), together with the STI, from a short segment (five seconds) of these reverberant speech signals. The team found that: (1) the envelope of a reverberant speech signal carries underlying information about room-acoustic characteristics, (2) reverberation and noise affect speech signals differently in each octave band, and (3) a more reasonable stochastic RIR model can accurately approximate a realistic RIR. Convolutional neural networks can therefore be applied to map the envelopes extracted from an observed speech signal to an approximation of the unknown RIR. The STI and various room-acoustic parameters can then be estimated from the approximated RIR.
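The synthesis step and the first finding can be sketched with a toy experiment: a modulated signal is convolved with an impulse response, and the modulation depth of its power envelope is compared before and after. Everything here is an assumption made for illustration: amplitude-modulated noise stands in for speech, the RIR is exponentially decaying noise rather than a measured response, and the sampling rate is kept low so that the pure-Python convolution stays fast. The study itself used recorded speech, measured RIRs, and convolutional neural networks, none of which appear in this sketch.

```python
import math
import random

FS = 1000     # sampling rate in Hz (assumed; low to keep the loop fast)
F_MOD = 4.0   # modulation frequency of the test signal in Hz
T60 = 2.0     # reverberation time of the hypothetical room in seconds

rng = random.Random(1)

# Stand-in for speech: white noise whose power envelope is 1 + cos(2*pi*F_MOD*t).
source = [math.sqrt(1.0 + math.cos(2.0 * math.pi * F_MOD * i / FS)) * rng.gauss(0.0, 1.0)
          for i in range(2 * FS)]

# Hypothetical RIR: exponentially decaying noise, truncated after 1.2 s.
rir = [math.exp(-6.9078 * (k / FS) / T60) * rng.gauss(0.0, 1.0)
       for k in range(int(1.2 * FS))]


def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def modulation_index(signal, fs, f_mod):
    """Modulation depth of the power envelope (signal squared) at f_mod."""
    power = [s * s for s in signal]
    re = sum(p * math.cos(2.0 * math.pi * f_mod * i / fs) for i, p in enumerate(power))
    im = sum(p * math.sin(2.0 * math.pi * f_mod * i / fs) for i, p in enumerate(power))
    return 2.0 * math.hypot(re, im) / sum(power)


observed = convolve(source, rir)  # synthesized reverberant signal
m_in = modulation_index(source, FS, F_MOD)
m_out = modulation_index(observed[:len(source)], FS, F_MOD)
print(f"modulation depth before: {m_in:.2f}, after reverberation: {m_out:.2f}")
```

The drop in modulation depth after convolution is the kind of envelope cue that reflects the room, which is why the envelope of the observed signal alone can carry information about the RIR.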
Based on this finding, architects and acousticians might be able to monitor and diagnose an auditorium during a live concert with an audience present. In the future, the smartphones or smart speakers in our kitchens might one day save our lives, and this technology could make our lives safer, easier, and happier.
###
Reference
Blind estimation of speech transmission index and room acoustic parameters based on the extended model of room impulse response
Journal
Applied Acoustics
DOI:
About Japan Advanced Institute of Science and Technology, Japan
Founded in 1990 in Ishikawa prefecture, the Japan Advanced Institute of Science and Technology (JAIST) was the first independent national graduate school in Japan. Now, after 30 years of steady progress, JAIST has become one of Japan’s top-ranking universities. JAIST has multiple satellite campuses and strives to foster capable leaders with a state-of-the-art education system where diversity is key; about 40% of its alumni are international students. The university has a unique style of graduate education based on a carefully designed coursework-oriented curriculum to ensure that its students have a solid foundation on which to carry out cutting-edge research. JAIST also works closely with both local and overseas communities by promoting industry–academia collaborative research.
About Professor Masashi Unoki from Japan Advanced Institute of Science and Technology, Japan
Dr. Masashi Unoki is a Professor at the School of Information Science at the Japan Advanced Institute of Science and Technology (JAIST), where he received his M.S. and Ph.D. degrees in 1996 and 1999, respectively. His main research interests are in auditory-motivated signal processing and the modeling of auditory systems. Dr. Unoki received the Sato Prize for an Outstanding Paper from the Acoustical Society of Japan (ASJ) in 1999, 2010, and 2013, and the Yamashita Taro “Young Researcher” Prize from the Yamashita Taro Research Foundation in 2005. He has published 198 papers and has authored 14 books so far.
Funding information
This work was supported by Shibuya Science Culture and Sports Foundation; JSPS-NSFC Bilateral Joint Research Projects/Seminars [JSJSBP120197416]; SCOPE Program of Ministry of Internal Affairs and Communications [Grant No.: 201605002]; Thammasat University Research Fund, Contract No. TUGR 2/35/2562; and SIIT-JAIST-NSTDA Dual Doctoral Degree Program.
Journal
Applied Acoustics
Article Title
Blind estimation of speech transmission index and room acoustic parameters based on the extended model of room impulse response
Article Publication Date
3-Sep-2021