Illustration of AI training (IMAGE)

University of Groningen

Caption: The picture illustrates the proposed multi-modal approach for video scene recognition. Given a video, visual and audio descriptors are extracted, processed and fused to classify the depicted scene into one of nine different indoor environments.

Credit: Estefanía Talavera Martínez

Usage Restrictions: Credit must be given to the creator.

License: CC BY

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.
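The caption does not say how the visual and audio streams are fused. One common option is late fusion, in which each modality produces its own scores over the scene classes and those scores are then combined. The Python sketch below assumes that scheme. The function names (`softmax`, `late_fusion`), the weighted averaging of softmax probabilities and the default equal weighting are hypothetical illustrations, not the authors' method. Scenes are referred to by index only, because the nine environments are not named here.

```python
import math
from typing import List, Sequence, Tuple

NUM_SCENES = 9  # the caption states nine indoor environments; their labels are not given


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw per-class scores into probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    visual_scores: Sequence[float],
    audio_scores: Sequence[float],
    visual_weight: float = 0.5,  # hypothetical weighting; equal weight by default
) -> Tuple[int, List[float]]:
    """Hypothetical late fusion: weighted average of per-modality class probabilities.

    Returns the index of the predicted scene and the fused probability vector.
    """
    if len(visual_scores) != NUM_SCENES or len(audio_scores) != NUM_SCENES:
        raise ValueError(f"each modality must provide {NUM_SCENES} class scores")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")

    p_visual = softmax(visual_scores)
    p_audio = softmax(audio_scores)
    fused = [
        visual_weight * v + (1.0 - visual_weight) * a
        for v, a in zip(p_visual, p_audio)
    ]
    # The predicted scene is the class with the highest fused probability.
    predicted = max(range(NUM_SCENES), key=fused.__getitem__)
    return predicted, fused
```

For example, `late_fusion(visual, audio, visual_weight=1.0)` ignores the audio stream entirely, while `visual_weight=0.0` relies on audio alone. Intermediate values let one modality dominate when it is known to be more reliable. Early fusion, which concatenates the descriptors before a single classifier, is an equally plausible alternative that this sketch does not cover.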