Illustration of AI training (IMAGE)
Caption
The picture illustrates the proposed multi-modal approach for video scene recognition: given a video, visual and audio descriptors are extracted, processed, and fused to classify the depicted scene into one of nine different indoor environments.
Credit
Estefanía Talavera Martínez
Usage Restrictions
Credit must be given to the creator.
License
CC BY
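
The pipeline described in the caption (extract visual and audio descriptors, fuse them, classify into one of nine scenes) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the caption does not say which descriptors, fusion strategy, or classifier the authors use, so the example stands in simple concatenation for the fusion step and a nearest-centroid rule for the classifier. All function names, vector sizes, and values are hypothetical.

```python
import math

NUM_SCENES = 9  # the caption mentions nine indoor environments


def l2_normalize(vec):
    """Scale a descriptor to unit length so neither modality dominates."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(visual, audio):
    """Assumed fusion step: normalize each descriptor, then concatenate."""
    return l2_normalize(visual) + l2_normalize(audio)


def classify(fused, centroids):
    """Assumed classifier: index of the nearest centroid (squared Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: dist(fused, centroids[k]))


# Toy data: one hypothetical centroid per scene class, as one-hot vectors
# in a 9-dimensional fused space (5 visual + 4 audio dimensions).
centroids = [[1.0 if i == k else 0.0 for i in range(NUM_SCENES)]
             for k in range(NUM_SCENES)]

visual = [0.0, 0.0, 3.0, 0.0, 0.0]  # made-up visual descriptor
audio = [0.0, 0.0, 0.0, 0.0]        # made-up (silent) audio descriptor
predicted = classify(fuse(visual, audio), centroids)
```

In practice the descriptors would come from trained visual and audio feature extractors and the classifier would be learned from labeled videos; the toy vectors here only show how the two modalities meet in a single fused representation before the scene label is chosen.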