Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2019
Volume: 10
Issue: 4
Pages: 371-378
DOI: 10.14569/IJACSA.2019.0100445
Publisher: Science and Information Society (SAI)
Abstract: With the widespread use of smartphones and social media platforms, video logging is gaining popularity, especially since the advent of YouTube in 2005, which draws hundreds of millions of views per day. It has attracted the interest of many people and has numerous emerging applications, e.g. for filmmakers, journalists, product advertisers, entrepreneurs, educators and many others. Nowadays, people express and share their opinions online on various daily issues using different forms of content, including text, audio, images and video. This study presents a multimodal approach for recognizing the speaker’s age group from social media videos. Several structures of Artificial Neural Networks (ANNs) are presented and evaluated using standalone modalities. Moreover, a two-stage ensemble network is proposed to combine multiple modalities. In addition, a corpus of videos has been collected and prepared for multimodal age-group recognition, with a focus on Arabic-language speakers. The experimental results demonstrate that combining different modalities can mitigate the limitations of unimodal recognition systems and lead to significant improvements in the results.
Keywords: Multimodal recognition; opinion mining; age groups; word embedding; acoustic features; visual features; information fusion; ensemble learning; Arabic speakers