Article Information

Title: Speech Emotion Recognition Model Based on Attention CNN Bi-GRU Fusing Visual Information
Authors: Zhangfang Hu; Lan Wang; Yuan Luo et al.
Journal: Engineering Letters
Print ISSN: 1816-093X
Online ISSN: 1816-0948
Year: 2022
Volume: 30
Issue: 2
Pages: 427-434
Language: English
Publisher: Newswood Ltd
Abstract: The problem of low recognition accuracy of emotion recognition models is easily caused by interference such as data redundancy and irrelevant features. In this paper, we propose a speech emotion recognition (SER) method based on an attentional convolutional neural network (CNN) bidirectional gated recurrent unit (Bi-GRU) fusing visual information. First, we pretrained the log-mel spectrograms in a ResNet-based attentional convolutional neural network (RACNN) to extract speech features. Second, the CNN-extracted facial static appearance features are fused with speech features using a deep Bi-GRU to obtain speech appearance features. A series of gated recurrent units with attention mechanisms (AGRUs) are used to extract facial geometric features. Then, the hybrid features are obtained by further combining the integrated speech appearance features with facial geometric features, and kernel linear discriminant analysis (KLDA) is used to discriminate them. Finally, the proposed method in this paper obtained accuracies of 87.92% and 89.65% on the RAVDESS and eNTERFACE'05 emotion databases, respectively. The experimental results demonstrate that the method in this paper effectively improved the accuracy and robustness of SER.
Keywords: SER; visual information; Bi-GRU; AGRUs; KLDA
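
Illustrative sketch (not from the paper): the abstract describes attention over sequential features and a later step that combines speech appearance features with facial geometric features, but it gives no formulas. The toy example below only illustrates the general idea. It substitutes plain dot-product attention pooling for the AGRUs and simple concatenation for the Bi-GRU-based fusion. All function names, the query vector, and the input values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weight each frame vector by softmax(dot(frame, query)) and sum.

    Assumption: dot-product scoring; the paper's attention may differ.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


def fuse(*feature_vectors):
    """Feature-level fusion by concatenation (a stand-in, not Bi-GRU fusion)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


# Toy inputs (hypothetical values, not data from the paper).
speech_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # per-frame speech features
query = [1.0, 1.0]                                    # stand-in attention query
speech_vec, weights = attention_pool(speech_frames, query)

geometric_vec = [0.5, -0.5, 0.25]                     # stand-in facial geometric features
hybrid = fuse(speech_vec, geometric_vec)              # combined "hybrid" feature vector
```

In this toy setup the third frame has the highest score against the query, so it receives the largest attention weight. The hybrid vector is the pooled speech vector followed by the geometric vector. In the described method, a classifier stage such as KLDA would then operate on the hybrid features.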