Abstract: Speech emotion recognition (SER) is extremely challenging due to the problems of vanishing or exploding gradients and weak spatiotemporal correlations. To address these issues, a new approach, the 3D attentional convolutional recurrent neural network based on residual networks (Res3DACRNN), is proposed to learn deep emotional features. The Res3DCNN component extracts deep multiscale spectral-temporal features of emotional speech from spectrograms. The introduction of a residual network compensates for the features lost by traditional CNNs during convolution and prevents gradient vanishing or explosion. An attention-based recurrent neural network (ARNN) then extracts the long-term dependencies of these features, alleviating the weak spatiotemporal correlation problem. To reduce computational complexity, this paper improves the forget gate of the LSTM and proposes a novel post-forgetting gate structure. Finally, a softmax layer is used for emotion classification. Experimental results on the EMO-DB and IEMOCAP emotional corpora show that the proposed model significantly outperforms current mainstream deep learning methods.
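The abstract only names the pipeline stages (residual 3D convolutions over spectrogram cubes, an attention-weighted recurrent layer, and a softmax classifier), so the following is a minimal PyTorch sketch of that flow under assumed layer sizes. The channel counts, kernel sizes, and the use of a standard bidirectional LSTM (rather than the paper's modified post-forgetting-gate LSTM) are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class Residual3DBlock(nn.Module):
    """3D convolutional block with a skip connection (assumed structure)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(out_ch)
        # 1x1x1 projection so the skip path matches the output channel count
        self.proj = nn.Conv3d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = self.proj(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # residual addition is what mitigates vanishing/exploding gradients
        return self.relu(out + identity)


class Res3DACRNN(nn.Module):
    """Sketch: Res3DCNN features -> attention-weighted BiLSTM -> classifier logits."""
    def __init__(self, n_classes=4, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            Residual3DBlock(1, 32),
            nn.MaxPool3d((1, 2, 2)),
            Residual3DBlock(32, 64),
            nn.AdaptiveAvgPool3d((None, 4, 4)),   # keep the time axis, pool freq/context
        )
        self.rnn = nn.LSTM(64 * 4 * 4, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # one attention score per time step
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, 1, time, freq, context) spectrogram cubes
        feats = self.cnn(x)                                    # (B, C, T, 4, 4)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        seq, _ = self.rnn(feats)                               # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(seq), dim=1)         # temporal attention weights
        pooled = (weights * seq).sum(dim=1)                    # attention-weighted utterance summary
        return self.classifier(pooled)                         # logits; softmax applied in the loss


if __name__ == "__main__":
    model = Res3DACRNN(n_classes=4)
    dummy = torch.randn(2, 1, 20, 64, 8)   # toy batch of spectrogram segments
    print(model(dummy).shape)               # torch.Size([2, 4])
```

In this sketch the attention layer replaces plain temporal pooling, so frames that carry stronger emotional cues receive larger weights before classification; the actual weighting scheme and the post-forgetting gate would follow the paper's definitions.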