Abstract: Speech emotion recognition with convolutional neural network (CNN) models is challenging because of feature loss and reduced recognition accuracy. To address this issue, this paper proposes a multi-level residual CNN model. In this model, speech signals are converted into spectrograms, and multi-level residual identity mappings are then introduced to compensate for the features lost in the CNN during convolution, thereby improving the accuracy of speech emotion recognition. Experimental results show that the multi-level residual CNN achieves 74.36% recognition accuracy on the EMO-DB dataset, outperforming a traditional deep CNN.
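The idea of a multi-level residual identity mapping can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the function names (`conv1d_same`, `residual_block`, `multi_level_residual`), the use of a 1-D signal in place of a 2-D spectrogram, the ReLU activation, and the choice of two skip levels (one around each convolution, one around the whole stack). It is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def conv1d_same(x: Vector, kernel: Vector) -> Vector:
    """Zero-padded 1-D cross-correlation; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the signal count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, kernel: Vector) -> Vector:
    """First-level skip (assumed): add the block input to the convolved output."""
    fx = relu(conv1d_same(x, kernel))
    return [a + b for a, b in zip(fx, x)]


def multi_level_residual(x: Vector, kernels: List[Vector]) -> Vector:
    """Second-level skip (assumed): add the group input to the stacked blocks' output."""
    h = x
    for kernel in kernels:
        h = residual_block(h, kernel)
    return [a + b for a, b in zip(h, x)]


if __name__ == "__main__":
    signal = [1.0, -2.0, 3.0]
    # A zero kernel models a convolution branch that has lost every feature.
    print(residual_block(signal, [0.0, 0.0, 0.0]))
    print(multi_level_residual(signal, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

With an all-zero kernel the convolution branch contributes nothing, yet each block still returns its input unchanged through the identity path. This is the sense in which a skip connection can compensate for features lost during convolution; the outer skip adds a second path that carries the original input past the whole stack.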