文章基本信息

标题：Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC
其他标题：Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC
本地全文：下载
作者：Sakshi Gupta ; Ravi S. Shukla ; Rajesh K. Shukla 等
期刊名称：International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN：2158-107X
电子版ISSN：2156-5570
出版年度：2020
卷号：11
期号：9
DOI：10.14569/IJACSA.2020.0110941
出版社：Science and Information Society (SAI)
摘要：Stuttering is a neuro-development disorder during which normal speech flow is not fluent. Traditionally Speech-Language Pathologists used to assess the extent of stuttering by counting the speech disfluencies manually. Such sorts of stuttering assessments are arbitrary, incoherent, lengthy, and error-prone. The present study focused on objective assessment to speech disfluencies such as prolongation and syllable, word, and phrase repetition. The proposed method is based on the Weighted Mel Frequency Cepstral Coefficient feature extraction algorithm and deep-learning Bidirectional Long-Short term Memory neural network for classification of stuttered events. The work has utilized the UCLASS stuttering dataset for analysis. The speech samples of the database are initially pre-processed, manually segmented, and labeled as a type of disfluency. The labeled speech samples are parameterized to Weighted MFCC feature vectors. Then extracted features are inputted to the Bidirectional-LSTM network for training and testing of the model. The effect of different hyper-parameters on classification results is examined. The test results show that the proposed method reaches the best accuracy of 96.67%, as compared to the LSTM model. The promising recognition accuracy of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% was achieved for the detection of fluent, prolongation, syllable, word, and phrase repetition, respectively.
关键词：Speech; stuttering; deep learning; WMFCC; Bi-LSTM