Basic article information

  • Title: Speaker Recognition Based on 3DCNN-LSTM
  • Authors: ZhangFang Hu; XingTong Si; Yuan Luo
  • Journal: Engineering Letters
  • Print ISSN: 1816-093X
  • Online ISSN: 1816-0948
  • Year of publication: 2021
  • Volume: 29
  • Issue: 2
  • Pages: 463-470
  • Language: English
  • Publisher: Newswood Ltd
  • Abstract: Traditional speaker recognition methods reduce the feature signal from high to low dimensions, which often discards some speaker information and results in a low recognition rate. To address this problem, this paper proposes a model that combines a 3D convolutional neural network (3DCNN) with a long short-term memory (LSTM) network. First, the model takes fixed-step speech feature vectors as the 3DCNN input, which converts the text-independent speaker recognition mode into a "semi-text"-related mode; this largely preserves the speaker's speech features and thus increases the difference between the characteristics of different speakers. Second, the 3D convolution kernel designed in this paper extracts speaker-specific characteristics along different dimensions to further distinguish speakers. The output is then fed to the LSTM network as a time series to strengthen the contextual connections in the speaker's voice, and finally the classification output is labeled to complete the speaker recognition system. Experimental results show that, on short utterances from the AISHELL-1 dataset, the model improves the speaker recognition rate over traditional algorithms and popular embedding features, and that the system is more robust over time.
  • Keywords: speaker recognition; semi-text processing; 3DCNN; LSTM
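
The abstract says the model takes fixed-step speech feature vectors as the 3DCNN input, but this record does not say how that input is assembled. The following is only a minimal sketch of one plausible reading, not the authors' method: per-frame feature vectors are cut into windows of a fixed length at a fixed step, and consecutive windows are stacked into 3D blocks. The function names, the window length, the step, and the stacking depth are all hypothetical.

```python
from typing import List

Frame = List[float]          # one feature vector per audio frame (assumed)
Segment = List[Frame]        # segment_len x n_features
Cube = List[Segment]         # depth x segment_len x n_features


def fixed_step_segments(frames: List[Frame], segment_len: int, step: int) -> List[Segment]:
    """Cut a frame sequence into windows of segment_len frames, advancing by step.

    Trailing frames that do not fill a whole window are dropped.
    """
    if segment_len <= 0 or step <= 0:
        raise ValueError("segment_len and step must be positive")
    last_start = len(frames) - segment_len
    return [frames[s:s + segment_len] for s in range(0, last_start + 1, step)]


def stack_into_cubes(segments: List[Segment], depth: int) -> List[Cube]:
    """Group consecutive, non-overlapping runs of depth segments into 3D blocks.

    Leftover segments that do not fill a whole block are dropped.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    last_start = len(segments) - depth
    return [segments[s:s + depth] for s in range(0, last_start + 1, depth)]


# Toy input: 10 frames with 2 features each (values are placeholders, not real speech).
toy_frames = [[float(i), float(i) + 0.5] for i in range(10)]
toy_segments = fixed_step_segments(toy_frames, segment_len=4, step=2)
toy_cubes = stack_into_cubes(toy_segments, depth=2)
```

Under these assumptions every block has the same depth x frames x features shape regardless of utterance length, which is what a 3D convolution layer needs; the per-block outputs could then be passed in order to a recurrent layer such as an LSTM.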