
Article information

  • Title: Speaker-adaptive-trainable Boltzmann machine and its application to non-parallel voice conversion
  • Authors: Toru Nakashika; Yasuhiro Minami
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing
  • Print ISSN: 1687-4714
  • Electronic ISSN: 1687-4722
  • Year: 2017
  • Volume: 2017
  • Issue: 1
  • Pages: 1-10
  • DOI: 10.1186/s13636-017-0112-6
  • Publisher: Hindawi Publishing Corporation
  • Abstract: In this paper, we present a voice conversion (VC) method that does not use any parallel data while training the model. Voice conversion is a technique where only speaker-specific information in the source speech is converted while keeping the phonological information unchanged. Most of the existing VC methods rely on parallel data, i.e., pairs of speech data from the source and target speakers uttering the same sentences. However, the use of parallel data in training causes several problems: (1) the data used for the training is limited to the pre-defined sentences, (2) the trained model can only be applied to the speaker pair used in the training, and (3) a mismatch in alignment may occur. Although it is generally preferable in VC not to use parallel data, a non-parallel approach is considered difficult to learn. In our approach, we realize non-parallel training based on speaker-adaptive training (SAT). Speech signals are represented using a probabilistic model based on the Boltzmann machine that defines phonological information and speaker-related information explicitly. Speaker-independent (SI) and speaker-dependent (SD) parameters are simultaneously trained using SAT. In the conversion stage, a given speech signal is decomposed into phonological and speaker-related information, the speaker-related information is replaced with that of the desired speaker, and then voice-converted speech is obtained by combining the two. Our experimental results showed that our approach outperformed the conventional non-parallel approach regarding objective and subjective criteria.
  • Keywords: Voice conversion; Boltzmann machine; Unsupervised training; Energy-based model; Speaker adaptation; Non-parallel training; SAT