Article Information

Title: SINGLE CHANNEL SPEECH ENHANCEMENT USING IDEAL BINARY MASK TECHNIQUE BASED ON COMPUTATIONAL AUDITORY SCENE ANALYSIS
Authors: ABRAR HUSSAIN ; KALAIVANI CHELLAPPAN ; SITI ZAMRATOL M et al.
Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2016
Volume: 91
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: Single channel speech enhancement is necessary where the multichannel speech enhancement is not feasible due to space constraints in the intended device and cost-effectiveness. However, the problem of having limited information from single channel sound signal mixtures or unavailability of the speech source signals makes it more difficult to separate the target speech from the background maskers in the acoustic environment of low signal to noise ratio, in various background noises and in less temporal duration of speech signals. To address these problems, computational auditory analysis became popular from the last decade as a new concept for speech enhancement. In this paper, ideal binary mask which is inspired by the computational auditory analysis is used to analyze and synthesize the input speech signals and masker signals in the time-frequency domain, where all the signals usually overlap. Synthesized signals are evaluated for speech quality measurement in terms of segmental signal-to-noise ratio. This study uses Malay language based speech as input speech signals. These input speech signals vary in duration due to their word structure. Large crowd babble speech and two talker competing speech are employed as masker signals. The input signal-to-noise ratio is varied from -5 dB to +15 dB in steps of 5 dB to vary the difficulty level of acoustic environment. Results show that ideal binary mask algorithm reconstructs the target speech signals efficiently from the degraded and noisy speech signals. This is signified by the high segmental signal-to-noise ratio even in the lowest input signal-to-noise ratio. This type of high noise reduction is necessary to lessen the burden of elderly listener's listening effort in noisy environment.
Keywords: Speech Enhancement; Ideal Binary Mask; Time-Frequency Masking; Computational Auditory Scene Analysis; Speech Quality
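
Illustrative sketch: the pipeline summarized in the abstract (mix target and masker at a chosen input SNR, build an ideal binary mask in the time-frequency domain, resynthesize, score with segmental SNR) can be sketched roughly as follows. This is not the authors' implementation. It assumes a plain Hann-windowed STFT in place of whatever auditory filterbank the paper uses, a 0 dB local criterion (`lc_db`), 512-sample frames, and segmental SNR clipped to [-10, 35] dB; all function names are hypothetical, and the sine and white-noise signals are synthetic stand-ins for the Malay speech and babble maskers.

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that target + masker has the requested input SNR (dB)."""
    gain = np.sqrt(np.mean(target ** 2) / (np.mean(masker ** 2) * 10 ** (snr_db / 10)))
    scaled = gain * masker
    return target + scaled, scaled

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann: overlapping windows sum to 1 at 50% overlap
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=256):
    """Overlap-add inverse of stft() (exact except in the first and last hop)."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out

def ideal_binary_mask(target, masker, lc_db=0.0):
    """1 where the local target-to-masker ratio exceeds lc_db, else 0.
    'Ideal' because it needs the premixed target and masker signals."""
    eps = 1e-12
    t_pow = np.abs(stft(target)) ** 2
    m_pow = np.abs(stft(masker)) ** 2
    local_snr_db = 10 * np.log10((t_pow + eps) / (m_pow + eps))
    return (local_snr_db > lc_db).astype(float)

def segmental_snr(clean, processed, frame=256, lo=-10.0, hi=35.0):
    """Mean of frame-wise SNRs in dB, each clipped to [lo, hi] (assumed limits)."""
    eps = 1e-12
    n = min(len(clean), len(processed)) // frame
    c = clean[:n * frame].reshape(n, frame)
    e = c - processed[:n * frame].reshape(n, frame)
    snr = 10 * np.log10((np.sum(c ** 2, axis=1) + eps) / (np.sum(e ** 2, axis=1) + eps))
    return float(np.mean(np.clip(snr, lo, hi)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    target = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # synthetic stand-in for speech
    masker = rng.standard_normal(fs)                        # synthetic stand-in for babble
    for snr_db in range(-5, 20, 5):                         # -5 dB to +15 dB in 5 dB steps
        mixture, scaled = mix_at_snr(target, masker, snr_db)
        mask = ideal_binary_mask(target, scaled)
        enhanced = istft(mask * stft(mixture))
        print(snr_db, segmental_snr(target, mixture), segmental_snr(target, enhanced))
```

The mask is applied to the mixture's complex STFT, so the mixture phase is reused for resynthesis; this is a simplification chosen for the sketch rather than a detail taken from the paper.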