Abstract: To address the problem that the search for speaker change points (SCPs) is blind and exhaustive, this paper proposes using mean shift to seek SCPs by estimating the kernel density of the speech stream. The method has three steps: first, peak points are sought using mean shift; second, the maximum likelihood ratio (MLR) value is computed at each peak point; third, SCPs are selected from the MLR values using the maximum method. The relationship between MLR and the Bayesian information criterion (BIC) is then given. Compared with metric-based or model-based methods, the search for SCPs is no longer blind, because mean shift always points in the direction of maximum increase in density. Experiments show that the proposed algorithm achieves results comparable to BIC and DISTBIC while reducing detection time: for a 3-second speech segment, the proposed algorithm takes about 60% of the time of DISTBIC and 45% of the time of BIC. Further investigation and improvement of this method are discussed at the end of the paper.
Keywords: speaker change detection; mean shift; kernel density estimation; peak point; maximum likelihood ratio
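
To illustrate the peak-seeking step named in the abstract, the following is a minimal sketch of generic one-dimensional mean shift with a Gaussian kernel. It is not the paper's implementation: the function names (`mean_shift_step`, `seek_peak`, `find_peaks`), the scalar samples, the bandwidth, and the tolerances are all assumptions chosen for illustration, and the MLR scoring and maximum-method steps are not covered.

```python
import math


def mean_shift_step(x, samples, bandwidth):
    """Return the Gaussian-kernel weighted mean of the samples around x.

    Moving x to this weighted mean is one mean shift update; the
    displacement points toward increasing kernel density.
    """
    weights = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in samples]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, samples)) / total


def seek_peak(start, samples, bandwidth, tol=1e-6, max_iter=500):
    """Iterate mean shift from `start` until the update is smaller than tol."""
    x = start
    for _ in range(max_iter):
        new_x = mean_shift_step(x, samples, bandwidth)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def find_peaks(samples, bandwidth, merge_tol=1e-2):
    """Run mean shift from every sample and merge nearly identical peaks."""
    peaks = []
    for s in samples:
        p = seek_peak(s, samples, bandwidth)
        if all(abs(p - q) > merge_tol for q in peaks):
            peaks.append(p)
    return sorted(peaks)


if __name__ == "__main__":
    # Hypothetical toy data: two clusters of scalar values (assumption).
    data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
    print(find_peaks(data, bandwidth=0.5))
```

In this sketch every sample climbs to the nearest density peak, so the candidate set shrinks to a few peak points rather than every position in the stream. In the method described above, those peak points would then be scored with MLR.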