Abstract: Pitch, or fundamental frequency, is an important feature of bird song from which scientists can learn much about a population. To use pitch as a feature, researchers need confidence in their pitch extraction system. Pitch detection algorithms (PDAs) proven to work on human speech may not be suitable for all types of bird vocalizations. This paper discusses pitch estimation performance on a variety of common bird vocalizations. The simultaneous presence of multiple partials or tones, extended frequency sweeps through multiple octaves, and rapid pitch modulations are just some of the difficulties encountered when estimating the pitch of bird song. YIN is a PDA that estimates the pitch of human speech very well. Carefully tuned parameters improve pitch tracking with YIN, but the optimal parameters can change quickly, even within one song. This paper presents YIN-bird, a modified version of YIN that exploits spectrogram properties to automatically set YIN's minimum fundamental frequency parameter. On a ground-truth data-set of synthetic bird song with known pitch, gross pitch errors on whistles and trills were reduced by up to 4%. Expert listeners evaluated this data-set and described it as “sounding like original & can hardly tell it is synthetic”. A qualitative analysis is also presented, showing that YIN-bird is not suitable for more complex bird vocalizations such as nasals.
Keywords: pitch tracking; bird vocalizations; bird song