English Abstract |
Speech and music discrimination is one of the most important issues for multimedia information retrieval and efficient coding. Although many features have been proposed, few of them are robust under noisy conditions, especially in telecommunication applications. In this paper, two novel features based on the real cepstrum are presented to represent essential differences between music and speech: Average Pitch Density (APD) and Relative Tonal Power Density (RTPD). Separate histograms are used to demonstrate the robustness of the novel features. Results of discrimination experiments show that these features are more robust than commonly used features. The evaluation database consists of a reference collection and a set of telephone speech and music recorded in the real world. |