The performance of categorical music emotion classification, which divides emotion into classes and relies on audio features alone, has reached a limit because of a semantic gap between the object feature level and the human cognitive level of emotion perception. Motivated by the fact that lyrics carry rich semantic …
In this paper, we propose a hybrid method for singing pitch extraction from polyphonic audio music. We have observed several kinds of pitch errors made by a previously proposed algorithm based on trend estimation. We also noticed that other pitch tracking methods tend to make other types of pitch errors. It is therefore intuitive to combine the results of …
Genre and emotion have both been applied to content-based music retrieval and organization; however, the intrinsic correlation between them has not been explored. In this paper, we present a statistical association analysis to examine this correlation and propose a two-layer scheme that exploits it for emotion classification. Significant …
As one of the most important mid-level features of music, the chord contains rich information about harmonic structure that is useful for music information retrieval. In this paper, we present a chord recognition system based on the N-gram model. The system is time-efficient, and its accuracy is comparable to that of existing systems. We further propose a new method to …
In this paper, we discuss a structural maximum a posteriori speaker adaptation method that adjusts an existing speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) to a new speaker's data in order to realize a new voice at any given SR. The adaptive SR-HPM is formulated based on MAP estimation, with a reference SR-HPM serving as an informative prior. …
In this paper, we discuss a structural maximum a posteriori (SMAP) speaker adaptation approach that adjusts the speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) of an existing SR-controlled Mandarin text-to-speech system to a new speaker's data in order to produce a new voice. Two main issues are addressed. One is the small SR coverage of the …