Determining the emotion of a song that best characterizes the affective content of the song is a challenging issue due to the difficulty of collecting reliable ground truth data and the semantic gap between human's perception and the music signal of the song. To address this issue, we represent an emotion as a point in the Cartesian space with valence and arousal as the dimensions and determine the coordinates of a song by the relative emotion of the song with respect to other songs. We also develop an RBF-ListNet algorithm to optimize the ranking-based objective function of our approach. The cognitive load of annotation, the accuracy of emotion recognition, and the subjective quality of the proposed approach are extensively evaluated. Experimental results show that this ranking-based approach simplifies emotion annotation and enhances the reliability of the ground truth. The performance of our algorithm for valence recognition reaches 0.326 in Gamma statistic.