Hung-Shin Lee

Learn More
Chinese spelling check (CSC) is still an open problem today. To the best of our knowledge, language modeling is widely used in CSC because of its simplicity and fair predictive power, but most systems only use the conventional n-gram models. Our work in this paper continues this general line of research by further exploring different ways to glean extra(More)
In this paper, we study the use of two kinds of kernel-based discriminative models, namely support vector machine (SVM) and deep neural network (DNN), for speaker verification. We treat the verification task as a binary classification problem, in which a pair of two utterances, each represented by an i-vector, is assumed to belong to either the(More)
Due to the cold-start problem, measuring the similarity between two pieces of audio music based on their low-level acoustic features is critical to many Music Information Retrieval (MIR) systems. In this paper, we apply the bag-offrames (BOF) approach to represent low-level acoustic features of a song and exploit music tags to help improve the performance(More)
Linear discriminant analysis (LDA) can be viewed as a twostage procedure geometrically. The first stage conducts an orthogonal and whitening transformation of the variables. The second stage involves a principal component analysis (PCA) on the transformed class means, which is intended to maximize the class separability along the principal axes. In this(More)
Since more and more multimedia data associated with spoken documents have been made available to the public, spoken document retrieval (SDR) has become an important research subject in the past two decades. The i-vector based framework has been proposed and introduced to language identification (LID) and speaker recognition (SR) tasks recently. The major(More)
Music tags provide different types of semantic information about music. Recently, automatic music tagging has generated a great deal of interest among researchers in the field of music information retrieval. In this paper, we propose a posterior weighted Bernoulli mixture model (PWBMM) that automatically annotates a song with tags, or retrieves relevant(More)
Linear discriminant analysis (LDA) is designed to seek a linear transformation that projects a data set into a lower-dimensional feature space while retaining geometrical class separability. However, LDA cannot always guarantee better classification accuracy. One of the possible reasons lies in that its formulation is not directly associated with the(More)
In this paper, we first reformulate the derivation of the conventional i-vector scheme, which is the state-of-the-art utterance representation for speaker verification, as a modeling of universal background model (UBM)-based mixtures of factor analyzers (UMFA), and then propose a clustering-based UMFA method called CMFA. In UMFA, each analyzer is(More)