Linear discriminant analysis (LDA) can be viewed geometrically as a two-stage procedure. The first stage applies an orthogonal whitening transformation to the variables. The second stage performs a principal component analysis (PCA) on the transformed class means, which maximizes class separability along the principal axes. In this…
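The two-stage view above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function name `lda_two_stage` is hypothetical, and it assumes the within-class scatter matrix is nonsingular.

```python
import numpy as np

def lda_two_stage(X, y, n_components):
    """Geometric two-stage LDA: whiten with the within-class scatter,
    then run PCA on the whitened class means (illustrative sketch)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)

    # Stage 1: whitening transform from the within-class scatter Sw.
    Sw = sum(np.cov(X[y == c].T, bias=True) * (y == c).sum()
             for c in classes) / len(X)
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs / np.sqrt(evals)          # W.T @ Sw @ W = I (Sw assumed nonsingular)

    # Stage 2: PCA on the whitened class means maximizes between-class spread.
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    M = (means - overall_mean) @ W
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return W @ Vt[:n_components].T      # columns are discriminant directions
```

Projecting the data with the returned matrix (`X @ T`) yields the usual LDA subspace when the whitened between-class directions coincide with the leading principal axes.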
Due to the cold-start problem, measuring the similarity between two pieces of music based on their low-level acoustic features is critical to many Music Information Retrieval (MIR) systems. In this paper, we apply the bag-of-frames (BOF) approach to represent the low-level acoustic features of a song and exploit music tags to help improve the performance…
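A minimal bag-of-frames pipeline, under the common codebook-histogram reading of BOF: pool frame-level features, learn a codebook, and represent each song as a histogram of codeword assignments. The function names and the tiny k-means below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def build_codebook(all_frames, k, iters=20, seed=0):
    """Tiny k-means over pooled training frames to learn an acoustic codebook."""
    rng = np.random.default_rng(seed)
    centers = all_frames[rng.choice(len(all_frames), k, replace=False)]
    for _ in range(iters):
        d = ((all_frames[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = all_frames[labels == j].mean(axis=0)
    return centers

def bag_of_frames(frames, codebook):
    """Represent one song as a normalized codeword histogram."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    counts = np.bincount(d.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()
```

Two songs can then be compared by any histogram distance (e.g. cosine similarity of their BOF vectors), which is where tag information could be brought in to reweight the comparison.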
Chinese spelling check (CSC) remains an open problem today. To the best of our knowledge, language modeling is widely used in CSC because of its simplicity and fair predictive power, but most systems use only conventional n-gram models. Our work in this paper continues this general line of research by further exploring different ways to glean extra…
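For concreteness, here is a toy sketch of the conventional n-gram baseline the snippet refers to: an add-one-smoothed character bigram model that rescores candidate corrections drawn from a confusion set. The confusion set and helper names are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Add-one-smoothed character bigram LM from a toy corpus of strings."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent)
        uni.update(chars)
        bi.update(zip(chars, chars[1:]))
    V = len(uni)
    def logprob(sent):
        chars = ["<s>"] + list(sent)
        return sum(math.log((bi[(a, b)] + 1) / (uni[a] + V))
                   for a, b in zip(chars, chars[1:]))
    return logprob

def correct(sent, pos, confusion, logprob):
    """Replace the character at `pos` with whichever confusion-set
    candidate (or the original character) the LM scores highest."""
    candidates = [sent] + [sent[:pos] + c + sent[pos + 1:] for c in confusion]
    return max(candidates, key=logprob)
```

Real CSC systems iterate this over every position and use character confusion sets built from phonological and visual similarity; the sketch only shows the scoring mechanics.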
As more and more multimedia data associated with spoken documents have been made publicly available, spoken document retrieval (SDR) has become an important research subject over the past two decades. The i-vector based framework has recently been proposed and introduced to language identification (LID) and speaker recognition (SR) tasks. The major…
Linear discriminant analysis (LDA) is designed to seek a linear transformation that projects a data set into a lower-dimensional feature space while retaining geometrical class separability. However, LDA cannot always guarantee better classification accuracy. One possible reason is that its formulation is not directly associated with the…
Topic modeling has been widely applied to a variety of text modeling tasks, as well as to speech recognition systems, for effectively capturing the semantic and statistical information in documents and speech utterances. Most topic models rely on the bag-of-words assumption, which results in learned latent topics composed of lists of individual words.…
In this paper, we study the use of two kinds of kernel-based discriminative models, namely the support vector machine (SVM) and the deep neural network (DNN), for speaker verification. We treat the verification task as a binary classification problem, in which a pair of utterances, each represented by an i-vector, is assumed to belong to either the…
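The pairwise binary-classification setup can be illustrated with a small sketch. Here plain logistic regression stands in for the SVM/DNN back-ends, and the symmetric pair features (elementwise product plus absolute difference) are an assumption for illustration, not the paper's choice.

```python
import numpy as np

def pair_features(u, v):
    """Symmetric features for a pair of i-vectors (hypothetical choice):
    elementwise product plus elementwise absolute difference."""
    return np.concatenate([u * v, np.abs(u - v)])

def train_pair_classifier(pairs, labels, lr=0.1, epochs=200):
    """Logistic regression on pair features as a minimal stand-in for the
    discriminative back-end. labels: 1 = same speaker, 0 = different."""
    X = np.stack([pair_features(u, v) for u, v in pairs])
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted same-speaker probability
        g = p - y                            # gradient of the log loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return lambda u, v: 1 / (1 + np.exp(-(pair_features(u, v) @ w + b)))
```

At test time the returned scorer maps a trial pair of i-vectors to a same-speaker probability, which is thresholded to accept or reject the claimed identity.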
In this paper, we first reformulate the derivation of the conventional i-vector scheme, the state-of-the-art utterance representation for speaker verification, as a modeling of universal background model (UBM)-based mixtures of factor analyzers (UMFA), and then propose a clustering-based UMFA method called CMFA. In UMFA, each analyzer is…
Linear discriminant analysis (LDA) is designed to seek a linear transformation that projects a data set into a lower-dimensional feature space for maximum geometrical class separability. However, LDA cannot always guarantee better classification accuracy, since its formulation does not take into account the properties of the classifier, such as the automatic speech…