The task of keyword spotting is to detect a set of keywords in continuous input speech. The main goal of this work is to develop an improved Mandarin keyword spotting (KWS) system for conversational telephone speech (CTS). In this paper, we propose an efficient online garbage model based KWS system, which is integrated with a word-level minimum ...
In this paper, we present three methods for improving the search speed and accuracy of a query by humming (QBH) system with a large melody database. 1) At the feature level, to minimize the inevitable errors caused by a single pitch extractor, three different pitch extraction algorithms are fused together to obtain a more credible and robust pitch sequence. 2) ...
Since it is the most natural way for people to search for a specific melody in a large music database, query by humming/singing is attracting more and more researchers' attention in the field of content-based music information retrieval. In this task, note-based and frame-based similarity measures are two commonly used methods. However, in previous works, ...
Lightly supervised acoustic model training has been recognized as an effective way to improve acoustic model training for broadcast news recognition. In this paper, a new approach is introduced both to fully utilize the untranscribed data by using closed captions as transcripts and to select more informative data for acoustic model training. We will show ...
In this paper, we propose a new approach that uses temporal information and a neural network (NN) to improve the performance of automatic mispronunciation detection (AMD). First, the alignment results between speech signals and the corresponding phoneme sequences are obtained within the classic GMM-HMM framework. Then, the long-time TempoRAl Patterns (TRAPs) ...
Automatic evaluation of GOR (Goodness Of pRosody) is a more advanced and challenging task in CALL (Computer Aided Language Learning) systems. Apart from traditional prosodic features, we develop a method based on multiple knowledge sources without any prior condition on the reading text. After speech recognition, apart from most state-of-the-art features in ...
Mispronunciation detection is an important component of computer assisted language learning (CALL) systems. In this work, we introduce an efficient GLDS-SVM based detection method, an approach that has been used successfully in language and speaker identification systems, and combine it with traditional methods. The main ideas include: extended MFCC features with normalized ...
Keyword spotting (KWS) is an essential technique for speech information retrieval. Offline keyword queries over large volumes of spontaneous speech data require fast and accurate KWS methods. In this paper, a novel phone-state matrix based vocabulary-independent KWS method is proposed, which has the merits of both hidden Markov model (HMM) based and ...
In automatic speech grading systems, little research has addressed the issue of GOR (Goodness Of pRosody). In this paper, we propose a novel method that takes advantage of the QBH (Query By Humming) techniques we used in the 2008 MIREX evaluation task. A set of standard samples from top-performing students is initially selected as templates, a ...
In the task of mispronunciation detection, cross-speaker degradation and other confounding nuisance factors are challenging problems that demand a prompt solution. In this paper, we attempt to remove the non-pronunciation variations in the GLDS-SVM expansion space by using a nuisance attribute projection strategy, in order to increase the separating ...