Ming-Cheung Cheung

In speaker verification, a claimant may produce two or more utterances. Typically, the scores of the speech patterns extracted from these utterances are averaged and the resulting mean score is compared with a decision threshold. Rather than simply computing the mean score, we propose to compute the optimal weights for fusing the scores based on the score …
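The contrast between the baseline (equal-weight averaging) and weighted score fusion can be sketched as follows. The weighting scheme below is purely illustrative, not the paper's actual derivation (which is based on the score distribution and prior score statistics); here, scores closer to an assumed prior client mean than to an assumed impostor mean simply receive larger weight.

```python
import numpy as np

def mean_fusion(scores):
    """Baseline: equal-weight average of the per-utterance scores."""
    return float(np.mean(scores))

def weighted_fusion(scores, client_mean, impostor_mean):
    """Illustrative weighted fusion (hypothetical scheme): weight each
    score by its proximity to the prior client mean relative to the
    prior impostor mean, via a softmax over negative distances."""
    s = np.asarray(scores, dtype=float)
    d = np.abs(s - client_mean) - np.abs(s - impostor_mean)
    w = np.exp(-d)
    w /= w.sum()                      # weights sum to 1
    return float(np.dot(w, s))

scores = [0.8, 0.3, 0.9]              # scores from three utterances
baseline = mean_fusion(scores)
fused = weighted_fusion(scores, client_mean=1.0, impostor_mean=-1.0)
```

In this toy setting the weighted score leans toward the utterances that look more client-like, whereas the plain mean treats the outlying score 0.3 as equally informative.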
To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations and enhance inter-speaker distinction. This paper addresses this problem by a blind feature-based transformation …
This paper proposes a two-level fusion strategy for audiovisual biometric authentication. Specifically, fusion is performed at two levels: intramodal and intermodal. In intramodal fusion, the scores of multiple samples (e.g. utterances or video shots) obtained from the same modality are linearly combined, where the combination weights depend on the …
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the distribution of the scores is ignored. To address this limitation, we have recently proposed a fusion model that incorporates …
This paper proposes a single-source multi-sample fusion approach to text-independent speaker verification. In conventional speaker verification systems, the scores obtained from a claimant's utterances are averaged and the resulting mean score is used for decision making. Instead of using an equal weight for all scores, this paper proposes assigning a …
In speaker verification, a claimant may produce two or more utterances. In our previous study [1], we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge about the score statistics estimated from the mean scores of the corresponding client speaker and some …
The paper proposes a multiple-source, multiple-sample fusion approach to identity verification. Fusion is performed at two levels, intramodal and intermodal. In intramodal fusion, the scores of multiple samples (e.g., utterances or video shots) obtained from the same modality are linearly combined, where the combination weights are dependent on the …
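The two-level scheme can be sketched as two successive linear combinations: samples within one modality are fused first, and the per-modality fused scores are then combined across modalities. The equal intramodal weights and the fixed intermodal weights below are assumptions for illustration; in the paper both depend on the score distribution.

```python
import numpy as np

def intramodal_fuse(sample_scores, weights=None):
    """Level 1: linearly combine the scores of multiple samples from
    one modality (equal weights here as a placeholder)."""
    s = np.asarray(sample_scores, dtype=float)
    w = np.full(len(s), 1.0 / len(s)) if weights is None else np.asarray(weights, dtype=float)
    return float(np.dot(w, s))

def intermodal_fuse(modality_scores, modality_weights):
    """Level 2: linearly combine the fused score of each modality."""
    return float(np.dot(np.asarray(modality_weights, dtype=float),
                        np.asarray(modality_scores, dtype=float)))

# Three utterance scores (speech) and two video-shot scores (face):
speech = intramodal_fuse([0.7, 0.9, 0.6])
face = intramodal_fuse([0.4, 0.5])
fused = intermodal_fuse([speech, face], [0.6, 0.4])   # assumed weights
accept = fused > 0.5                                  # decision threshold
```

The final fused score is what gets compared against the decision threshold, exactly as a single-modality mean score would be.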
This paper proposes a constrained stochastic feature transformation algorithm for robust speaker verification. The algorithm computes the feature transformation parameters based on the statistical difference between a test utterance and a composite GMM formed by combining the speaker and background models. The transformation is then used to transform the …
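A minimal sketch of applying such a feature transformation, assuming it takes the common affine form x' = Ax + b applied frame by frame. The parameters A and b here are placeholders; in the paper they would be estimated from the statistical difference between the test utterance and the composite GMM, not fixed by hand.

```python
import numpy as np

def transform_features(X, A, b):
    """Apply the affine transformation x' = A x + b to every frame
    (row) of the feature matrix X. A and b are assumed given; their
    estimation is the subject of the paper and is not shown here."""
    return X @ A.T + b

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))     # 100 frames of 3-dim features
A = 0.9 * np.eye(3)                   # illustrative parameters only
b = np.array([0.1, -0.2, 0.0])
Xt = transform_features(X, A, b)      # compensated features
```

The compensated features Xt, rather than the raw features X, would then be scored against the speaker and background models.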
Acoustic mismatch between the training and recognition conditions presents one of the serious challenges faced by speaker recognition researchers today. The goal of channel compensation is to achieve performance approaching that of a "matched condition" system while avoiding the need for a large amount of training data. It is important to ensure that the …
Fusion techniques have been widely used in multi-modal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or scores from a single modality. The idea is to consider the multiple samples extracted from a single modality as …