The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors and fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, both a sinusoidal model and a source-filter model of speech are compared by listening(More)
This work proposes a method for predicting the fundamental frequency and voicing of a frame of speech from its mel-frequency cepstral coefficient (MFCC) vector representation. This information is subsequently used to enable a speech signal to be reconstructed solely from a stream of MFCC vectors and has particular application in distributed speech(More)
The paper considers the problem of audiovisual speech recognition in a simultaneous (target/masker) speaker environment. The paper follows a conventional multistream approach and examines the specific problem of estimating reliable time-varying audio and visual stream weights. The task is challenging because, in the two-speaker condition, signal-to-noise(More)
This work presents a method of reconstructing a speech signal from a stream of MFCC vectors using a source-filter model of speech production. The MFCC vectors are used to provide an estimate of the vocal tract filter. This is achieved by inverting the MFCC vector back to a smoothed estimate of the magnitude spectrum. The Wiener-Khintchine theorem and linear(More)
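The MFCC inversion step described above can be illustrated with a minimal sketch. It assumes the MFCCs were computed as an orthonormal DCT-II of natural-log mel-filterbank energies and then truncated; the actual front-end convention used in the paper is not stated in this excerpt, and the function names here are hypothetical. The sketch stops at smoothed mel-filterbank energies and does not show the interpolation to a full magnitude spectrum.

```python
import math


def _scale(k, num_channels):
    # Orthonormal DCT-II scaling (an assumed convention, not taken from the paper).
    return math.sqrt(1.0 / num_channels) if k == 0 else math.sqrt(2.0 / num_channels)


def dct_ii(values):
    """Forward orthonormal DCT-II, used here only to build test MFCCs."""
    n = len(values)
    return [
        _scale(k, n)
        * sum(values[m] * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n))
        for k in range(n)
    ]


def inverse_dct_ii(coeffs, num_channels):
    """Inverse of dct_ii; missing high-order coefficients are treated as zero."""
    padded = list(coeffs) + [0.0] * (num_channels - len(coeffs))
    return [
        sum(
            _scale(k, num_channels)
            * padded[k]
            * math.cos(math.pi * k * (m + 0.5) / num_channels)
            for k in range(num_channels)
        )
        for m in range(num_channels)
    ]


def mfcc_to_mel_energies(mfcc, num_channels):
    """Map a (possibly truncated) MFCC vector back to smoothed mel energies."""
    log_energies = inverse_dct_ii(mfcc, num_channels)
    return [math.exp(v) for v in log_energies]


# Example: six mel channels, full and truncated inversion.
energies = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0]
mfcc = dct_ii([math.log(e) for e in energies])
full = mfcc_to_mel_energies(mfcc, 6)        # recovers the original energies
smooth = mfcc_to_mel_energies(mfcc[:1], 6)  # only c0 kept: flat spectrum
```

Truncating the MFCC vector discards the fast-varying cepstral detail, which is why the recovered spectrum is a smoothed estimate. With only the zeroth coefficient retained, every channel collapses to the geometric mean of the original energies.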
In this paper, we propose and study how to provide multipath cross-layer service discovery (MCSD) for mobile ad hoc networks (MANETs). Cross-layer service discovery integrates service discovery into route discovery by taking advantage of network-layer topology information and routing message exchange. Multipath service discovery differs from multipath(More)
IP Multimedia Subsystem (IMS) and Web services (WS) are service-oriented architectures developed separately for service delivery in next-generation telecommunications and IT-centric computing environments, respectively. In order to harness services in both of these platforms and to facilitate combining and blending of services, we propose an integrated(More)
This paper presents a robust speech recognition technique called audiovisual speech fragment decoding (AV-SFD), in which the visual signal is exploited both as a cue for source separation and as a carrier of phonetic information. The model builds on the existing audio-only SFD technique which, based on the auditory scene analysis account of perceptual(More)
The paper proposes a technique for reconstructing an acoustic speech signal solely from a stream of Mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element, but this work proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first(More)
This paper extends the technique of speech reconstruction from MFCCs by considering the effect of noisy speech. To reconstruct a clean speech signal from noise-contaminated MFCCs, an estimate of the clean mel-filterbank vector is required, together with a robust estimate of the pitch. This work applies spectral subtraction to the mel-filterbank vector(More)
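As a rough illustration of spectral subtraction applied to a mel-filterbank vector, the sketch below uses the common textbook form with an over-subtraction factor and a spectral floor. The parameters `alpha` and `floor`, the function name, and the flooring rule are assumptions for illustration; the excerpt does not say which variant the paper uses or how the noise estimate is obtained.

```python
def spectral_subtract(noisy, noise, alpha=1.0, floor=0.1):
    """Estimate clean mel-filterbank energies, channel by channel.

    noisy: mel-filterbank energies of the noisy frame
    noise: estimated noise energies for the same channels
    alpha: over-subtraction factor (hypothetical default)
    floor: fraction of the noisy energy kept as a lower bound (hypothetical default)
    """
    if len(noisy) != len(noise):
        raise ValueError("noisy and noise vectors must have the same length")
    # Subtract the scaled noise estimate, but never drop below the floor,
    # which keeps every channel positive so a logarithm can still be taken.
    return [max(y - alpha * n, floor * y) for y, n in zip(noisy, noise)]


clean_estimate = spectral_subtract([10.0, 4.0, 1.0], [2.0, 3.0, 5.0])
```

The floor matters because the noise estimate can exceed the observed energy in a channel, and a negative or zero value would be unusable once the vector is converted back to the log domain.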