Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models

@article{Khademian2018MonauralMS,
  title={Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models},
  author={Mahdi Khademian and Mohammad Mehdi Homayounpour},
  journal={Speech Communication},
  year={2018},
  volume={98},
  pages={1--16}
}
A PASCAL challenge entitled monaural multi-talker speech recognition was developed, targeting the problem of robust automatic speech recognition against speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge a team from IBM Research achieved a performance better…
Improvement in monaural speech separation using sparse non-negative Tucker decomposition
A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced, and the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD.
Taylor-DBN: A new framework for speech recognition systems
The Taylor-Deep Belief Network (Taylor-DBN) classifier is proposed, a modification of the gradient-descent algorithm with a Taylor series in the existing DBN classifier, to improve speech recognition performance.
Improved Source Counting and Separation for Monaural Mixture
A novel model for single-channel multi-speaker separation is proposed that jointly learns the time-frequency features and the unknown number of speakers, achieving state-of-the-art separation results on multi-speaker mixtures in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
ASR for mixed speech using SNMF based separation algorithm
Results show that recognition of speech signals after separation using the SNMF-based algorithm performs better than recognition of the mixed speech when the target-to-mixed signal ratio is lower than 20 dB.
A Novel English Speech Recognition Approach Based on Hidden Markov Model
  • Chao Xue
  • Computer Science
  • 2018 International Conference on Virtual Reality and Intelligent Systems (ICVRIS)
  • 2018
An HMM-based semi-nonparametric method is proposed to improve the accuracy of English speech recognition, achieving a higher recognition rate than related works.
Computer Intelligent Recognition Image Speech Signal Application
This paper introduces existing speech-synthesis implementation methods and speech recognition technology, analyzes the existing methods, and proposes and implements an improved server-based intelligent speech control system.
Feature joint-state posterior estimation in factorial speech processing models using deep neural networks
An objective function is defined for solving an underdetermined system of equations, which the network uses to extract joint-state posteriors, and the expressions required for fine-tuning the network in a unified way are developed.
Reliability Modeling of Speech Recognition Tasks
A reliability model is proposed to measure the performance of speech recognition; two types of task failure are suggested and an iterative approach is adopted.
An Assessment of the Visual Features Extractions for the Audio-Visual Speech Recognition
  • M. I. Mohmand
  • Computer Science
  • International Journal of Advanced Trends in Computer Science and Engineering
  • 2019
Examining phoneme recognition performance using visual features extracted from the mouth region of interest with the discrete cosine transform and the discrete wavelet transform helps in choosing appropriate features for various applications, as well as in identifying the limitations of these techniques for recognizing individual phonemes.
Design of Single Channel Speech Separation System Based on Deep Clustering Model
A deep clustering model based on a bidirectional long short-term memory network (BLSTM) is proposed, which adds phase information to speech signal processing and uses deep clustering to separate the two speakers.

References

Showing 1–10 of 31 references
Monaural speech separation and recognition challenge
The purpose of the monaural speech separation and recognition challenge was to permit a large-scale comparison of techniques for the competing-talker problem; several systems achieved near-human performance in some conditions, and one outperformed listeners overall.
Super-human multi-talker speech recognition: A graphical modeling approach
A system that can separate and recognize the simultaneous speech of two people recorded in a single channel is presented, and it is shown how belief propagation reduces the complexity of temporal inference from exponential to linear in the number of sources and the size of the language model.
Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition
Techniques based on deep neural networks for attacking the single-channel multi-talker speech recognition problem are investigated, demonstrating that the proposed DNN-based system is remarkably robust to interference from a competing speaker.
Speech recognition using factorial hidden Markov models for separation in the feature space
An algorithm for the recognition and separation of speech signals in non-stationary noise, such as another speaker, is proposed, combining hidden Markov models trained for the speech and the noise into a factorial HMM that models the mixture signal.
Robust continuous speech recognition using parallel model combination
After training on clean speech, the recognizer's performance was found to be severely degraded when noise was added to the speech signal at between 10 and 18 dB, but using PMC the performance was restored to a level comparable with that obtained when training directly on noise-corrupted data.
Hierarchical variational loopy belief propagation for multi-talker speech recognition
Results on the monaural speech separation task (SSC) data demonstrate that the presented hierarchical variational max-sum product algorithm (HVMSP) outperforms VMSP by over 2% absolute while using 4 times fewer probabilistic masks.
A vector Taylor series approach for environment-independent speech recognition
  • P. Moreno, B. Raj, R. Stern
  • Computer Science
  • 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
  • 1996
The use of a vector Taylor series (VTS) expansion is introduced to characterize efficiently and accurately the effects of unknown additive noise and unknown linear filtering in a transmission channel on speech statistics.
Factorial models and refiltering for speech separation and denoising
The max approximation to log-spectrograms of mixtures is reviewed, why this motivates a "refiltering" approach to separation and denoising is explained, and it is described how inference in factorial probabilistic models performs a computation useful for deriving the masking signals needed in refiltering.
Speech recognition in adverse environments: a probabilistic approach
This thesis discusses the classification of distorted features using an optimal classifier, and it is shown how the generation of noisy speech can be represented as a generative graphical probability model.
Factorial Models for Noise Robust Speech Recognition
Noise compensation techniques for robust automatic speech recognition (ASR) attempt to improve system performance in the presence of acoustic interference; the two main strategies for model compensation are model adaptation and model-based noise compensation.