• Publications
The Hitachi/JHU CHiME-5 system: Advances in speech recognition for everyday home environments using multiple microphone arrays
Several techniques are newly developed: a way to apply multiple data augmentation methods, residual bidirectional long short-term memory, 4-ch acoustic models, multiple-array combination methods, a hypothesis deduplication method, and a speaker adaptation technique for the neural beamformer.
A Unifying Framework for Blind Source Separation Based on A Joint Diagonalizability Constraint
We present a unifying framework for dealing with convolutive blind source separation (BSS), which fully models inter-channel, inter-frequency, and inter-frame correlation of sources by latent…
Independent Low-Rank Matrix Analysis with Decorrelation Learning
A new BSS method is proposed that estimates a linear transformation for spectral decorrelation and performs ILRMA in the transformed domain; algorithms based on block coordinate descent methods with closed-form solutions are developed for this problem.
Independent Low-Rank Matrix Analysis Based on Multivariate Complex Exponential Power Distribution
The source spectrum model in ILRMA is generalized to explicitly model the strong higher-order correlations between neighboring frequency bins of speech signals, and multivariate complex exponential power distributions are introduced as the source distributions assumed in ILRMA.
Overdetermined Independent Vector Analysis
We address the convolutive blind source separation problem for the (over-)determined case where (i) the number of nonstationary target-sources K is less than that of microphones M, and (ii) there are…
Independent vector analysis with frequency range division and prior switching
A novel source model is developed to improve the separation performance of independent vector analysis (IVA) for speech mixtures based on an EM algorithm, in which the IVA filters, states of sources, and permutation alignments between each pair of bands are jointly optimized.
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches
A novel heterogeneous-input multi-channel acoustic model (AM) is presented that has both single-channel and multi-channel input branches and uniquely uses the power of a complemental speech enhancement (SE) module while exploiting the power of a jointly trained AM and SE architecture.
Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
A unified system that incorporates speech source separation and automatic speech recognition for various noise environments is proposed, and the proposed training method is shown to be effective even when an input signal has been distorted through the source separation step.
Beam-TasNet: Time-domain Audio Separation Network Meets Frequency-domain Beamformer
Experiments show that the proposed Beam-TasNet significantly outperforms the conventional TasNet without beamforming and, moreover, successfully achieves a word error rate comparable to an oracle mask-based MVDR beamformer.
DNN-supported Mask-based Convolutional Beamforming for Simultaneous Denoising, Dereverberation, and Source Separation
This article proposes a method to integrate state-of-the-art techniques for mask-based beamforming into a single optimization framework that includes frequency-domain convolutional neural network based utterance-level permutation invariant training with a large receptive field, noisy complex Gaussian mixture model based spatial clustering, and weighted power minimization distortionless response convolutional beamforming.