Layer Pruning on Demand with Intermediate CTC

Jaesong Lee, Jingu Kang, Shinji Watanabe
Deploying an end-to-end automatic speech recognition (ASR) model on mobile/embedded devices is a challenging task, since device computational power and energy consumption requirements change dynamically in practice. To overcome this issue, we present a training and pruning method for ASR based on connectionist temporal classification (CTC) which allows reduction of model depth at run time without any extra fine-tuning. To achieve this goal, we adopt two regularization methods…
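The run-time depth reduction described in the abstract can be pictured with a minimal sketch (our own toy stand-ins, not the paper's code): each encoder layer is trained with its own CTC head, so at inference one can run only the first k layers and decode from layer k's head, with no fine-tuning for that depth.

```python
# Sketch (not the paper's implementation): an encoder whose depth can be
# chosen at run time because intermediate layers carry their own CTC heads.
# Layers and heads are hypothetical toy functions over lists of floats.

def make_layer(scale):
    # hypothetical encoder layer: elementwise affine transform
    return lambda xs: [scale * x + 1.0 for x in xs]

def make_ctc_head(offset):
    # hypothetical CTC projection: maps features to "logits"
    return lambda xs: [x + offset for x in xs]

class PrunableEncoder:
    def __init__(self, layers, heads):
        assert len(layers) == len(heads)
        self.layers, self.heads = layers, heads

    def forward(self, xs, depth):
        # run only the first `depth` layers, then decode with that
        # layer's own CTC head -- no extra fine-tuning at this depth
        for layer in self.layers[:depth]:
            xs = layer(xs)
        return self.heads[depth - 1](xs)

enc = PrunableEncoder([make_layer(2.0)] * 4, [make_ctc_head(0.5)] * 4)
full = enc.forward([1.0], depth=4)     # full-depth decoding
shallow = enc.forward([1.0], depth=2)  # pruned-on-demand decoding
```

The design point is that every prefix of the layer stack is itself a usable model, which is what makes depth a run-time knob rather than a training-time choice.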


Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
A hierarchical conditional model based on connectionist temporal classification (CTC), trained with auxiliary CTC losses applied to intermediate layers, where the vocabulary size of each target subword sequence is gradually increased as the layer gets closer to the word-level output.


Intermediate Loss Regularization for CTC-Based Speech Recognition
  • Jaesong Lee, Shinji Watanabe
  • Computer Science, Engineering
  • ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
A simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective that effectively regularizes CTC training and improves performance, requiring only a small code modification, with small overhead during training and none during inference.
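The auxiliary loss summarized above amounts to mixing the usual final-layer CTC loss with a CTC loss computed from an intermediate layer. A minimal sketch of that objective (our naming and weight, not the paper's code):

```python
# Sketch of an intermediate-CTC training objective (assumed form):
# interpolate the final-layer CTC loss with an auxiliary CTC loss
# taken from an intermediate encoder layer.

def intermediate_ctc_loss(final_loss, inter_loss, w=0.3):
    # w weights the auxiliary intermediate-layer loss;
    # w = 0 recovers plain CTC training on the final layer only
    return (1.0 - w) * final_loss + w * inter_loss

total = intermediate_ctc_loss(final_loss=1.0, inter_loss=2.0, w=0.5)
```

In practice the two loss terms would come from CTC heads attached at the final and an intermediate layer; here they are scalars to keep the sketch self-contained.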
Self-Distillation for Improving CTC-Transformer-Based ASR Systems
A novel training approach for encoder-decoder-based sequence-to-sequence (S2S) models that utilizes Transformer outputs and the source attention weights to make pseudo-targets containing both the posterior and the timing information of each Transformer output.
An Investigation of a Knowledge Distillation Method for CTC Acoustic Models
To improve the performance of unidirectional RNN-based CTC, which is suitable for real-time processing, a knowledge distillation (KD)-based model compression method for training a CTC acoustic model is investigated, and frame-level and sequence-level KD methods are evaluated.
Very Deep Self-Attention Networks for End-to-End Speech Recognition
This work proposes using self-attention via the Transformer architecture as an alternative to time-delay neural networks and shows that deep Transformer networks with high learning capacity can exceed the performance of previous end-to-end approaches and even match conventional hybrid systems.
Learning small-size DNN with output-distribution-based criteria
This study proposes to better address these issues by utilizing the DNN output distribution, clustering the senones in the large set into a small one by directly relating the clustering process to the DNN parameters, as opposed to decoupling senone generation from DNN training as in the standard approach.
Scaling Up Online Speech Recognition Using ConvNets
An online end-to-end speech recognition system based on time-depth separable convolutions and connectionist temporal classification that achieves almost three times the throughput of a well-tuned hybrid ASR baseline while also having lower latency and a better word error rate.
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
This work integrates connectionist temporal classification (CTC) with the Transformer for joint training and decoding of automatic speech recognition (ASR), which makes training faster than with RNNs and assists LM integration.
Reducing Transformer Depth on Demand with Structured Dropout
LayerDrop, a form of structured dropout, is explored; it has a regularization effect during training and allows efficient pruning at inference time, showing that sub-networks of any depth can be selected from one large network without fine-tuning and with limited impact on performance.
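The LayerDrop idea summarized above can be sketched as follows (assumed interface, not the original fairseq implementation): whole layers are skipped at random during training, which makes the network robust to later removing a fixed subset of layers at inference time.

```python
import random

# Sketch of LayerDrop-style structured dropout (our toy version):
# during training each layer is dropped with probability p; at inference
# a fixed sub-network of any depth can be kept without fine-tuning.

def forward_with_layerdrop(layers, xs, p=0.5, training=True, rng=None):
    rng = rng or random.Random(0)
    for layer in layers:
        if training and rng.random() < p:
            continue  # skip (drop) the whole layer
        xs = layer(xs)
    return xs

def prune_every_other(layers):
    # inference-time structured pruning: keep layers 0, 2, 4, ...
    return layers[::2]

layers = [lambda xs: [x + 1.0 for x in xs] for _ in range(4)]
out_full = forward_with_layerdrop(layers, [0.0], training=False)
half_net = prune_every_other(layers)
```

Dropping entire layers (rather than individual units) is what makes the pruning "structured": the surviving sub-network is a valid, shallower model.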
Deep Networks with Stochastic Depth
Stochastic depth is proposed: a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time, substantially reducing training time and significantly improving test error on almost all evaluated datasets.
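Stochastic depth, as summarized above, drops residual blocks at random during training, with a survival probability that (in the paper's linear-decay rule) decreases with depth; the identity path keeps the input flowing when a block is dropped. A minimal sketch under those assumptions:

```python
# Sketch of stochastic depth (our toy version of the training rule):
# block l survives with probability that decays linearly with depth,
# and a dropped block contributes only its identity (residual) path.

def survival_prob(layer_idx, num_layers, p_final=0.5):
    # linear decay: the first block always survives, the last one
    # survives with probability 1 - p_final
    return 1.0 - (layer_idx / (num_layers - 1)) * p_final

def residual_block(x, f, survive):
    # if the block is dropped, only the identity path remains
    return x + f(x) if survive else x
```

Because every block is sometimes absent during training, the network tolerates being evaluated at reduced effective depth, which is the property the layer-pruning work above builds on.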
AI Benchmark: All About Deep Learning on Smartphones in 2019
This paper evaluates the performance and compares the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference, and discusses recent changes in the Android ML pipeline.