FastSpeech: Fast, Robust and Controllable Text to Speech
FastSpeech, a novel feed-forward network based on Transformer that generates mel-spectrograms in parallel for TTS, is proposed; it speeds up mel-spectrogram generation by 270x and end-to-end speech synthesis by 38x.
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction
- D. Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, Yueting Zhuang
- Computer Science, IEEE/CVF Conference on Computer Vision and…
- 1 June 2019
A self-supervised spatiotemporal learning technique that leverages the chronological order of videos, learning a spatiotemporal representation of the video by predicting the order of shuffled clips taken from it.
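As a rough illustration of the clip order prediction idea summarized above, the following Python sketch builds one training sample: it cuts a frame sequence into clips, shuffles them, and uses the index of the applied permutation as the classification target. The function name, the parameters `num_clips` and `clip_len`, and the use of consecutive non-overlapping clips are assumptions made for this sketch, not details taken from the paper.

```python
import itertools
import random


def make_order_prediction_sample(frames, num_clips=3, clip_len=4, rng=None):
    """Build one (shuffled_clips, label, order) sample for order prediction.

    Hypothetical helper: the clip sampling scheme here (consecutive,
    non-overlapping clips from the start of the video) is an assumption.
    """
    rng = rng or random.Random()
    needed = num_clips * clip_len
    if len(frames) < needed:
        raise ValueError(f"need at least {needed} frames, got {len(frames)}")

    # Cut the video into clips, kept in chronological order.
    clips = [frames[i * clip_len:(i + 1) * clip_len] for i in range(num_clips)]

    # Every possible ordering of the clips is one class.
    permutations = list(itertools.permutations(range(num_clips)))

    # Pick a permutation at random; its index is the prediction target.
    label = rng.randrange(len(permutations))
    order = permutations[label]

    # Position j of the shuffled sample holds original clip order[j].
    shuffled = [clips[i] for i in order]
    return shuffled, label, order


if __name__ == "__main__":
    video = list(range(12))  # stand-in for 12 frames
    shuffled, label, order = make_order_prediction_sample(
        video, rng=random.Random(0)
    )
    print("order:", order, "label:", label)
    print("shuffled clips:", shuffled)
```

A model trained on such samples would take the shuffled clips as input and classify which of the `num_clips!` permutations was applied; with three clips that is a six-way classification problem.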
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech 2 is proposed, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by training the model directly on the ground-truth target instead of the simplified output of a teacher model, and by introducing more variation information of speech as conditional inputs.
Investigating Capsule Networks with Dynamic Routing for Text Classification
- Wei Zhao, Jianbo Ye, Min Yang, Zeyang Lei, Suofei Zhang, Zhou Zhao
- Computer Science, EMNLP
- 29 March 2018
This work proposes three strategies to stabilize the dynamic routing process to alleviate the disturbance of some noise capsules which may contain “background” information or have not been successfully trained.
Video Question Answering via Gradually Refined Attention over Appearance and Motion
This paper proposes an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance and demonstrates the effectiveness of the model by analyzing the refined attention weights during the question answering procedure.
Improving Automatic Source Code Summarization via Deep Reinforcement Learning
- Yao Wan, Zhou Zhao, +4 authors Philip S. Yu
- Computer Science, 33rd IEEE/ACM International Conference on…
- 1 September 2018
This paper incorporates an abstract syntax tree structure as well as the sequential content of code snippets into a deep reinforcement learning framework (i.e., an actor-critic network), which provides the confidence of predicting the next word according to the current state and uses an advantage reward composed of the BLEU metric to train both networks.
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
A novel Cross-Modal Interaction Network (CMIN) is introduced to consider multiple crucial factors for this challenging task, including the syntactic structure of natural language queries, long-range semantic dependencies in the video context, and sufficient cross-modal interaction.
Multilingual Neural Machine Translation with Knowledge Distillation
A distillation-based approach is proposed to boost the accuracy of multilingual machine translation, so that one model is enough to handle multiple languages with comparable or even better accuracy than individual models.
Dialogue Act Recognition via CRF-Attentive Structured Network
This paper tackles the problem of dialogue act recognition (DAR) by extending richer Conditional Random Field (CRF) structured dependencies without abandoning end-to-end training, and incorporates hierarchical semantic inference with a memory mechanism into utterance modeling at multiple levels.
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
A novel neural network architecture called Multi-layer Embedding with Memory Network (MEMEN) is proposed for the machine reading task; it applies the classic skip-gram model to the syntactic and semantic information of words to train a new kind of embedding layer.