A Structured Self-attentive Sentence Embedding
TLDR
A new model for extracting an interpretable sentence embedding by introducing self-attention is proposed, which uses a 2-D matrix to represent the embedding, with each row of the matrix attending to a different part of the sentence.
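The row-wise attention this summary describes can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: all dimensions, the random weights, and the variable names (`H`, `W1`, `W2`) are made-up assumptions; the hidden states would normally come from a trained BiLSTM encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d, da, r = 6, 8, 4, 3          # tokens, hidden size, attention size, attention rows (illustrative)
H = rng.normal(size=(n, d))       # token hidden states, e.g. from a BiLSTM encoder
W1 = rng.normal(size=(da, d))     # first attention projection
W2 = rng.normal(size=(r, da))     # one query per attention row

A = softmax(W2 @ np.tanh(W1 @ H.T), axis=1)  # (r, n): each row is a distribution over tokens
M = A @ H                                    # (r, d): the 2-D matrix sentence embedding
```

Each of the `r` rows of `A` sums to 1 and can attend to a different span of the sentence, which is what makes the resulting matrix embedding `M` interpretable.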
Applying deep learning to answer selection: A study and an open task
TLDR
A general deep learning framework is applied to address the non-factoid question answering task; it demonstrates superior performance compared to the baseline methods, and several additional techniques give further improvements.
Advancements in Reordering Models for Statistical Machine Translation
TLDR
This model converts the reordering problem into a sequence labeling problem, i.e., a tagging task, and is shown to improve the baseline system by 1.32 BLEU and 1.53 TER on average.
A Source-side Decoding Sequence Model for Statistical Machine Translation
TLDR
A source-side decoding sequence language model is proposed for phrase-based statistical machine translation; trained on word-aligned bilingual data, it helps the decoder find the correct decoding sequence.
Efficient Hyper-parameter Optimization for NLP Applications
TLDR
This paper proposes a multi-stage hyper-parameter optimization that breaks the problem into multiple stages with increasing amounts of data, and demonstrates the utility of this new algorithm by evaluating its speed and accuracy against state-of-the-art Bayesian Optimization algorithms on classification and prediction tasks.
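The staged idea summarized above — score all candidates cheaply, then re-evaluate only the survivors on more data — can be sketched as follows. This is a hedged, generic sketch, not the paper's algorithm: the function name `multi_stage_search`, the `keep_frac` pruning rule, and the toy objective are all assumptions for illustration.

```python
def multi_stage_search(candidates, evaluate, data_sizes, keep_frac=0.5):
    """Score every candidate on a small data subset, keep the best fraction,
    then re-evaluate the survivors on progressively larger subsets."""
    survivors = list(candidates)
    for size in data_sizes:
        survivors.sort(key=lambda c: evaluate(c, size), reverse=True)
        survivors = survivors[:max(1, int(len(survivors) * keep_frac))]
    return survivors[0]

# Toy objective (assumption): the best setting is 0.3 regardless of data size.
best = multi_stage_search(
    candidates=[0.1, 0.2, 0.3, 0.5, 0.9],
    evaluate=lambda c, size: -(c - 0.3) ** 2,
    data_sizes=[100, 1_000, 10_000],
)
```

Because weak candidates are eliminated after cheap early stages, only a few configurations ever pay the cost of a full-data evaluation.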
Semantic Cohesion Model for Phrase-Based SMT
TLDR
A novel semantic cohesion model is proposed that utilizes predicate-argument structures as soft constraints and plays the role of a reordering model in the phrase-based statistical machine translation system.
An Efficient, Distributed Stochastic Gradient Descent Algorithm for Deep-Learning Applications
TLDR
A distributed, bulk-synchronous stochastic gradient descent algorithm that allows sparse gradient aggregation from individual learners is proposed; its convergence is proved, and it is shown to have superior communication performance and convergence behavior over popular ASGD implementations such as Downpour and EAMSGD for deep-learning applications.
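The sparse aggregation step this summary mentions can be illustrated with a small NumPy sketch: each learner transmits only its largest-magnitude gradient entries, and the bulk-synchronous step averages those sparse contributions. The helper names (`sparsify`, `aggregate`), the top-k selection rule, and the example gradients are assumptions for illustration, not the paper's method.

```python
import numpy as np

def sparsify(grad, k):
    """Keep only the k largest-magnitude entries as (indices, values)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

def aggregate(sparse_grads, dim):
    """One bulk-synchronous step: average the sparse contributions."""
    total = np.zeros(dim)
    for idx, vals in sparse_grads:
        total[idx] += vals
    return total / len(sparse_grads)

g1 = np.array([0.1, -2.0, 0.05, 3.0])   # learner 1's local gradient
g2 = np.array([1.0, 0.0, 0.0, 1.0])     # learner 2's local gradient
step = aggregate([sparsify(g1, 2), sparsify(g2, 2)], dim=4)
```

Sending only the top-k entries per learner is what reduces the communication volume relative to exchanging dense gradients.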
Local System Voting Feature for Machine Translation System Combination
TLDR
A local system voting model, realized as a neural network over the words themselves and the combinatorial occurrences of the different system outputs, gives system combination the option to prefer different systems at different word positions, even within the same sentence.
GaDei: On Scale-Up Training as a Service for Deep Learning
TLDR
The exceedingly high communication bandwidth requirement of TaaS is characterized using representative industrial deep learning workloads, and GaDei, a highly optimized shared-memory-based scale-up parameter server design, is presented.
Distributed Deep Learning for Question Answering
TLDR
Experimental results show that the distributed framework based on the message passing interface can accelerate the convergence speed at a sublinear scale and demonstrate the importance of distributed training.