Publications
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
TLDR
We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
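As a rough illustration of this design, here is a minimal PyTorch sketch of one recurrence-free encoder block combining a depthwise-separable convolution (local interactions) with multi-head self-attention (global interactions). The layer sizes and the single-convolution layout are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConvSelfAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 128, num_heads: int = 8, kernel_size: int = 7):
        super().__init__()
        # Depthwise-separable convolution: per-channel conv + 1x1 pointwise mix.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        residual = x
        h = self.norm1(x).transpose(1, 2)             # Conv1d expects (B, C, L)
        h = self.pointwise(self.depthwise(h)).transpose(1, 2)
        x = residual + h                              # local interactions
        residual = x
        h = self.norm2(x)
        h, _ = self.attn(h, h, h, need_weights=False)
        return residual + h                           # global interactions

x = torch.randn(2, 50, 128)                          # toy batch
print(ConvSelfAttentionBlock()(x).shape)             # torch.Size([2, 50, 128])
```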
Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks
TLDR
Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but they are restricted to being square for the hidden-to-hidden transformation in RNNs.
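For context, the Stiefel-manifold constraint means a rectangular weight matrix with orthonormal columns. Below is a minimal NumPy sketch of projecting an arbitrary non-square matrix onto that set via its polar factor; the paper's own method learns the orthogonal transformation end-to-end rather than projecting after the fact.

```python
import numpy as np

def orthogonalize(W: np.ndarray) -> np.ndarray:
    """Return the closest matrix with orthonormal columns (polar factor of W)."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))          # non-square: 64 x 32
Q = orthogonalize(W)
print(np.allclose(Q.T @ Q, np.eye(32)))    # True: columns are orthonormal
```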
Learning to Skim Text
TLDR
We propose a modified LSTM with jumping: a recurrent network that learns how far to jump after reading a few words of the input text.
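Here is a minimal inference-time sketch of that read-then-jump loop: the model reads a small window of tokens, predicts a skip distance, and continues. The sizes and the greedy (argmax) jump choice are illustrative assumptions; the paper trains the non-differentiable jump decision with policy gradients.

```python
import torch
import torch.nn as nn

class SkimLSTM(nn.Module):
    def __init__(self, vocab: int = 1000, d: int = 64,
                 read_len: int = 3, max_jump: int = 5):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.cell = nn.LSTMCell(d, d)
        self.jump_head = nn.Linear(d, max_jump + 1)   # skip 0..max_jump tokens
        self.read_len = read_len

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (seq_len,) token ids for a single sequence
        h = c = torch.zeros(1, self.cell.hidden_size)
        i = 0
        while i < tokens.numel():
            # Read a small window of tokens, one step at a time.
            for t in range(i, min(i + self.read_len, tokens.numel())):
                h, c = self.cell(self.embed(tokens[t]).unsqueeze(0), (h, c))
            # Decide how far to jump after the window (greedy at inference).
            jump = self.jump_head(h).argmax(dim=-1).item()
            i = min(i + self.read_len, tokens.numel()) + jump
        return h                                      # final state for a classifier

print(SkimLSTM()(torch.randint(0, 1000, (40,))).shape)  # torch.Size([1, 64])
```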
On Computationally Tractable Selection of Experiments in Regression Models
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points. The primary focus is on linear regression models.
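To make the problem concrete, here is a naive greedy baseline for this selection task: pick k of n candidate design points to greedily maximize the log-determinant of the selected design's Gram matrix. This only illustrates the setup; the paper derives more principled tractable methods.

```python
import numpy as np

def greedy_select(X: np.ndarray, k: int, eps: float = 1e-6) -> list:
    """Greedily pick k rows of X maximizing log det of the running Gram matrix."""
    n, p = X.shape
    G = eps * np.eye(p)                    # regularized Gram matrix of chosen set
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in set(range(n)) - set(chosen):
            _, logdet = np.linalg.slogdet(G + np.outer(X[i], X[i]))
            if logdet > best_gain:
                best, best_gain = i, logdet
        chosen.append(best)
        G += np.outer(X[best], X[best])
    return chosen

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # 100 candidate settings, 5 features
print(greedy_select(X, k=10))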
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension
TLDR
We propose the Neural Symbolic Reader (NeRd), which includes a reader, e.g., BERT, to encode the passage and question, and a programmer to generate a program that is executed to produce the answer.
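A toy sketch of the "programmer + executor" idea: the programmer emits a small compositional program, and a symbolic executor runs it over values extracted from the passage. The op set below (VALUE, DIFF, COUNT) is a hypothetical miniature for illustration, not NeRd's actual grammar.

```python
def execute(program, passage_numbers):
    """Recursively evaluate a tiny compositional program over extracted numbers."""
    op, *args = program
    if op == "VALUE":                      # look up the i-th extracted number
        return passage_numbers[args[0]]
    if op == "DIFF":                       # difference of two sub-programs
        return execute(args[0], passage_numbers) - execute(args[1], passage_numbers)
    if op == "COUNT":                      # count the extracted numbers
        return len(passage_numbers)
    raise ValueError(f"unknown op: {op}")

# e.g. "How many more points in Q2 than Q1?" with extracted numbers [7, 14]
prog = ("DIFF", ("VALUE", 1), ("VALUE", 0))
print(execute(prog, [7, 14]))              # 7
```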
Efficient Structured Matrix Rank Minimization
TLDR
We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.
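For reference, the basic building block of nuclear-norm methods is singular value thresholding, the proximal operator of the nuclear norm, sketched below. The paper's contribution is an efficient algorithm for the structured (linearly mapped) case; this shows only the underlying prox step.

```python
import numpy as np

def svt(M: np.ndarray, tau: float) -> np.ndarray:
    """Prox of tau * ||.||_*: soft-threshold the singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))  # rank-3 matrix
M += 0.01 * rng.standard_normal((8, 8))                        # small noise
print(np.linalg.matrix_rank(svt(M, tau=0.5)))                  # low rank, e.g. 3
```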
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
TLDR
We propose a family of randomized primal-dual block coordinate algorithms that are especially suitable for asynchronous distributed implementation with parameter servers, exploiting the problem's structure via doubly stochastic coordinate optimization with variance reduction.
On Computationally Tractable Selection of Experiments in Measurement-Constrained Regression Models
TLDR
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points.
AdaDelay: Delay Adaptive Distributed Stochastic Optimization
TLDR
We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients.
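Below is a toy serial simulation of that delayed-gradient setting: worker gradients arrive out of order, and the server scales its step by both the iteration count and the observed delay. The specific step-size rule c / sqrt(t + delay) is an illustrative assumption in the spirit of delay-adaptive scaling, not the paper's exact formula.

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)
dim, T, c = 5, 2000, 0.5
x_star = rng.standard_normal(dim)          # optimum of f(x) = ||x - x*||^2 / 2
x = np.zeros(dim)
pending = []                               # min-heap of (arrival, stamp, gradient)

for t in range(1, T + 1):
    # A worker computes a stochastic gradient at the current (soon stale) x.
    grad = (x - x_star) + 0.1 * rng.standard_normal(dim)
    heapq.heappush(pending, (t + int(rng.integers(0, 6)), t, grad))
    # The server applies whatever gradients have arrived by iteration t.
    while pending and pending[0][0] <= t:
        _, stamp, g = heapq.heappop(pending)
        delay = t - stamp
        x -= (c / np.sqrt(t + delay)) * g  # delay-adaptive step size

print(np.linalg.norm(x - x_star))          # residual shrinks as the steps decay
```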
Doubly Stochastic Primal-Dual Coordinate Method for Empirical Risk Minimization and Bilinear Saddle-Point Problem
We propose a doubly stochastic primal-dual coordinate (DSPDC) optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem.
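To illustrate the doubly stochastic sampling pattern, here is a minimal serial sketch on the bilinear saddle-point form of ridge regression: each iteration samples one dual coordinate i and one primal coordinate j and takes plain gradient steps. The fixed step sizes and lack of extrapolation are simplifying assumptions; the paper's DSPDC uses proximal updates with tuned parameters.

```python
import numpy as np

# min_x max_y (1/n) y^T (A x - b) - ||y||^2 / (2n) + (lam/2) ||x||^2
rng = np.random.default_rng(0)
n, d, lam = 200, 10, 0.1
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x, y = np.zeros(d), np.zeros(n)
eta_x, eta_y = 0.05, 0.5

for _ in range(200_000):
    i = rng.integers(n)                                # sample a dual coordinate
    j = rng.integers(d)                                # sample a primal coordinate
    y[i] += eta_y * (A[i] @ x - b[i] - y[i])           # dual coordinate ascent
    x[j] -= eta_x * (A[:, j] @ y / n + lam * x[j])     # primal coordinate descent

# Compare against the closed-form ridge solution.
x_ridge = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
print(np.linalg.norm(x - x_ridge))                     # small residual
```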