Publications
On the Variance of the Adaptive Learning Rate and Beyond
TLDR: The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence, and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
TLDR: We propose a new computational framework for robust and efficient fine-tuning of pre-trained language models.
Learning to Defense by Learning to Attack
TLDR: This work proposes a new adversarial training method based on a generic learning-to-learn (L2L) framework.
Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
TLDR: A line of research shows that neural networks can approximate certain classes of functions with arbitrary accuracy, although the network size scales exponentially in the data dimension.
On Computation and Generalization of Generative Adversarial Networks under Spectrum Control
TLDR: [The extracted snippet is a garbled results table comparing Inception Score and FID on CIFAR-10 and STL-10 for WGAN-GP, SN-GAN, and orthogonal-regularization baselines.]
Nanoscale Synaptic Membrane Mimetic Allows Unbiased High Throughput Screen That Targets Binding Sites for Alzheimer’s-Associated Aβ Oligomers
Despite their value as sources of therapeutic drug targets, membrane proteomes are largely inaccessible to high-throughput screening (HTS) tools designed for soluble proteins. An important example…
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
TLDR: We study the open-domain named entity recognition (NER) problem under distant supervision.
Transformer Hawkes Process
TLDR: We propose the Transformer Hawkes Process (THP), which leverages the self-attention mechanism to capture long-term dependencies while remaining computationally efficient.
On Scalable and Efficient Computation of Large Scale Optimal Transport
TLDR: We propose an implicit generative learning-based framework called SPOT (Scalable Push-forward of Optimal Transport).
Hole expansion characteristics of ultra high strength steels
Abstract: The hole expansion ratio is a key indicator for evaluating the stretch-flanging performance of steel sheets, and is usually obtained by a hole expansion test using a cylindrical or conical punch…