• Publications
Infusing Finetuning with Semantic Dependencies
TLDR
This work applies novel probes to recent language models and finds that, unlike syntax, semantics is not surfaced by today's pretrained models; it therefore uses graph convolutional encoders to explicitly incorporate semantic dependency parses into task-specific finetuning, yielding gains on natural language understanding tasks in the GLUE benchmark.
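To make the infusion mechanism concrete, below is a minimal PyTorch-style sketch of fusing a semantic dependency graph into pretrained token states with one graph-convolution step. It is an illustration under assumed shapes, not the paper's implementation; the name GraphInfusionLayer and the residual fusion are hypothetical.

```python
# Illustrative sketch (not the paper's code): fuse a semantic dependency
# graph into pretrained token states with one graph-convolution step.
import torch
import torch.nn as nn

class GraphInfusionLayer(nn.Module):  # hypothetical name
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, token_states, adjacency):
        # token_states: (batch, seq, hidden); adjacency: (batch, seq, seq)
        # Degree-normalize the parse graph, then aggregate neighbor messages.
        degree = adjacency.sum(-1, keepdim=True).clamp(min=1.0)
        messages = torch.bmm(adjacency / degree, self.proj(token_states))
        return torch.relu(token_states + messages)  # residual fusion

# Usage: encoder outputs plus an (assumed precomputed) semantic parse adjacency.
states = torch.randn(2, 8, 768)                # pretrained encoder states
adj = torch.randint(0, 2, (2, 8, 8)).float()   # semantic dependency arcs
fused = GraphInfusionLayer(768)(states, adj)   # then feed the task head
```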
Dynamic Sparsity Neural Networks for Automatic Speech Recognition
TLDR
This paper presents Dynamic Sparsity Neural Networks (DSNN), which, once trained, can instantly switch to any predefined sparsity configuration at run-time, greatly easing the training process and simplifying deployment in diverse resource-constrained scenarios.
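As a rough illustration of the switching idea, here is a hedged sketch of a linear layer that can be masked to a chosen sparsity level at inference time via magnitude pruning. The class SwitchableSparseLinear and the pruning rule are assumptions for exposition; the DSNN training recipe that makes every configuration work well (joint training across sparsity levels) is omitted.

```python
# Illustrative sketch of runtime sparsity switching; the joint training
# across sparsity configurations that DSNN relies on is omitted here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableSparseLinear(nn.Linear):  # hypothetical helper
    def set_sparsity(self, sparsity: float):
        # Keep only the (1 - sparsity) fraction of largest-magnitude weights.
        flat = self.weight.detach().abs().flatten()
        k = int(flat.numel() * (1.0 - sparsity))
        if k <= 0:
            threshold = float("inf")
        else:
            threshold = flat.kthvalue(flat.numel() - k + 1).values
        self.register_buffer("mask", (self.weight.abs() >= threshold).float())

    def forward(self, x):
        mask = getattr(self, "mask", torch.ones_like(self.weight))
        return F.linear(x, self.weight * mask, self.bias)

layer = SwitchableSparseLinear(256, 256)
layer.set_sparsity(0.8)                  # switch instantly to 80% sparsity
out = layer(torch.randn(4, 256))
```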
Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution
TLDR
This work dissects the best instantiation of the mainstream end-to-end coreference resolution model that underlies most current best-performing coreference systems, and empirically analyzes the behavior of its two components: the mention detector and the mention linker.
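For readers unfamiliar with the two components, the toy sketch below shows a mention detector scoring candidate spans and a linker scoring (mention, antecedent) pairs. It is a simplified illustration, not the analyzed system; DetectorLinker and all dimensions are hypothetical.

```python
# Toy illustration of the two components (not the analyzed system):
# a detector scores spans, a linker scores (mention, antecedent) pairs.
import torch
import torch.nn as nn

class DetectorLinker(nn.Module):  # hypothetical name
    def __init__(self, hidden: int, top_k: int = 10):
        super().__init__()
        self.top_k = top_k
        self.mention_scorer = nn.Linear(hidden, 1)        # mention detector
        self.pair_scorer = nn.Linear(2 * hidden, 1)       # mention linker

    def forward(self, span_reprs):
        # span_reprs: (num_spans, hidden) from some span encoder.
        mention_scores = self.mention_scorer(span_reprs).squeeze(-1)
        keep = mention_scores.topk(min(self.top_k, span_reprs.size(0))).indices
        kept = span_reprs[keep]                           # surviving mentions
        n = kept.size(0)
        pairs = torch.cat([kept.unsqueeze(1).expand(n, n, -1),
                           kept.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.pair_scorer(pairs).squeeze(-1)        # (n, n) link scores

link_scores = DetectorLinker(128)(torch.randn(40, 128))
```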
WTMED at MEDIQA 2019: A Hybrid Approach to Biomedical Natural Language Inference
TLDR
This paper proposes a hybrid approach to biomedical NLI that exploits different types of information: a base model with a pre-trained text encoder as its core component, plus a syntax encoder and a feature encoder that capture syntactic and domain-specific information.
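A minimal sketch of the hybrid design, assuming each encoder produces a pooled sentence-pair vector that is concatenated before a three-way NLI classifier; HybridNLIHead and the dimensions are illustrative, not the paper's exact architecture.

```python
# Minimal sketch of the hybrid design under assumed dimensions; each encoder
# is expected to produce a pooled sentence-pair vector.
import torch
import torch.nn as nn

class HybridNLIHead(nn.Module):  # hypothetical name
    def __init__(self, text_dim=768, syntax_dim=128, feat_dim=32, num_labels=3):
        super().__init__()
        self.classifier = nn.Linear(text_dim + syntax_dim + feat_dim, num_labels)

    def forward(self, text_repr, syntax_repr, feat_repr):
        # Concatenate pretrained-text, syntax, and domain-feature views.
        return self.classifier(torch.cat([text_repr, syntax_repr, feat_repr], -1))

logits = HybridNLIHead()(torch.randn(4, 768),   # pretrained text encoder output
                         torch.randn(4, 128),   # syntax encoder output
                         torch.randn(4, 32))    # domain feature encoder output
```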
ABC: Attention with Bounded-memory Control
TLDR
This work shows that several disparate approaches can be subsumed into one abstraction, attention with bounded-memory control (ABC), which outperforms previous efficient attention models; compared to strong transformer baselines, it significantly improves inference time and space efficiency with little or no accuracy loss.
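The core idea can be sketched as attention over a fixed number of memory slots: keys and values are first compressed by a bounded-size control, so attention cost no longer grows with sequence length. The learned-projection control below is just one possible instantiation under assumed shapes, not the paper's full abstraction.

```python
# Illustrative bounded-memory attention under assumed shapes: keys/values are
# compressed into a fixed number of slots, so attention cost stays constant in
# sequence length. A learned projection is one possible control; the ABC
# abstraction subsumes several such choices.
import torch
import torch.nn as nn

class BoundedMemoryAttention(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_slots: int = 32, max_len: int = 512):
        super().__init__()
        self.control = nn.Parameter(torch.randn(num_slots, max_len) * 0.02)
        self.scale = dim ** -0.5

    def forward(self, q, k, v):
        # q, k, v: (batch, seq, dim); memory: (batch, num_slots, dim).
        phi = self.control[:, : k.size(1)].softmax(dim=-1)
        k_mem, v_mem = phi @ k, phi @ v                  # bounded memory
        attn = (q @ k_mem.transpose(-2, -1) * self.scale).softmax(dim=-1)
        return attn @ v_mem

x = torch.randn(2, 100, 64)
out = BoundedMemoryAttention(64)(x, x, x)                # (2, 100, 64)
```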
Learning with Latent Structures in Natural Language Processing: A Survey
TLDR
This work surveys three main families of methods for learning with latent structures: surrogate gradients, continuous relaxation, and marginal likelihood maximization via sampling, which incorporate better inductive biases for improved end-task performance and better interpretability.
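To illustrate two of the surveyed families on a toy categorical latent variable, the sketch below contrasts a continuous relaxation (Gumbel-Softmax) with a surrogate-gradient, straight-through estimate; it is an illustrative example, not code from the survey.

```python
# Toy example contrasting two surveyed families on a categorical latent:
# continuous relaxation (Gumbel-Softmax) vs. a straight-through surrogate
# gradient. Not code from the survey.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 5, requires_grad=True)

# Continuous relaxation: a soft, fully differentiable "one-hot" sample.
soft_sample = F.gumbel_softmax(logits, tau=0.5, hard=False)

# Surrogate gradient (straight-through): discrete forward pass,
# gradients flow through the soft relaxation on the backward pass.
hard_sample = F.gumbel_softmax(logits, tau=0.5, hard=True)

(soft_sample.sum() + hard_sample.sum()).backward()
print(logits.grad is not None)   # gradients reach the latent parameters
```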