Publications
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
TLDR
A new Q&A architecture called QANet is proposed, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
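A minimal numpy sketch of that encoder idea follows: a 1-D convolution for local interactions composed with scaled dot-product self-attention for global interactions. Single-head attention, a plain (not depthwise-separable) convolution, the absence of layer normalization, and all shapes are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d(x, w):
    """Same-padded 1-D convolution over a sequence.
    x: (seq_len, d_model), w: (kernel, d_model, d_model)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.einsum('kd,kde->e', xp[t:t + k], w)
                     for t in range(x.shape[0])])

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention (global interactions)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encoder_block(x, conv_w, wq, wk, wv):
    """Convolution models local context, self-attention models global context."""
    x = x + conv1d(x, conv_w)               # residual around the convolution
    x = x + self_attention(x, wq, wk, wv)   # residual around self-attention
    return x

# Tiny usage example with random weights (shapes are illustrative).
rng = np.random.default_rng(0)
seq_len, d = 8, 16
x = rng.normal(size=(seq_len, d))
out = encoder_block(x,
                    rng.normal(size=(3, d, d)) * 0.1,
                    *(rng.normal(size=(d, d)) * 0.1 for _ in range(3)))
print(out.shape)  # (8, 16)
```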
Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
TLDR
This generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain, and outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins.
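The loss structure behind this kind of pixel-level adaptation can be sketched in a few lines: a generator turns source images into "adapted" images, a domain discriminator tries to tell them apart from real target images, and a task classifier trained on the adapted images must still predict the original source labels. The function names, the binary cross-entropy form, and the weighting below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def bce(p, label):
    """Binary cross-entropy; p are discriminator probabilities in (0, 1)."""
    eps = 1e-8
    return -np.mean(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

def discriminator_loss(d_target, d_adapted):
    """D should call real target images 1 and generator-adapted source images 0."""
    return bce(d_target, 1.0) + bce(d_adapted, 0.0)

def generator_loss(d_adapted, task_probs, labels, alpha=1.0):
    """G is rewarded for fooling D, while the task classifier applied to the
    adapted images must still recover the source labels."""
    adv = bce(d_adapted, 1.0)
    task = -np.mean(np.log(task_probs[np.arange(len(labels)), labels] + 1e-8))
    return adv + alpha * task

# Toy usage with made-up probabilities for a batch of 4 images, 10 classes.
d_target = np.array([0.9, 0.8, 0.95, 0.7])
d_adapted = np.array([0.3, 0.4, 0.2, 0.5])
task_probs = np.full((4, 10), 0.1)
labels = np.array([0, 3, 7, 9])
print(discriminator_loss(d_target, d_adapted),
      generator_loss(d_adapted, task_probs, labels))
```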
Rethinking Attention with Performers
TLDR
Performers are introduced: Transformer architectures that can estimate regular (softmax) full-rank attention with provable accuracy, using only linear space and time complexity, and without relying on priors such as sparsity or low-rankness.
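The linear complexity comes from reassociating the attention product once the softmax is replaced by a feature map. A minimal sketch, assuming an arbitrary non-negative feature map phi; the feature map actually used by Performers is the FAVOR construction summarized further below.

```python
import numpy as np

def linear_attention(q, k, v, phi):
    """Attention computed as phi(Q) @ (phi(K)^T V), normalized per query.
    The L x L attention matrix is never materialized, so cost is linear in L."""
    qp, kp = phi(q), phi(k)                  # (L, m) feature representations
    kv = kp.T @ v                            # (m, d) summary of keys and values
    norm = qp @ kp.sum(axis=0)               # (L,) normalizing constants
    return (qp @ kv) / norm[:, None]

# Usage with an (illustrative) elementwise feature map; L = 1000, d = 64.
rng = np.random.default_rng(0)
L, d = 1000, 64
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
out = linear_attention(q, k, v, phi=lambda x: np.exp(-x**2) + 1e-6)
print(out.shape)  # (1000, 64), without ever forming a 1000 x 1000 matrix
```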
Model-based reinforcement learning for biological sequence design
TLDR
A model-based variant of PPO, DyNA-PPO, is proposed to improve sample efficiency; it performs significantly better than existing methods in settings where modeling is feasible, while performing no worse in situations where a reliable model cannot be learned.
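A rough sketch of that model-based outer loop: measure a batch with the expensive oracle, refit a cheap surrogate, and only run many optimization rounds against the surrogate when it passes a reliability check. For brevity the PPO policy is replaced here by random single-site mutation and the surrogate by a 1-nearest-neighbour regressor; `oracle`, the leave-one-out gate, and all thresholds are assumptions, not the paper's implementation.

```python
import numpy as np

ALPHABET = np.arange(4)          # e.g. a 4-letter nucleotide alphabet
rng = np.random.default_rng(0)

def oracle(seq):
    """Stand-in for the expensive wet-lab measurement."""
    return float(np.sum(seq == 2)) - 0.1 * float(np.sum(np.abs(np.diff(seq))))

def surrogate_predict(train_x, train_y, query):
    """1-nearest-neighbour surrogate fit on the measured sequences."""
    d = np.abs(train_x - query).sum(axis=1)
    return train_y[np.argmin(d)]

def surrogate_reliable(train_x, train_y):
    """Crude leave-one-out check: only trust the model if it explains the data."""
    preds = [surrogate_predict(np.delete(train_x, i, 0), np.delete(train_y, i), train_x[i])
             for i in range(len(train_y))]
    return np.mean(np.abs(np.array(preds) - train_y)) < np.std(train_y)

def propose(parent):
    """Stand-in for the policy: mutate one random position."""
    child = parent.copy()
    child[rng.integers(len(child))] = rng.choice(ALPHABET)
    return child

seqs = rng.integers(0, 4, size=(8, 20))
scores = np.array([oracle(s) for s in seqs])
for round_ in range(5):
    best = seqs[np.argmax(scores)]
    candidate = propose(best)
    if surrogate_reliable(seqs, scores):
        for _ in range(50):                  # cheap search against the surrogate
            alt = propose(candidate)
            if surrogate_predict(seqs, scores, alt) > surrogate_predict(seqs, scores, candidate):
                candidate = alt
    seqs = np.vstack([seqs, candidate])      # pay for one real measurement per round
    scores = np.append(scores, oracle(candidate))
print(scores.max())
```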
Population-Based Black-Box Optimization for Biological Sequence Design
TLDR
It is shown that P3BO outperforms any single method in its population, proposing higher-quality sequences as well as more diverse batches; P3BO and Adaptive-P3BO are a crucial step towards deploying ML for real-world sequence design.
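A small sketch of the population idea: several proposal strategies share a batch budget, and the adaptive variant reallocates that budget toward whichever strategies have recently produced the best sequences. The strategy definitions, the credit rule, and the allocation scheme below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(seq):
    """Stand-in for the expensive sequence evaluation."""
    return float(np.sum(seq == 1))

def make_mutator(rate):
    """One member of the optimizer population: a mutator with a fixed rate."""
    def mutate(parent):
        mask = rng.random(len(parent)) < rate
        child = parent.copy()
        child[mask] = rng.integers(0, 4, size=mask.sum())
        return child
    return mutate

population = [make_mutator(r) for r in (0.05, 0.2, 0.5)]
credit = np.zeros(len(population))           # running credit per optimizer
best = rng.integers(0, 4, size=30)

for round_ in range(10):
    # Adaptive allocation: sample proposers in proportion to softmax(credit).
    probs = np.exp(credit - credit.max())
    probs /= probs.sum()
    batch_idx = rng.choice(len(population), size=8, p=probs)
    scored = [(i, s, fitness(s)) for i, s in
              ((i, population[i](best)) for i in batch_idx)]
    # Credit each optimizer with the best score it produced this round.
    for i in range(len(population)):
        mine = [f for j, _, f in scored if j == i]
        if mine:
            credit[i] = 0.7 * credit[i] + 0.3 * max(mine)
    top_i, top_seq, top_f = max(scored, key=lambda t: t[2])
    if top_f >= fitness(best):
        best = top_seq
print(fitness(best), credit)
```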
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
TLDR
A new Transformer architecture, Performer, based on Fast Attention Via Orthogonal Random features (FAVOR), is introduced; its effectiveness is demonstrated on the challenging task of protein sequence modeling, with strong theoretical guarantees: unbiased estimation of the attention matrix and uniform convergence.
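The unbiasedness claim can be illustrated with the positive random-feature map for the softmax kernel: exp(q·k) is estimated as an expectation over random projections, so exact attention is approximated without forming the attention matrix. The feature count m and the plain Gaussian projections below (FAVOR uses orthogonal blocks) are simplifications.

```python
import numpy as np

def softmax_kernel_features(x, proj):
    """Positive random features h(x) with E[h(q) . h(k)] = exp(q . k)."""
    m = proj.shape[0]
    return np.exp(x @ proj.T - np.sum(x**2, axis=-1, keepdims=True) / 2) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 16, 2048
proj = rng.normal(size=(m, d))          # FAVOR draws these as orthogonal blocks
q = rng.normal(size=d) * 0.3
k = rng.normal(size=d) * 0.3

exact = np.exp(q @ k)
approx = softmax_kernel_features(q, proj) @ softmax_kernel_features(k, proj)
print(exact, approx)                    # unbiased estimate; error shrinks with m
```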
Learning Hierarchical Semantic Segmentations of LIDAR Data
TLDR
Experiments with LIDAR scans collected by Google Street View cars throughout ~100 city blocks of New York City show that the algorithm provides better segmentations and classifications than simple alternatives for cars, vans, traffic lights, and street lights.
K-median Algorithms: Theory in Practice
We define the distance metric as d_ij for i ∈ {1, …, n}, j ∈ {1, …, n}, such that d_ij is the distance between points i and j in the metric space X. Kariv and Hakimi [1] proved that finding …
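Given the pairwise distances d_ij, one baseline consistent with this notation is local search with single-median swaps; the sketch below is a generic implementation of that idea on a random point set, not the specific algorithms evaluated in the paper.

```python
import numpy as np

def kmedian_cost(D, medians):
    """Sum over points of the distance to the nearest chosen median."""
    return D[:, medians].min(axis=1).sum()

def local_search_kmedian(D, k, rng):
    """Start from random medians; accept any single swap that lowers the cost."""
    n = D.shape[0]
    medians = list(rng.choice(n, size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for j in range(n):
                if j in medians:
                    continue
                trial = medians[:i] + [j] + medians[i + 1:]
                if kmedian_cost(D, trial) < kmedian_cost(D, medians):
                    medians, improved = trial, True
    return medians

# Usage on random points in the plane, with d_ij the Euclidean distance.
rng = np.random.default_rng(0)
pts = rng.random(size=(40, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
meds = local_search_kmedian(D, k=3, rng=rng)
print(meds, kmedian_cost(D, meds))
```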
Program Synthesis with Large Language Models
TLDR
The limits of the current generation of large language models for program synthesis in general-purpose programming languages are explored, and the semantic grounding of these models is probed by fine-tuning them to predict the results of program execution.
Is Transfer Learning Necessary for Protein Landscape Prediction?
TLDR
It is shown that CNN models trained solely using supervised learning both compete with and sometimes outperform the best models from TAPE that leverage expensive pretraining on large protein datasets.