Publications
DeepWalk: online learning of social representations
DeepWalk is an online learning algorithm that builds useful incremental results and is trivially parallelizable, making it suitable for a broad class of real-world applications such as network classification and anomaly detection.
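DeepWalk treats truncated random walks over a graph as "sentences" of node ids and feeds them to a skip-gram model to learn node embeddings. A minimal sketch of the walk-generation step (the skip-gram training is omitted; the adjacency-list format and function names here are illustrative, not from the paper's code):

```python
import random

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Generate truncated random walks over an adjacency-list graph.

    Each walk is a list of node ids; DeepWalk treats these like sentences
    and passes them to a skip-gram model (not shown) to learn embeddings.
    """
    rng = random.Random(seed)
    walks = []
    nodes = list(adj)
    for _ in range(num_walks):
        rng.shuffle(nodes)          # start each pass in a random node order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_len:
                neighbors = adj[walk[-1]]
                if not neighbors:   # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
walks = random_walks(adj)
```

Because each walk starts from every node in each pass, embeddings get incremental updates as new walks arrive, which is what makes the method online and trivially parallelizable.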
Theano: A Python framework for fast computation of mathematical expressions
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models, and recently introduced functionalities and improvements are discussed.
The Power of Scale for Parameter-Efficient Prompt Tuning
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
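The core idea of prompt tuning is to prepend a short sequence of trainable "soft" embedding vectors to the frozen model's input embeddings, so that only those few vectors are updated during training. A minimal sketch using toy Python lists in place of a real frozen language model (all names here are illustrative):

```python
PROMPT_LEN, DIM = 2, 4

# Frozen token embedding table: never updated during prompt tuning.
frozen_embeddings = {
    "translate": [0.1, 0.2, 0.3, 0.4],
    "hello":     [0.5, 0.1, 0.0, 0.2],
}

# The only trainable parameters: PROMPT_LEN "soft" vectors that have no
# corresponding discrete tokens in the vocabulary.
soft_prompt = [[0.0] * DIM for _ in range(PROMPT_LEN)]

def embed(tokens):
    """Look up frozen embeddings for the discrete input tokens."""
    return [frozen_embeddings[t] for t in tokens]

def prepend_prompt(prompt, token_embeds):
    """The frozen model consumes [prompt; input] as one sequence."""
    return prompt + token_embeds

seq = prepend_prompt(soft_prompt, embed(["translate", "hello"]))
# In training, gradients would flow only into soft_prompt; the model
# weights and the embedding table stay frozen.
```

The trainable parameter count is just PROMPT_LEN × DIM per task, which is why a single frozen model can serve many tasks by swapping prompts.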
mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks.
Statistically Significant Detection of Linguistic Change
This meta-analysis approach constructs property time series of word usage and then applies statistically sound change point detection algorithms to identify significant linguistic shifts.
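A change point statistic on a word-usage time series can be as simple as finding the split that maximizes the shift in segment means. A minimal sketch, assuming a toy usage series; the paper pairs a statistic like this with a proper significance test (e.g. bootstrap resampling), which is omitted here:

```python
def mean(xs):
    return sum(xs) / len(xs)

def change_point(series, min_seg=2):
    """Return (index, score) of the split maximizing the mean shift.

    The score is |mean(left) - mean(right)|; a statistically sound method
    would compare it against a null distribution before declaring a shift.
    """
    best_i, best_score = None, 0.0
    for i in range(min_seg, len(series) - min_seg + 1):
        score = abs(mean(series[:i]) - mean(series[i:]))
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score

# Hypothetical word-usage frequencies: a clear shift after index 4.
usage = [0.1, 0.11, 0.09, 0.1, 0.5, 0.52, 0.49, 0.51]
idx, score = change_point(usage)
```

On this toy series the detector splits at index 4, right where usage jumps from roughly 0.1 to roughly 0.5.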
Polyglot: Distributed Word Representations for Multilingual NLP
This work quantitatively demonstrates the utility of word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages, and investigates the semantic features captured through the proximity of word groupings.
Character-Level Language Modeling with Deeper Self-Attention
This paper shows that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8.
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
This paper shows that a standard Transformer architecture can be used with minimal modifications to process byte sequences, characterizes the trade-offs in terms of parameter count, training FLOPs, and inference speed, and shows that byte-level models are competitive with their token-level counterparts.
Efficient Natural Language Response Suggestion for Smart Reply
A computationally efficient machine-learned method for natural language response suggestion is presented, based on feed-forward neural networks with n-gram embedding features, which achieves the same quality at a small fraction of the computational requirements and latency.
Watch Your Step: Learning Node Embeddings via Graph Attention
This paper proposes a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective and improves state-of-the-art results on a comprehensive suite of real-world graph datasets including social, collaboration, and biological networks.
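The attention model here weights the powers of the graph's transition matrix, i.e. it computes an expected-context matrix of the form Σₖ qₖ·Tᵏ, where q is a softmax over learnable logits. A minimal sketch with fixed logits for illustration (in the paper the logits are learned by backpropagation against the embedding objective):

```python
import math

def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def expected_context(T, logits):
    """Attention-weighted sum of transition-matrix powers: sum_k q_k * T^k."""
    q = softmax(logits)
    n = len(T)
    power = [row[:] for row in T]                       # T^1
    out = [[q[0] * power[i][j] for j in range(n)] for i in range(n)]
    for qk in q[1:]:
        power = matmul(power, T)                        # next power of T
        for i in range(n):
            for j in range(n):
                out[i][j] += qk * power[i][j]
    return out

# Row-stochastic transition matrix of a 2-node graph (one edge).
T = [[0.0, 1.0], [1.0, 0.0]]
C = expected_context(T, logits=[0.0, 0.0])  # equal attention to T^1 and T^2
```

Learning the logits lets each graph choose how far the effective random-walk context should reach, instead of fixing a window size by hand.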