• Publications
Distributed Representations of Words and Phrases and their Compositionality
TLDR
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
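The negative-sampling idea summarized above can be sketched in plain Python: sample "noise" words from the unigram distribution raised to the 3/4 power, then train the true pair's score up and the sampled pairs' scores down. This is a toy illustration under our own naming, not the paper's implementation:

```python
import math
import random

def unigram_sampler(counts, power=0.75):
    """Build a sampler drawing negatives from the unigram
    distribution raised to the 3/4 power, as in the paper."""
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    def sample(exclude, k):
        negs = []
        while len(negs) < k:
            w = random.choices(words, weights=weights)[0]
            if w != exclude:  # never use the true context word as a negative
                negs.append(w)
        return negs
    return sample

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def negative_sampling_loss(v_center, v_context, v_negatives):
    """Loss for one (center, context) pair: push the true pair's
    dot product up and each sampled negative's dot product down."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = -math.log(sigmoid(dot(v_center, v_context)))
    for v_neg in v_negatives:
        loss += -math.log(sigmoid(-dot(v_center, v_neg)))
    return loss
```

Compared with the hierarchical softmax, this objective touches only k + 1 output vectors per training pair instead of a full tree path, which is what makes it cheap at scale.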
Efficient Estimation of Word Representations in Vector Space
TLDR
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed, and these vectors are shown to provide state-of-the-art performance on a test set measuring syntactic and semantic word similarities.
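The syntactic and semantic regularities these representations capture are usually probed with vector arithmetic: vector("king") - vector("man") + vector("woman") should land nearest vector("queen"). A minimal sketch with hand-picked toy vectors (illustrative only, not trained embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d vectors, chosen so "royalty" and "gender" occupy
# roughly separate directions.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

# king - man + woman, then nearest neighbour by cosine similarity
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
# best == "queen"
```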
Large Scale Distributed Deep Networks
TLDR
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
TLDR
A new Q&A architecture called QANet is proposed, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions.
Building high-level features using large scale unsupervised learning
TLDR
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Extracting Symptoms and their Status from Clinical Conversations
TLDR
Novel models tailored for a new application, extracting the symptoms mentioned in clinical conversations along with their status, are described: a new hierarchical span-attribute tagging (SA-T) model and a variant of the sequence-to-sequence model that decodes the symptoms and their status from a few speaker turns within a sliding window over the conversation.
Heterogeneous subgraph features for information networks
TLDR
Heterogeneous subgraph features reach the predictive power of manually engineered features that incorporate domain knowledge and are found to outperform state-of-the-art neural node embeddings in both tasks and across all data sets.
Semi-supervised Learning for Information Extraction from Dialogue
TLDR
This work presents a method for leveraging the unlabeled data to learn a better model than could be learned from the labeled data alone, and demonstrates an improvement on a clinical documentation task, particularly in the regime of small amounts of labeled data.
Generate the concept representation using OMOP ontology graph
TLDR
This study applied the node2vec algorithm to learn distributed representations of concepts based on the graph structure defined in the concept_relationship table of the OMOP CDM, generating a robust distributed representation for any concept defined in the OMOP CDM.
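The core of node2vec is a biased second-order random walk over the graph, whose visited sequences are then fed to a skip-gram model. A minimal sketch of the walk step (our own simplification for an unweighted graph stored as adjacency sets, not the study's code):

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0):
    """One biased second-order random walk as in node2vec.
    graph: dict mapping node -> set of neighbours.
    p: return parameter (low p favours revisiting the previous node);
    q: in-out parameter (low q favours moving further away)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            walk.append(random.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for n in nbrs:
            if n == prev:            # step back to where we came from
                weights.append(1.0 / p)
            elif n in graph[prev]:   # stays at distance 1 from prev
                weights.append(1.0)
            else:                    # moves further away from prev
                weights.append(1.0 / q)
        walk.append(random.choices(nbrs, weights=weights)[0])
    return walk
```

Setting p and q to 1 recovers a uniform random walk (DeepWalk); tuning them trades off breadth-first, structure-oriented walks against depth-first, community-oriented ones.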
A Novel Encoder-Decoder Knowledge Graph Completion Model for Robot Brain
TLDR
An encoder-decoder model is proposed that embeds the interaction between entities and relations and adds a gate mechanism to control attention; it achieves better link prediction performance than state-of-the-art embedding models on two benchmark datasets, WN18RR and FB15k-237.
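Embedding-based link prediction of the kind benchmarked here scores candidate triples (head, relation, tail) and ranks tails by plausibility. A minimal TransE-style scoring sketch, a much simpler baseline than the paper's gated encoder-decoder, with made-up toy embeddings:

```python
import math

def transe_score(h, r, t):
    """TransE-style plausibility score: negative L2 distance between
    head + relation and tail. Higher means a more plausible triple.
    (Illustrates embedding-based link prediction in general; this is
    NOT the paper's encoder-decoder model.)"""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 2-d embeddings where "capital_of" roughly translates city -> country.
paris   = [0.1, 0.2]
france  = [0.6, 0.9]
capital = [0.5, 0.7]
berlin  = [0.9, 0.1]

# The true triple should outrank the corrupted one.
assert transe_score(paris, capital, france) > transe_score(berlin, capital, france)
```

Benchmarks such as WN18RR and FB15k-237 evaluate exactly this kind of ranking, reporting metrics like mean reciprocal rank and hits@k over corrupted triples.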