• Publications
  • Influence
Translating Embeddings for Modeling Multi-relational Data
TransE is proposed, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities, which proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity
Learning with Local and Global Consistency
A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points.
Gene Selection for Cancer Classification using Support Vector Machines
This paper proposes a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE), and demonstrates experimentally that the genes selected yield better classification performance and are biologically relevant to cancer.
Fisher discriminant analysis with kernels
A non-linear classification technique based on Fisher's discriminant which allows the efficient computation of Fisher discriminant in feature space and large scale simulations demonstrate the competitiveness of this approach.
Reading Wikipedia to Answer Open-Domain Questions
This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.
A unified architecture for natural language processing: deep neural networks with multitask learning
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic
A Neural Attention Model for Abstractive Sentence Summarization
This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
Curriculum learning
It is hypothesized that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).
End-To-End Memory Networks
A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.