Previous machine comprehension (MC) datasets are either too small to train end-to-end deep learning models, or not difficult enough to evaluate the ability of current MC techniques. The newly released SQuAD dataset alleviates these limitations, and gives us a chance to develop more realistic MC models. Based on this dataset, we propose a Multi-Perspective…
This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined…
Transformation-based learning has been successfully employed to solve many natural language processing problems. It achieves state-of-the-art performance on many natural language processing tasks and does not overtrain easily. However, it does have a serious drawback: the training time is often intolerably long, especially on the large corpora which are…
Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences P and Q, our model first…
Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronominal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical…
In this paper we give an overview of the Tri-lingual Entity Discovery and Linking task at the Knowledge Base Population (KBP) track at TAC2015. This year we introduced a new end-to-end Tri-lingual entity discovery and linking task which requires a system to take raw texts from three languages (English, Chinese and Spanish) as input, automatically extract…
This paper presents a comprehensive empirical exploration and evaluation of a diverse range of data characteristics which influence word sense disambiguation performance. It focuses on a set of six core supervised algorithms, including three variants of Bayesian classifiers, a cosine model, non-hierarchical decision lists, and an extension of the…
In this paper we consider the problem of building a system to predict readability of natural-language documents. Our system is trained using diverse features based on syntax and language models which are generally indicative of readability. The experimental results on a dataset of documents from a mix of genres show that the predictions of the learned…
Information extraction is one of the fundamentally important tasks in Natural Language Processing, and as such it has been the subject of many evaluations and competitions. The latest such evaluation, the Knowledge Base Population (KBP) part of the Text Analysis Conference 2010, is focusing on two aspects: entity linking and slot filling. This paper…