Knowledge bases extracted automatically from the Web present new opportunities for data mining and exploration. Given a large, heterogeneous set of extracted relations, new tools are needed for searching the knowledge and uncovering relationships of interest. We present <i>WikiTables</i>, a Web application that enables users to interactively explore tabular …
Web tables form a valuable source of relational data. The Web contains an estimated 154 million HTML tables of relational data, with Wikipedia alone containing 1.6 million high-quality tables. Extracting the semantics of Web tables to produce machine-understandable knowledge has become an active area of research. A key step in extracting the semantics of …
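The first step in mining such tables is recovering their relational structure from raw HTML. A minimal sketch of that step, using only the Python standard library (the sample table and its contents are illustrative, not drawn from the datasets above):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect rows of cell text from <tr>/<td>/<th> tags."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.rows = []      # completed rows of cell strings
        self.current = []   # cells of the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.current = []
        elif tag in ("td", "th"):
            self.in_cell = True
            self.current.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self.current:
            self.rows.append(self.current)
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.current[-1] += data.strip()

html = """<table>
<tr><th>City</th><th>Country</th></tr>
<tr><td>Chicago</td><td>USA</td></tr>
</table>"""

p = TableExtractor()
p.feed(html)
header, *body = p.rows
# Treat the first row as the schema, yielding relational tuples.
records = [dict(zip(header, row)) for row in body]
print(records)  # [{'City': 'Chicago', 'Country': 'USA'}]
```

Real Web tables are far messier (nested tables, layout tables, spanning cells), which is precisely why semantics extraction is an active research area.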
This paper describes our submission for the ScienceIE shared task (SemEval-2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via neural language models, character-level encoding, gazetteers …
Wikipedia's link structure is a valuable resource for natural language processing tasks, but only a fraction of the concepts mentioned in each article are annotated with hyperlinks. In this paper, we study how to augment Wikipedia with additional high-precision links. We present 3W, a system that identifies concept mentions in Wikipedia text, and links each …
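A common baseline for this kind of high-precision linking is the "commonness" prior: link a surface form to the page it most often anchors, but only when that prior is confident. A toy sketch under assumed statistics (the anchor dictionary, counts, and threshold below are illustrative placeholders, not the 3W system itself):

```python
# Hypothetical anchor statistics: surface form -> {Wikipedia title: anchor count}.
ANCHOR_STATS = {
    "jaguar": {"Jaguar": 120, "Jaguar Cars": 45, "Jacksonville Jaguars": 30},
    "chicago": {"Chicago": 500, "Chicago (band)": 25},
}

def link_mention(mention, min_prob=0.5):
    """Link a mention to its most frequent target, keeping only
    high-precision links whose commonness exceeds min_prob."""
    stats = ANCHOR_STATS.get(mention.lower())
    if not stats:
        return None  # unseen surface form: abstain
    total = sum(stats.values())
    title, count = max(stats.items(), key=lambda kv: kv[1])
    return title if count / total >= min_prob else None

print(link_mention("Chicago"))      # 'Chicago' (500/525, well above threshold)
print(link_mention("jaguar", 0.9))  # None: too ambiguous at a strict threshold
```

Raising `min_prob` trades recall for the precision that augmenting Wikipedia's own links demands; a full system would add context features and supervised disambiguation on top of this prior.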
Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general …
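The setup described above can be sketched in miniature: pre-trained vectors serve as fixed word-level inputs, and only the network above them sees the small labeled set. The vocabulary and two-dimensional vectors here are illustrative placeholders (in practice they would be loaded from word2vec or GloVe files):

```python
# Toy "pre-trained" embedding table; real tables are learned
# from large unlabeled corpora and have hundreds of dimensions.
PRETRAINED = {
    "the": [0.1, 0.3],
    "cat": [0.7, 0.2],
    "sat": [0.4, 0.9],
}
UNK = [0.0, 0.0]  # fallback vector for out-of-vocabulary words

def embed(tokens):
    """Map word-level tokens to their fixed pre-trained vectors.
    A recurrent network, trained on the small labeled dataset, would
    then consume this sequence to build context-sensitive representations."""
    return [PRETRAINED.get(t.lower(), UNK) for t in tokens]

print(embed(["The", "cat", "purred"]))
# [[0.1, 0.3], [0.7, 0.2], [0.0, 0.0]]
```

The mismatch this illustrates, rich unsupervised input vectors but a data-starved contextual layer, is what motivates transferring pre-training beyond the word level.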
In this paper, we report on our participation in the English Entity Linking task at TAC 2013. We present the WebSAIL Wikifier system, an entity disambiguation system that links textual mentions to their referent entities in Wikipedia. The system uses a supervised machine learning approach and a string-matching clustering method, and scores 58.1% B³+ F1 on …
Latent variable topic models such as Latent Dirichlet Allocation (LDA) can discover topics from text in an unsupervised fashion. However, scaling the models up to the many distinct topics exhibited in modern corpora is challenging. "Flat" topic models like LDA have difficulty modeling sparsely expressed topics, and richer hierarchical models become …
Web Information Extraction (WIE) systems extract billions of unique facts, but integrating the assertions into a coherent knowledge base and evaluating across different WIE techniques remains a challenge. We propose a framework that utilizes natural language to integrate and evaluate extracted knowledge bases (KBs). In the framework, KBs are integrated by …