Learn More
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews hand-labeled training examples, and avoids domain-specific(More)
This paper presents G-FLOW, a novel system for coherent extractive multi-document sum-marization (MDS). 1 Where previous work on MDS considered sentence selection and ordering separately, G-FLOW introduces a joint model for selection and ordering that balances coherence and salience. G-FLOW's core representation is a graph that approximates the discourse(More)
Open Information Extraction is a recent paradigm for machine reading from arbitrary text. In contrast to existing techniques, which have used only shallow syntactic features, we investigate the use of semantic features (semantic roles) for the task of Open IE. We compare TEXTRUNNER (Banko et al., 2007), a state of the art open extractor, with our novel(More)
Machine reading is a long-standing goal of AI and NLP. In recent years, tremendous progress has been made in developing machine learning approaches for many of its subtasks such as parsing, information extraction, and question answering. However, existing end-to-end solutions typically require substantial amount of human efforts (e.g., labeled data and/or(More)
Open Information Extraction extracts relations from text without requiring a pre-specified domain or vocabulary. While existing techniques have used only shallow syntactic features, we investigate the use of semantic role labeling techniques for the task of Open IE. Semantic role labeling (SRL) and Open IE, although developed mostly in isolation, are quite(More)
Multi-document summarization (MDS) systems have been designed for short, un-structured summaries of 10-15 documents, and are inadequate for larger document collections. We propose a new approach to scaling up summarization called hierarchical summarization, and present the first implemented system, SUMMA. SUMMA produces a hierarchy of relatively short(More)
We query Web Image search engines with words (e.g., spring) but need images that correspond to particular senses of the word (e.g., flexible coil). Querying with poly-semous words often yields unsatisfactory results from engines such as Google Images. We build an image search engine, IDIOM, which improves the quality of returned images by focusing search on(More)
Whether automatically extracted or human generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into(More)
ii Introduction It has been a long term vision of Artificial Intelligence to develop Learning by Reading systems that can capture knowledge from naturally occurring texts, convert it into a deep logical notation and perform some inferences/reasoning on them. Such systems directly build on relatively mature areas of research, including Information Extraction(More)