Janara Christensen

Learn More
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews handlabeled training examples, and avoids domainspecific(More)
This paper presents G-FLOW, a novel system for coherent extractive multi-document summarization (MDS).1 Where previous work on MDS considered sentence selection and ordering separately, G-FLOW introduces a joint model for selection and ordering that balances coherence and salience. G-FLOW’s core representation is a graph that approximates the discourse(More)
Open Information Extraction is a recent paradigm for machine reading from arbitrary text. In contrast to existing techniques, which have used only shallow syntactic features, we investigate the use of semantic features (semantic roles) for the task of Open IE. We compare TEXTRUNNER (Banko et al., 2007), a state of the art open extractor, with our novel(More)
Machine reading is a long-standing goal of AI and NLP. In recent years, tremendous progress has been made in developing machine learning approaches for many of its subtasks such as parsing, information extraction, and question answering. However, existing end-to-end solutions typically require substantial amount of human efforts (e.g., labeled data and/or(More)
Given a classification task, what is the best way to teach the resulting boundary to a human? While machine learning techniques can provide excellent methods for finding the boundary, including the selection of examples in an online setting, they tell us little about how we would teach a human the same task. We propose to investigate the problem of example(More)
Multi-document summarization (MDS) systems have been designed for short, unstructured summaries of 10-15 documents, and are inadequate for larger document collections. We propose a new approach to scaling up summarization called hierarchical summarization, and present the first implemented system, SUMMA. SUMMA produces a hierarchy of relatively short(More)
Open Information Extraction extracts relations from text without requiring a pre-specified domain or vocabulary. While existing techniques have used only shallow syntactic features, we investigate the use of semantic role labeling techniques for the task of Open IE. Semantic role labeling (SRL) and Open IE, although developed mostly in isolation, are quite(More)
We query Web Image search engines with words (e.g., spring) but need images that correspond to particular senses of the word (e.g., flexible coil). Querying with polysemous words often yields unsatisfactory results from engines such as Google Images. We build an image search engine, IDIOM, which improves the quality of returned images by focusing search on(More)
Whether automatically extracted or human generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into(More)