Learn More
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how(More)
Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can be best solved by a bootstrapping method - creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data(More)
Not only is Wikipedia a comprehensive source of quality information, it has several kinds of internal structure (e.g., relational summaries known as <i>infoboxes</i>), which enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in(More)
Although existing work has explored both information extraction and community content creation, most research has focused on them in isolation. In contrast, we see the greatest leverage in the synergistic pairing of these methods as two interlocking feedback cycles. This paper explores the potential synergy promised if these cycles can be made to accelerate(More)
There have been substantial changes in computing practices in the cyberspace, mainly as a result of the proliferation of low priced under-utilized powerfully heterogeneous computers are connected by high-speed links. In this paper we reminisce the vicissitude of computing platform and introduce our Portuguese-Chinese corpus-based machine translation (CBMT)(More)
The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed initiative interface designed to encourage communal content creation (CCC). Since IE and CCC are each powerful ways to produce large amounts of structured information, they have been studied extensively — but(More)
We consider the problem of finding related tables in a large corpus of heterogenous tables. Detecting related tables provides users a powerful tool for enhancing their tables with additional data and enables effective reuse of available public data. Our first contribution is a framework that captures several types of relatedness, including tables that are(More)