Stefan Schoenmackers

Learn More
Even the entire Web corpus does not explicitly answer all questions, yet inference can uncover many implicit answers. But where do inference rules come from? This paper investigates the problem of learning inference rules from Web text in an unsupervised, domain-independent manner. The SHERLOCK system, described herein, is a first-order learner that(More)
Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed on-line systems automatically group these similar intervals of execution into phases, where the intervals in a phase have homogeneous behavior and similar resource requirements. These systems are driven by algorithms that dynamically classify intervals of(More)
In an embedded system, the cost of storing a program on-chip can be as high as the cost of a microprocessor. Compressing an application's code to reduce the amount of memory required is an attractive way to decrease costs. In this paper, we examine an executable form of program compression using <i>echo</i> instructions.With echo instructions, two or more(More)
NLP systems for tasks such as question answering and information extraction typically rely on statistical parsers. But the efficacy of such parsers can be surprisingly low, particularly for sentences drawn from heterogeneous corpora such as the Web. We have observed that incorrect parses often result in wildly implausible semantic interpretations of(More)
Machine reading is a long-standing goal of AI and NLP. In recent years, tremendous progress has been made in developing machine learning approaches for many of its subtasks such as parsing, information extraction, and question answering. However, existing end-to-end solutions typically require substantial amount of human efforts (e.g., labeled data and/or(More)
Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utilizing unsupervised language models. The REALM system, which combines HMMbased and n-gram-based language models, ranks candidate extractions by the likelihood that they are correct.(More)
Inference Over the Web Stefan Schoenmackers Co-Chairs of the Supervisory Committee: Professor Oren Etzioni Computer Science and Engineering Professor Daniel S. Weld Computer Science and Engineering The World Wide Web contains vast amounts of text written about nearly any topic imaginable. Recent work in Information Extraction has sought to recover the(More)