Learn More
IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy Challenge helped us address requirements that led to the(More)
Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of “negative“ examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore the heuristic for generating negative(More)
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the(More)
We present an extension of the well-known information bottleneck framework, called conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score. This general approach can be utilized in a data mining context to extract relevant information that is at the same time novel(More)
and ranking of answers in DeepQA D. C. Gondek A. Lally A. Kalyanpur J. W. Murdock P. A. Duboue L. Zhang Y. Pan Z. M. Qiu C. Welty The final stage in the IBM DeepQA pipeline involves ranking all candidate answers according to their evidence scores and judging the likelihood that each candidate answer is correct. In DeepQA, this is done using a machine(More)
Article history: Received 28 May 2012 Accepted 23 June 2012 Available online 7 August 2012 This paper presents a vision for applying the Watson technology to health care and describes the steps needed to adapt and improve performance in a new domain. Specifically, it elaborates upon a vision for an evidence-based clinical decision support system, based on(More)
extraction from documents J. Fan A. Kalyanpur D. C. Gondek D. A. Ferrucci Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watsoni. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from(More)
This paper addresses the question what is the outcome of multi-agent learning via no-regret algorithms in repeated games? Speci cally, can the outcome of no-regret learning be characterized by traditional game-theoretic solution concepts, such as Nash equilibrium? The conclusion of this study is that no-regret learning is reminiscent of ctitious play: play(More)
This paper aims to solve the problem of improving the ranking of answer candidates for factoid based questions in a state-of-the-art Question Answering system. We first provide an extensive comparison of 5 ranking algorithms on two datasets -- from the Jeopardy quiz show and a medical domain. We then show the effectiveness of a cascading approach, where the(More)
This paper describes a novel approach to the semantic relation detection problem. Instead of relying only on the training instances for a new relation, we leverage the knowledge learned from previously trained relation detectors. Specifically, we detect a new semantic relation by projecting the new relation’s training instances onto a lower dimension topic(More)