The Fall 2010 issue of AI Magazine includes an article on "Building Watson: An Overview of the DeepQA Project," written by the IBM Watson Research Team, led by David Ferrucci. Read about this exciting project in the most detailed technical article available. We hope you will also take a moment to read through the archives of AI Magazine (../issues.php) and …
An invaluable portion of scientific data occurs naturally in text form. Given a large unlabeled document collection, it is often helpful to organize this collection into clusters of related documents. By using a vector space model, text data can be treated as high-dimensional but sparse numerical data vectors. It is a contemporary challenge to efficiently …
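The vector space model mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term counts, and the three sample documents and the helper names `tf_vector` and `cosine` are hypothetical, not taken from the paper. Each document becomes a sparse vector that stores only the terms it contains, and cosine similarity indicates which documents would likely fall into the same cluster.

```python
import math
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector: only terms that occur are stored."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    # Iterate over the smaller vector; absent terms contribute zero.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy collection (illustration only).
docs = [
    "gene expression in yeast cells",
    "yeast gene regulation and expression",
    "sparse matrix factorization methods",
]
vectors = [tf_vector(d) for d in docs]

# The full space has one dimension per distinct term,
# yet each document stores only its own few nonzero entries.
vocabulary = set().union(*vectors)

related = cosine(vectors[0], vectors[1])    # documents share three terms
unrelated = cosine(vectors[0], vectors[2])  # documents share no terms
```

Even in this tiny collection the vocabulary is larger than any single document's vector, which is the sparsity the abstract refers to. A real system would typically add term weighting and normalization before clustering.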
A traditional goal of Artificial Intelligence research has been a system that can read unrestricted natural language texts on a given topic, build a model of that topic and reason over the model. Natural Language Processing advances in syntax and semantics have made it possible to extract a limited form of meaning from sentences. Knowledge Representation …
A source expansion algorithm automatically extends a given text corpus with related content from large external sources such as the Web. The expanded corpus is not intended for human consumption but can be used in question answering (QA) and other information retrieval or extraction tasks to find more relevant information and supporting evidence. We propose …
Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type-independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where …
The first stage of processing in the IBM Watson system is to perform a detailed analysis of the question in order to determine what it is asking for and how best to approach answering it. Question analysis uses Watson's parsing and semantic analysis capabilities: a deep Slot Grammar parser, a named entity recognizer, a co-reference resolution component, …
Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is …
A key phase in the DeepQA architecture is Hypothesis Generation, in which candidate system responses are generated for downstream scoring and ranking. In the IBM Watson system, these hypotheses are potential answers to Jeopardy! questions and are generated by two components: search and candidate generation. The search component retrieves content relevant …
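The two-component structure described above (search, then candidate generation) can be sketched as a toy pipeline. This is not Watson's implementation: the corpus, the word-overlap ranking, and the choice to use document titles as candidates are all assumptions made for illustration, and the names `CORPUS`, `search`, and `generate_candidates` are hypothetical.

```python
import re

# Hypothetical toy corpus mapping a document title to its text.
CORPUS = {
    "Marie Curie": "Marie Curie won Nobel Prizes in physics and chemistry.",
    "Isaac Newton": "Isaac Newton formulated the laws of motion and gravitation.",
    "Albert Einstein": "Albert Einstein developed the theory of relativity.",
}


def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(question, corpus, top_k=2):
    """Stage 1: rank documents by word overlap with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(body)), title) for title, body in corpus.items()]
    scored = [(score, title) for score, title in scored if score > 0]
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


def generate_candidates(hits):
    """Stage 2: turn search hits into candidate answers.

    Assumption for this sketch: each hit's title is one candidate.
    """
    return list(hits)


question = "This scientist developed the theory of relativity"
candidates = generate_candidates(search(question, CORPUS))
```

The candidates are deliberately left unscored beyond retrieval order, since the abstract assigns scoring and ranking to downstream stages. Note that the second candidate is retrieved only through common words such as "the" and "of", which shows why this stage favors recall and relies on later scoring to filter.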
As part of the ongoing Project Halo, our goal is to build a system capable of answering questions posed by novice users to a formal knowledge base. In our current context, the knowledge base covers selected topics in physics, chemistry, and biology, and our question set consists of AP (advanced high-school) level examination questions. The task is …
David Ferrucci¹, Eric Nyberg², James Allan³, Ken Barker⁴, Eric Brown¹, Jennifer Chu-Carroll¹, Arthur Ciccolo¹, Pablo Duboue¹, James Fan¹, David Gondek¹, Eduard Hovy⁵, Boris Katz⁶, Adam Lally¹, Michael McCord¹, Paul Morarescu¹, Bill Murdock¹, Bruce Porter⁴, John Prager¹, Tomek Strzalkowski⁷, Chris Welty¹, Wlodek Zadrozny¹
¹IBM Research Division Thomas J. …