This paper presents the transcription and evaluation of the DIMEx100 corpus for Mexican Spanish. First we describe the corpus and explain the linguistic and computational motivation for its design and collection process; then we present the phonetic antecedents and the alphabet adopted for the transcription task; the corpus has been transcribed ...
Passage Retrieval (PR) is typically used as the first step in current Question Answering (QA) systems. Most methods are based on the vector space model, which finds relevant passages for general user needs but fails to select pertinent passages for specific user questions. This paper describes a simple PR method specially suited for the QA ...
This paper describes the system developed by the Language Technologies Lab at INAOE for the Spanish Question Answering task at CLEF 2006. The system is built around a fully data-driven architecture that uses machine learning and text mining techniques to identify the most probable answers to factoid and definition questions, respectively. Its major ...
Progress on lexical ambiguity resolution seems to be stuck because of the knowledge acquisition bottleneck. It is therefore worthwhile to investigate the possibility of using the Web as a lexical resource. This paper explores two attempts at using Web counts collected through a search engine. The first approach calculates the hits of each ...
Finding accurate information on the Web has become a challenge due to the increase in the number of documents available online. Current search engines retrieve documents relevant to general, often short, user queries, but fail to extract answers to simple factual questions posed in natural language. This work presents the basis of a statistical question ...
Recent work on question answering is based on complex natural language processing techniques: named entity extractors, parsers, chunkers, etc. While these approaches have proven to be effective, they have the disadvantage of being targeted at a particular language. In this paper we present a fully data-driven method that uses simple lexical pattern matching ...
One major problem of state-of-the-art Cross-Language Question Answering systems is the translation of user questions. This paper proposes combining the potential of multiple machine translation systems in order to improve the final answering precision. In particular, it presents three different methods for this purpose. The first one focuses on selecting the most ...
Acquiring valuable information from the large amounts available today in electronic media requires automated mechanisms that are more natural and efficient than the existing ones. Information retrieval systems are evolving toward systems capable of answering specific questions formulated by the user in her/his language. The ...