This paper presents a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting appropriate passages for text summarization. Preliminary …
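MMR is usually described as a greedy trade-off between similarity to the query and similarity to what has already been selected. The sketch below is a minimal illustration of that re-ranking idea, assuming precomputed similarity lookups (query_sim, doc_sim) and a trade-off weight lam; these names are placeholders for illustration, not the paper's notation.

    # Illustrative MMR re-ranking sketch (not the paper's implementation).
    # query_sim: dict doc -> similarity to the query
    # doc_sim:   dict (doc_a, doc_b) -> similarity between two documents (assumed defined for all pairs)
    def mmr_rerank(candidates, query_sim, doc_sim, lam=0.7, k=10):
        """Greedily pick k documents, balancing relevance against redundancy."""
        selected = []
        remaining = list(candidates)
        while remaining and len(selected) < k:
            def score(d):
                relevance = query_sim[d]
                redundancy = max((doc_sim[(d, s)] for s in selected), default=0.0)
                return lam * relevance - (1.0 - lam) * redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

Setting lam close to 1 reduces to pure relevance ranking, while smaller values trade relevance for novelty among the selected items.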
Abstract Meaning Representation (AMR) is a semantic formalism for which a growing set of annotated examples is available. We introduce the first approach to parse sentences into this representation, providing a strong baseline for future improvement. The method is based on a novel algorithm for finding a maximum spanning, connected subgraph, embedded within a …
Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model's parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other application that processes natural language …
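As a minimal illustration of estimating a language model's parameters from text, the sketch below fits a bigram model with add-one smoothing from raw token counts. It is purely illustrative and not the specific model discussed in the paper.

    # Toy statistical language model: bigram probabilities estimated from text counts.
    from collections import Counter

    def train_bigram(tokens):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        vocab = len(unigrams)
        def prob(w, prev):
            # P(w | prev) with add-one (Laplace) smoothing
            return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab)
        return prob

    prob = train_bigram("the cat sat on the mat".split())
    print(prob("cat", "the"))   # observed continuation: higher probability
    print(prob("dog", "the"))   # unseen continuation: smoothed, lower probability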
Real-world relational data are seldom stationary, yet traditional collaborative filtering algorithms generally rely on this assumption. Motivated by our sales prediction problem, we propose a factor-based algorithm that is able to take time into account. By introducing additional factors for time, we formalize this problem as a tensor factorization with a …
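To make the "additional factors for time" idea concrete, one common formulation approximates each observation r[u, i, t] by sum_k U[u,k] * V[i,k] * T[t,k], a CP-style tensor factorization. The sketch below fits such factors with stochastic gradient descent; the function name, hyperparameters, and training loop are illustrative assumptions, not the paper's algorithm.

    # Hedged sketch: three-way factorization with user, item, and time factors.
    import numpy as np

    def fit_tensor(observations, n_users, n_items, n_times, k=8, lr=0.01, reg=0.05, epochs=50):
        rng = np.random.default_rng(0)
        U = rng.normal(scale=0.1, size=(n_users, k))
        V = rng.normal(scale=0.1, size=(n_items, k))
        T = rng.normal(scale=0.1, size=(n_times, k))
        for _ in range(epochs):
            for u, i, t, r in observations:          # (user, item, time, observed value)
                pred = np.sum(U[u] * V[i] * T[t])    # r_hat[u, i, t]
                err = r - pred
                U[u] += lr * (err * V[i] * T[t] - reg * U[u])
                V[i] += lr * (err * U[u] * T[t] - reg * V[i])
                T[t] += lr * (err * U[u] * V[i] - reg * T[t])
        return U, V, T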
Many scalable data mining tasks rely on active learning to provide the most useful accurately labeled instances. However, what if there are multiple labeling sources ('oracles' or 'experts') with different but unknown reliabilities? With the recent advent of inexpensive and scalable online annotation tools, such as Amazon's Mechanical Turk, the labeling …
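One simple baseline for the multiple-noisy-labeler setting described here is to alternate between forming a weighted consensus label and re-estimating each labeler's reliability from its agreement with that consensus, as sketched below. This is an illustrative scheme, not the algorithm the paper proposes.

    # Sketch: iteratively re-weight labelers of unknown reliability.
    from collections import defaultdict

    def weighted_consensus(labels, n_rounds=3):
        """labels: dict mapping item -> {labeler: binary label in {0, 1}}."""
        reliability = defaultdict(lambda: 0.8)   # initial guess: better than chance
        consensus = {}
        for _ in range(n_rounds):
            # Weighted vote per item (equal weights in round one => plain majority vote)
            for item, votes in labels.items():
                score = sum((2 * reliability[w] - 1) * (2 * y - 1) for w, y in votes.items())
                consensus[item] = 1 if score >= 0 else 0
            # Reliability = agreement rate with the current consensus
            agree, total = defaultdict(int), defaultdict(int)
            for item, votes in labels.items():
                for w, y in votes.items():
                    total[w] += 1
                    agree[w] += int(y == consensus[item])
            for w in total:
                reliability[w] = agree[w] / total[w]
        return consensus, dict(reliability)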
Artificial intelligence has progressed to the point where multiple cognitive capabilities are being integrated into computational architectures, such as SOAR, PRODIGY, THEO, and ICARUS. This paper reports on the PRODIGY architecture, describing its planning and problem solving capabilities and touching upon its multiple learning methods. Learning in PRODIGY …
Blog feed search poses different and interesting challenges from traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing …
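To make the federated-search framing concrete, a toy version treats each blog as a small collection and scores it by aggregating the retrieval scores of its best posts. The post_score function and top-m aggregation below are assumptions for illustration, not the paper's model.

    # Sketch: score a blog (a collection of posts) by its top-m post scores.
    def blog_score(posts, query, post_score, top_m=5):
        scores = sorted((post_score(p, query) for p in posts), reverse=True)
        return sum(scores[:top_m]) / max(1, min(top_m, len(scores)))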
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U.S. Government. The third and fourth authors were supported by fellowships from the Ministerio de Educación y Ciencia of …
Translingual information retrieval (TIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TIR methods and reports on comparative TIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: …
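One standard family of TIR methods is dictionary-based query translation followed by ordinary monolingual retrieval in the target language. The sketch below illustrates that pipeline with a placeholder bilingual lexicon and a naive overlap score; neither is taken from the paper.

    # Sketch: dictionary-based query translation, then monolingual scoring.
    def translate_query(query_terms, bilingual_dict):
        """Expand each source-language term into its target-language candidates."""
        translated = []
        for term in query_terms:
            translated.extend(bilingual_dict.get(term, [term]))  # keep untranslatable terms as-is
        return translated

    def score(doc_terms, translated_query):
        """Naive overlap score between a target-language document and the translated query."""
        return sum(doc_terms.count(t) for t in set(translated_query))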