Carlos de Juan Carbonell

Learn More
This paper presents a method for combining query-relevance with information-novelty in the context of text retrieval and summarization. The Maximal Marginal Relevance (MMR) criterion strives to reduce redundancy while maintaining query relevance in re-ranking retrieved documents and in selecting apprw priate passages for text summarization. Preliminary(More)
Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinct stories; (2) identifying those news stories that are the(More)
Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model’s parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other application that processes natural language(More)
Real-world relational data are seldom stationary, yet traditional collaborative filtering algorithms generally rely on this assumption. Motivated by our sales prediction problem, we propose a factor-based algorithm that is able to take time into account. By introducing additional factors for time, we formalize this problem as a tensor factorization with a(More)
One of the key steps in creating quality interactive drama is the ability to create quality interactive characters (or believable agents). Two important aspects of such characters will be that they appear emotional and that they can engage in social interactions. My basic approach to these problems has been to use a broad agent architecture and minimal(More)
Abstract Meaning Representation (AMR) is a semantic formalism for which a growing set of annotated examples is available. We introduce the first approach to parse sentences into this representation, providing a strong baseline for future improvement. The method is based on a novel algorithm for finding a maximum spanning, connected subgraph, embedded within(More)
This paper investigates the use and extension of text retrieval and clustering techniques for event detection. The task is to automatically detect novel events from a temporally-ordered stream of news stories, either retrospectively or as the stories arrive. We applied hierarchical and non-hierarchical document clustering algorithms to a corpus of 15,836(More)
Human-quality text summarization systems are di cult to design, and even more di cult to evaluate, in part because documents can di er along several dimensions, such as length, writing style and lexical usage. Nevertheless, certain cues can often help suggest the selection of sentences for inclusion in a summary. This paper presents our analysis of(More)
Planning is a complex reasoning task that is well suited for the study of improving performance and knowledge by learning, i.e. by accumulation and interpretation of planning experience. PRODIGY is an architecture that integrates planning with multiple learning mechanisms. Learning occurs at the planner’s decision points and integration in PRODIGY is(More)
Artificial intelligence has progressed to the point where multiple cognitive capabilities are being integrated into computational architectures, such as SOAR, PRODIGY, THEO, and ICARUS. This paper reports on the PRODIGY architecture, describing its planning and problem solving capabilities and touching upon its multiple learning methods. Learning in PRODIGY(More)