Learn More
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA.(More)
This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. The coreference performance on the 2002 and 2003 Automatic Content(More)
Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE)(More)
Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronom-inal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical(More)
We present a fast algorithm for non-linear dimension reduction. The algorithm builds a local linear model of the data by merging PCA with clustering based on a new distortion measure. Experiments with speech and image data indicate that the local linear algorithm produces encodings with lower distortion than those built by five layer auto-associative(More)
Companies often receive thousands of resumes for each job posting and employ dedicated screeners to short list qualified applicants. In this paper, we present PROSPECT, a decision support tool to help these screeners shortlist resumes efficiently. Prospect mines resumes to extract salient aspects of candidate profiles like skills, experience in each skill,(More)
With the emergence of e-commerce systems, successful information access on e-commerce websites becomes essential. Menu-driven navigation and keyword search currently provided by most commercial sites have considerable limitations, as they tend to overwhelm and frustrate users with lengthy, rigid and not very effective interactions. To provide an efficient(More)
Syntax based reordering has been shown to be an effective way of handling word order differences between source and target languages in Statistical Machine Translation (SMT) systems. We present a simple, automatic method to learn rules that reorder source sentences to more closely match the target language word order using only a source side parse tree and(More)
T he pragmatic goal of natural language (NL) and multimodal interfaces (speech recognition, keyboard entry, pointing, among others) is to enable ease-of-use for users/customers in performing more sophisticated human-computer interactions (HCI). NL research attempts to define extensive discourse models that in turn provide improved models of context-enabling(More)
As natural language understanding research advances towards deeper knowledge modeling, the tasks become more and more complex: we are interested in more nu-anced word characteristics, more linguistic properties, deeper semantic and syntactic features. One such example, explored in this article, is the mention detection and recognition task in the Automatic(More)