Web-scale data has been used in a diverse range of language research. Most of this research has used web counts for only short, fixed spans of context. We present a unified view of using web counts for lexical disambiguation. Unlike previous approaches, our supervised and unsupervised systems combine information from multiple and overlapping segments of …
We present a discriminative method for learning selectional preferences from unlabeled text. Positive examples are taken from observed predicate-argument pairs, while negatives are constructed from unobserved combinations. We train a Support Vector Machine classifier to distinguish the positive from the negative instances. We show how to partition the …
We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead non-referential. We extract the surrounding textual context of the pronoun and gather, from a large corpus, the distribution of words that occur within that context. We learn to reliably classify these distributions as representing …
There has been much recent research on identifying global community structure in networks. However, most existing approaches require complete information about the graph in question, which is impractical for some networks, e.g. the World Wide Web (WWW). Algorithms for local community detection have been proposed, but their results usually contain many outliers. …
BACKGROUND: Gene expression microarray is a powerful technology for the genetic profiling of diseases and their associated treatments. Such a process involves a key step, the identification of biomarkers, which are expected to be closely related to the disease. An important use of these identified genes is to construct a classifier which can …
Traditional relation extraction seeks to identify pre-specified semantic relations within natural language text, while Open Information Extraction (Open IE) takes a more general approach and looks for a variety of relations without restriction to a fixed relation set. With this generalization comes the question: what is a relation? For example, should the …
We present a framework for visualizing remote distributed data sources using a multiuser immersive virtual reality environment. DIVE-ON is a system prototype that consolidates distributed data sources into a multidimensional data model, transports user-specified views to a 3D immersive display, and presents various data attributes and mining results as …
Given two genomic maps G1 and G2, each represented as a sequence of n gene markers, the maximal strip recovery (MSR) problem is to retain the maximum number of markers in both G1 and G2 such that the resultant subsequences, denoted as G1* and G2*, can be partitioned into the same set of maximal strips, which are common substrings of length greater …