We present an objective function for learning with unlabeled data that uses auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternative interpretation of the posterior regularization framework (Graca et al., 2008), maintains …
We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but …
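To make the idea of "parameter updates between arbitrary MCMC states" concrete, here is a minimal sketch of a perceptron-style SampleRank update, assuming a toy binary labeling task. The feature map `toy_features`, the Hamming-distance objective, the single-site flip proposal, and the function names are all hypothetical illustrations, not the paper's implementation. For each proposed move, if the objective prefers one state but the model does not score it higher, the weights move toward the preferred state's features. The chain itself is then advanced with a Metropolis-Hastings step under the current model.

```python
import math
import random
from typing import Callable, List, Sequence


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def samplerank_update(w, phi_cur, phi_new, obj_cur, obj_new, lr=1.0) -> List[float]:
    """One SampleRank-style update for a pair of neighbouring MCMC states."""
    if obj_new == obj_cur:
        return list(w)  # the objective has no preference: no update
    better, worse = (phi_new, phi_cur) if obj_new > obj_cur else (phi_cur, phi_new)
    if dot(w, better) > dot(w, worse):
        return list(w)  # the model already ranks the pair as the objective does
    # Move the weights toward the features of the state the objective prefers.
    return [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]


def toy_features(x: Sequence[float], y: Sequence[int]) -> List[float]:
    # Hypothetical features: evidence agreement and label smoothness.
    f0 = sum(xi if yi == 1 else -xi for xi, yi in zip(x, y))
    f1 = sum(1.0 for a, b in zip(y, y[1:]) if a == b)
    return [f0, f1]


def train(x, truth, features: Callable, steps=2000, lr=0.1, seed=0) -> List[float]:
    rng = random.Random(seed)
    y = [0] * len(x)
    w = [0.0] * len(features(x, y))

    def objective(labels):
        # Higher is better: negative Hamming distance to the true labels.
        return -sum(a != b for a, b in zip(labels, truth))

    for _ in range(steps):
        i = rng.randrange(len(y))
        y_new = list(y)
        y_new[i] = 1 - y_new[i]  # single-site flip proposal (symmetric)
        phi_cur, phi_new = features(x, y), features(x, y_new)
        w = samplerank_update(w, phi_cur, phi_new, objective(y), objective(y_new), lr)
        # Metropolis-Hastings acceptance under the current model scores.
        delta = dot(w, phi_new) - dot(w, phi_cur)
        if delta >= 0 or rng.random() < math.exp(delta):
            y = y_new
    return w
```

For example, `train([1.0, 0.8, -0.9, -1.2, 0.7], [1, 1, 0, 0, 1], toy_features)` returns a two-element weight vector. Because the update needs only the two states in a proposal, it never requires the chain to reach equilibrium or the model to be normalized.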
In this paper, we study a hybrid human-machine approach for solving the problem of Entity Resolution (ER). The goal of ER is to identify all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Our input is a graph over all the records in a database, where each edge has a probability denoting our prior …
In entity matching, a fundamental issue when training a classifier to label pairs of entities as duplicates or non-duplicates is selecting informative training examples. Although active learning is an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is …
The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches …
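A central computation in an edit-sequence model is summing over every possible edit alignment between two strings rather than committing to a single best one. The following is a simplified sketch under stated assumptions, not the paper's model. It gives each class ("match" and "mismatch") its own hypothetical per-operation weights (`copy`, `subst`, `insert`, `delete`), computes the log-sum over all alignment paths with a forward dynamic program, and normalizes the two class scores into a match probability. The real model uses richer features and learned parameters; the weight values and function names here are illustrative only.

```python
import math
from typing import Dict, List


def logsumexp(values: List[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_alignment_sum(x: str, y: str, w: Dict[str, float]) -> float:
    """Log of the sum, over all edit alignments of x and y, of exp(total op weight)."""
    n, m = len(x), len(y)
    alpha = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 0.0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0:  # delete x[i-1]
                terms.append(alpha[i - 1][j] + w["delete"])
            if j > 0:  # insert y[j-1]
                terms.append(alpha[i][j - 1] + w["insert"])
            if i > 0 and j > 0:  # copy if characters agree, else substitute
                op = "copy" if x[i - 1] == y[j - 1] else "subst"
                terms.append(alpha[i - 1][j - 1] + w[op])
            alpha[i][j] = logsumexp(terms)
    return alpha[n][m]


def match_probability(x: str, y: str, w_match: Dict[str, float],
                      w_mismatch: Dict[str, float]) -> float:
    # Normalize the two class scores; each marginalizes over all alignments.
    d = log_alignment_sum(x, y, w_match) - log_alignment_sum(x, y, w_mismatch)
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)
```

With `w_match = {"copy": 2.0, "subst": -2.0, "insert": -1.0, "delete": -1.0}` and `w_mismatch = {"copy": 0.0, "subst": 0.0, "insert": -1.0, "delete": -1.0}`, identical strings receive a high match probability and unrelated strings a low one. Summing over alignments, instead of taking the single cheapest one as classic edit distance does, is what allows the weights to be trained discriminatively with the alignment treated as a latent variable.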
Search, exploration, and social experiences on the Web have recently undergone tremendous changes, with search engines, web portals, and social networks offering a different perspective on information discovery and consumption. This new perspective aims to capture user intents and to provide richer, highly connected experiences. The new battleground …
This paper presents a WordNet-based approach to text summarization. The document to be summarized is used to extract a "relevant" subgraph from the WordNet graph. Weights are assigned to each node of this subgraph using a strategy similar to Google's PageRank algorithm. These weights capture the relevance of the respective synsets with respect to …
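The node-weighting step resembles standard PageRank, which can be sketched with power iteration over an adjacency list. This is a generic PageRank sketch, not the paper's exact weighting strategy. The synset names in the example are hypothetical placeholders, and how the paper builds the subgraph or scores sentences from the weights is not shown here. Dangling nodes (no outgoing edges) spread their mass uniformly, so the weights always sum to 1.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank over a directed adjacency list."""
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation mass shared equally by every node.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = graph.get(u, [])
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling node: distribute its mass uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        diff = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical subgraph: WordNet-style relations treated as bidirectional edges.
star = {
    "animal.n.01": ["dog.n.01", "cat.n.01", "bird.n.01"],
    "dog.n.01": ["animal.n.01"],
    "cat.n.01": ["animal.n.01"],
    "bird.n.01": ["animal.n.01"],
}
weights = pagerank(star)
```

In this toy subgraph the hub synset `animal.n.01` receives the largest weight, because every other node links to it. That mirrors the intuition that synsets connected to many others in the document's subgraph are the most relevant.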
In this paper we propose a new approach to semi-supervised structured output learning. Our approach uses relaxed labeling on unlabeled data to deal with the combinatorial nature of the label space, and further uses domain constraints to guide learning. Since the overall objective is non-convex, we alternate between the optimization of the model …
Traditionally, machine learning approaches to information extraction require human-annotated data that can be costly and time-consuming to produce. However, in many cases there already exists a database (DB) whose schema is related to the desired output and whose records are related to the expected input text. We present a conditional random field (CRF) that aligns …