We present an objective function for learning with unlabeled data that uses auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternative interpretation of the posterior regularization framework (Graca et al., 2008), maintains …
We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but …
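A minimal sketch of the SampleRank idea, for orientation only. It assumes a perceptron-style update: whenever the model's score ranks two neighbouring MCMC states differently from the loss, the weights move toward the state the loss prefers. The abstract does not fix the exact update rule. The toy tagging task, the feature map `features`, the Hamming loss, the single-flip proposer `flip_one` and the Metropolis acceptance step are all hypothetical illustrations, not the paper's setup.

```python
import math
import random
from typing import Dict, Tuple

Features = Dict[str, float]
State = Tuple[int, ...]


def dot(w: Features, phi: Features) -> float:
    """Sparse dot product between weights and a feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in phi.items())


def diff(a: Features, b: Features) -> Features:
    """Sparse difference a - b."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) - v
    return out


def samplerank_step(w, y, y_new, features, loss, lr=1.0):
    """Update w when the model's ranking of (y, y_new) disagrees with the loss."""
    delta = diff(features(y_new), features(y))  # phi(y_new) - phi(y)
    score_gap = dot(w, delta)                   # model prefers y_new if > 0
    loss_gap = loss(y) - loss(y_new)            # loss prefers y_new if > 0
    if loss_gap > 0 and score_gap <= 0:
        sign = 1.0   # raise the score of y_new relative to y
    elif loss_gap < 0 and score_gap >= 0:
        sign = -1.0  # raise the score of y relative to y_new
    else:
        return w     # rankings agree (or the loss ties): no update
    for k, v in delta.items():
        w[k] = w.get(k, 0.0) + sign * lr * v
    return w


def samplerank_train(w, y0, propose, features, loss, steps, rng, lr=1.0):
    """Walk an MCMC chain, applying an update at every proposed jump."""
    y = y0
    for _ in range(steps):
        y_new = propose(y, rng)
        samplerank_step(w, y, y_new, features, loss, lr)
        # Hypothetical choice: Metropolis acceptance under the current model.
        log_ratio = dot(w, features(y_new)) - dot(w, features(y))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            y = y_new
    return w, y


# Hypothetical toy task: label a token 1 if it is a name, else 0.
TOKENS = ["Alice", "met", "Bob", "today"]
TRUTH: State = (1, 0, 1, 0)


def features(y: State) -> Features:
    """Count (word shape, label) pairs; shape is 'cap' or 'low'."""
    phi: Features = {}
    for tok, label in zip(TOKENS, y):
        key = f"{'cap' if tok[0].isupper() else 'low'}={label}"
        phi[key] = phi.get(key, 0.0) + 1.0
    return phi


def hamming_loss(y: State) -> float:
    return float(sum(a != b for a, b in zip(y, TRUTH)))


def flip_one(y: State, rng: random.Random) -> State:
    """Propose a neighbouring state by flipping one randomly chosen label."""
    i = rng.randrange(len(y))
    return y[:i] + (1 - y[i],) + y[i + 1:]


weights, final_state = samplerank_train(
    {}, (0, 0, 0, 0), flip_one, features, hamming_loss,
    steps=200, rng=random.Random(0),
)
```

Because updates are triggered by pairs of adjacent states rather than by full samples from the model, learning proceeds while the chain is still walking, which is the property the abstract highlights.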
In this paper, we study a hybrid human-machine approach for solving the problem of Entity Resolution (ER). The goal of ER is to identify all records in a database that refer to the same underlying entity and are therefore duplicates of each other. Our input is a graph over all the records in a database, where each edge has a probability denoting our prior …
In entity matching, a fundamental issue when training a classifier to label pairs of entities as either duplicates or non-duplicates is selecting informative training examples. Although active learning presents an attractive solution to this problem, previous approaches minimize the misclassification rate (0-1 loss) of the classifier, which is …
The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches …
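To make "a model for edit sequences between strings" concrete, here is a heavily simplified sketch of the core computation: a forward pass over the edit lattice that sums, in log space, the scores of every edit sequence (match, substitute, insert, delete) turning one string into another. A two-class model then compares that sum under "duplicate" and "non-duplicate" weights, marginalizing over alignments. This is an assumption-laden illustration, not the paper's model: it uses a single-state edit machine per class, four hand-set operation weights (`W_SAME`, `W_DIFF` are hypothetical), and no training.

```python
import math
from typing import Dict, List

Weights = Dict[str, float]


def logsumexp(vals: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_sum_over_edits(a: str, b: str, w: Weights) -> float:
    """Forward algorithm: log of the sum, over all edit sequences
    turning a into b, of exp(total operation score)."""
    n, m = len(a), len(b)
    alpha = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:  # copy a[i-1] if equal to b[j-1], else substitute
                op = "match" if a[i - 1] == b[j - 1] else "subst"
                cands.append(alpha[i - 1][j - 1] + w[op])
            if i > 0:  # delete a[i-1]
                cands.append(alpha[i - 1][j] + w["delete"])
            if j > 0:  # insert b[j-1]
                cands.append(alpha[i][j - 1] + w["insert"])
            alpha[i][j] = logsumexp(cands)
    return alpha[n][m]


def p_duplicate(a: str, b: str, w_same: Weights, w_diff: Weights) -> float:
    """p(duplicate | a, b) for a two-class model summing over alignments."""
    s = log_sum_over_edits(a, b, w_same)
    d = log_sum_over_edits(a, b, w_diff)
    return 1.0 / (1.0 + math.exp(d - s))


# Hypothetical hand-set weights; a trained model would learn these.
W_SAME = {"match": 1.0, "subst": -2.0, "insert": -2.0, "delete": -2.0}
W_DIFF = {"match": 0.0, "subst": 0.0, "insert": -1.0, "delete": -1.0}

print(p_duplicate("kitten", "kitten", W_SAME, W_DIFF))  # close to 1
print(p_duplicate("kitten", "puppy", W_SAME, W_DIFF))   # close to 0
```

Summing over all alignments, instead of taking only the single cheapest one as classic edit distance does, is what lets a conditional model of this kind assign a calibrated probability to the duplicate decision.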
Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large …
Search, exploration and social experience on the Web have recently undergone tremendous changes, with search engines, web portals and social networks offering a different perspective on information discovery and consumption. This new perspective aims to capture user intents and provide richer, highly connected experiences. The new battleground …
Our team participated in the entity tagging and normalization tasks of BioCreative II. For the entity tagging task, we used a k-best MIRA learning algorithm with lexicons and automatically derived word clusters. MIRA accommodates different training loss functions, which allowed us to exploit gene alternatives in training. We also performed a greedy search …
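The abstract notes that MIRA accommodates different training loss functions. A minimal sketch of why: in MIRA the loss of a wrong prediction sets the margin by which the gold output must outscore it. The sketch shows only the k = 1 case, where the update has a closed-form step size capped at `C`; k-best MIRA instead solves a small quadratic program over the k highest-scoring outputs. The feature names and the loss value below are hypothetical, and the lexicons, word clusters and gene-alternative losses from the abstract are not modeled.

```python
from typing import Dict

Features = Dict[str, float]


def dot(w: Features, phi: Features) -> float:
    """Sparse dot product between weights and a feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in phi.items())


def diff(a: Features, b: Features) -> Features:
    """Sparse difference a - b."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0.0) - v
    return out


def mira_update(w: Features, phi_gold: Features, phi_pred: Features,
                loss: float, C: float = 1.0) -> Features:
    """1-best MIRA step: the smallest change to w that makes the gold output
    outscore the prediction by at least `loss`, with step size capped at C."""
    delta = diff(phi_gold, phi_pred)
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return w  # identical feature vectors: nothing to separate
    margin = dot(w, delta)
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    for k, v in delta.items():
        w[k] = w.get(k, 0.0) + tau * v
    return w


# Hypothetical example: the model tagged "BRCA1" as O instead of GENE.
w: Features = {}
gold = {"w=BRCA1|t=GENE": 1.0, "w=binds|t=O": 1.0}
pred = {"w=BRCA1|t=O": 1.0, "w=binds|t=O": 1.0}
mira_update(w, gold, pred, loss=1.0)  # Hamming loss of 1 mistagged token
```

After the update, the gold tagging outscores the prediction by exactly the loss (here 1.0). Swapping in a different loss function changes only the `loss` argument, which is the flexibility the abstract relies on.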
In this paper we propose a new approach for semi-supervised structured output learning. Our approach uses relaxed labeling on unlabeled data to deal with the combinatorial nature of the label space and further uses domain constraints to guide the learning. Since the overall objective is non-convex, we alternate between the optimization of the model …