Learn More
Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many " plausible " ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are(More)
Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate(More)
Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification ; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper,(More)
Generative models of text typically associate a multinomial with every class label or topic. Even in simple models this requires the estimation of thousands of parameters; in multi-faceted latent variable models, standard approaches require additional latent " switching " variables for every token, complicating inference. In this paper, we propose an(More)
We consider the problem of learning a sparse multi-task regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granular-ity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available(More)
In this work, we address the problem of joint modeling of text and citations in the topic modeling framework. We present two different models called the Pairwise-Link-LDA and the Link-PLSA-LDA models. The Pairwise-Link-LDA model combines the ideas of LDA [4] and Mixed Membership Block Stochastic Models [1] and allows modeling arbitrary link structure.(More)
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as " sports " or " entertainment " are rendered differently in each geographic(More)
We formulate the problem of non-projective dependency parsing as a polynomial-sized integer linear program. Our formulation is able to handle non-local output features in an efficient manner; not only is it compatible with prior knowledge encoded as hard constraints , it can also learn soft constraints from data. In particular, our model is able to learn(More)
Supervised topic models utilize document's side information for discovering predictive low dimensional representations of documents; and existing models apply likelihood-based estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variables. Our approach, the maximum entropy discrimination(More)
The saliency of regions or objects in an image can be significantly boosted if they recur in multiple images. Leveraging this idea, cosegmentation jointly segments common regions from multiple images. In this paper, we propose CoSand, a distributed cosegmentation approach for a highly variable large-scale image collection. The seg-mentation task is modeled(More)