We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the…
Identifying latent groups of entities from observed interactions between pairs of entities is a frequently encountered problem in areas like analysis of protein interactions and social networks. We present a model that combines aspects of mixed membership stochastic block models and topic models to improve entity-entity link modeling by jointly modeling…
The Naive Bayes classifier has long been used for text categorization tasks. Its sibling from the unsupervised world, the probabilistic mixture of multinomial models, has likewise been successfully applied to text clustering problems. Despite the strong independence assumptions that these models make, their attractiveness comes from low computational cost,…
Email has become increasingly ubiquitous in recent times, bringing with it new problems. In this paper we revisit two such problems, namely information leak detection and recipient recommendation, and study the impact of previously proposed solutions on real email users. Previous work addressing these problems showed a lot of promise on static email corpora…
Political discourse in the United States is getting increasingly polarized. This polarization frequently causes different communities to react very differently to the same news events. Political blogs as a form of social media provide a unique insight into this phenomenon. We present a multi-target, semi-supervised latent variable model, MCR-LDA, to model…
We present a pseudo-observed variable based regularization technique for latent variable mixed-membership models that provides a mechanism to impose preferences on the characteristics of aggregate functions of latent and observed variables. The regularization framework is used to regularize topic models, which are latent variable mixed-membership models…
Modeling networks is an active area of research and is used for many applications ranging from bioinformatics to social network analysis. An important operation that is often performed in the course of graph analysis is node clustering. Popular methods for node clustering such as the normalized cut method have their roots in graph partition optimization and…
People often make serious mistakes when addressing email messages. One type of costly mistake is an "email leak", i.e., accidentally sending a message to an unintended recipient, a widespread problem that can severely harm individuals and corporations. Another type of addressing error is forgetting to add an intended collaborator as recipient, a likely…
Political blogs as a form of social media allow for a uniquely interactive form of political discourse. This is especially evident in focused blogs with a strong ideological identity. We investigate techniques to identify topics within the context of the community, which when discussed in a blog post evoke a discernible positive or negative collective…
News interfaces are largely driven by recent information, even if many events are better interpreted in the context of previous events. To address this problem, we consider the task of constructing an explicit representation of a "saga", a long-running series of related events. We define a timeline as a concrete representation of a "saga" and we propose two…