Publications
Empirical study of topic modeling in Twitter
Shows that training a topic model on aggregated messages yields a higher-quality learned model, which in turn gives significantly better performance on two real-world classification problems.
Predicting popular messages in Twitter
Formulates the prediction of popular messages as a classification problem, studies two variants of the task by investigating a wide spectrum of features based on message content, and shows that the method can successfully predict, with good performance, messages that will attract thousands of retweets.
Web page classification: Features and algorithms
Reviews work in Web page classification, noting the importance of Web-specific features and algorithms, describing state-of-the-art practices, and tracking the underlying assumptions behind the use of information from neighboring pages.
Identifying link farm spam pages
Presents algorithms for automatically detecting link farms. A seed set is first generated from the links that Web pages share between their incoming and outgoing link sets, and is then expanded; the result is a modified Web graph that can be used when ranking page importance.
Detection of Harassment on Web 2.0
Web 2.0 has led to the development and evolution of web-based communities and applications. These communities provide places for information sharing and collaboration. They also open the door for …
Topical locality in the Web
Empirically tests whether topical locality mirrors the spatial locality of pages on the Web. It finds that linked pages are highly likely to have similar textual content, and that the similarity of sibling pages increases when the links to them from the parent page are close together. These findings show the foundations necessary for the success of many Web systems.
Predicting Sequences of User Actions
Proposes the characteristics of an idealized algorithm that, when applied to sequences of user actions, would allow a user interface to adapt over time to an individual’s pattern of use.
Co-factorization machines: modeling user interests and predicting individual decisions in Twitter
Builds predictive models for user decisions in Twitter by proposing Co-Factorization Machines (CoFM), an extension of a state-of-the-art recommendation model that handles multiple aspects of the dataset at the same time. Concludes that CoFM with ranking-based loss functions is superior to state-of-the-art methods and yields interpretable latent factors.
Topical TrustRank: using topicality to combat web spam
Proposes using topical information to partition the seed set and calculate trust scores for each topic separately, and shows that Topical TrustRank outperforms TrustRank at demoting spam sites or pages.
Recognizing Nepotistic Links on the Web
Reports high accuracy in initial experiments, showing the potential of a machine learning tool to automatically recognize and eliminate nepotistic links, that is, links between pages that are present for reasons other than merit.