More than Words: Quantifying Language to Measure Firms' Fundamentals
We examine whether a simple quantitative measure of language can be used to predict individual firms' accounting earnings and stock returns. Our three main findings are: (1) the fraction of negative …
  • 1,452
  • 114
Classification in Networked Data: a Toolkit and a Univariate Case Study
TLDR
We present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research.
  • 565
  • 85
Discovering users' topics of interest on twitter: a first look
TLDR
We present early results on discovering Twitter users' topics of interest by examining the entities they mention in their Tweets.
  • 349
  • 25
Why do People Retweet? Anti-Homophily Wins the Day!
TLDR
Twitter and other microblogs have rapidly become a significant means by which people communicate with the world and with each other in near real time.
  • 158
  • 10
Using graph-based metrics with empirical risk minimization to speed up active learning on networked data
TLDR
This work shows once again that empirical risk minimization (ERM) is the best method for selecting the next instance to label, and it provides an efficient way to compute ERM with the semi-supervised classifier.
  • 46
  • 9
ROC confidence bands: an empirical evaluation
TLDR
This paper is about constructing confidence bands around ROC curves.
  • 63
  • 5
On the Study of Social Interactions in Twitter
TLDR
Twitter and other social media platforms are increasingly used as the primary way in which people speak with each other.
  • 46
  • 4
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
TLDR
It is our great pleasure to welcome you to the 20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
  • 41
  • 4
Confidence Bands for ROC Curves: Methods and an Empirical Study
TLDR
In this paper we study techniques for generating and evaluating confidence bands on ROC curves.
  • 83
  • 3
Improving Learning in Networked Data by Combining Explicit and Mined Links
TLDR
This paper is about using multiple types of information for classification of networked data in a semi-supervised setting: given a fully described network (nodes and edges) with known labels for some of the nodes, predict the labels of the remaining nodes with an objective graph measure called node-based assortativity.
  • 49
  • 3