Mining the peanut gallery: opinion extraction and semantic classification of product reviews
TLDR
This work develops a method for automatically distinguishing between positive and negative reviews, drawing on information retrieval techniques for feature extraction and scoring; the results for the various metrics and heuristics depend on the testing situation.
Methods and metrics for cold-start recommendations
TLDR
A method for recommending items that combines content and collaborative data under a single probabilistic framework is developed, and it is demonstrated empirically that the various components of the testing strategy combine to yield a deeper understanding of the performance characteristics of recommender systems.
Collaborative Filtering by Personality Diagnosis: A Hybrid Memory and Model-Based Approach
TLDR
This work describes and evaluates a new method called personality diagnosis (PD), which computes the probability that a user is of the same "personality type" as other users and, in turn, the likelihood that he or she will like new items.
Winners don't take all: Characterizing the competition for links on the web
TLDR
A simple generative model quantifies the degree to which the rich nodes grow richer, and how new (and poorly connected) nodes can compete, and accurately accounts for the true connectivity distributions of category-specific web pages, the web as a whole, and other social networks.
Eliciting properties of probability distributions
We investigate the problem of truthfully eliciting an expert's assessment of a property of a probability distribution, where a property is any real-valued function of the distribution such as mean or
A Utility Framework for Bounded-Loss Market Makers
TLDR
It is proved that hyperbolic absolute risk aversion utility market makers are equivalent to weighted pseudospherical scoring rule market makers, and a third equivalent formulation based on maintaining a cost function that seems most natural for implementation purposes is described.
Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms
TLDR
This work models and compares several mechanisms for allocating sponsored slots, including stylized versions of those used by Overture and Google, the two biggest brokers of sponsored search, and proposes a rank-revision strategy that weights clicks on lower ranked items more than clicks on higher ranked items.
Using internet searches for influenza surveillance.
TLDR
This work counted daily unique queries originating in the United States that contained influenza-related search terms from the Yahoo! search engine from March 2004 through May 2008, and estimated linear models, using searches with 1-10-week lead times as explanatory variables to predict the percentage of cultures positive for influenza and deaths attributable to pneumonia and influenza in the US.
Predicting consumer behavior with Web search
TLDR
This work uses search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes.
Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments
TLDR
It is shown that secondary content information can often be used to overcome sparsity and appropriate mixture models incorporating secondary data produce significantly better quality recommenders than k-nearest neighbors (k-NN).