Corpus ID: 8492618

Evaluating Ranking Diversity and Summarization in Microblogs using Hashtags

Authors: David Fisher, Ashish Jain, Mostafa Keikha, W. Bruce Croft, Nedim Lipka
Diversification techniques for web search have recently been developed that assume that, for each query, there is a set of underlying aspects or subtopics that address specific user intents. These techniques attempt to balance the relevance of the retrieved documents with the coverage of the aspects. Evaluation of diversification techniques requires some way of defining a set of aspects for each test query and a “gold standard” assignment of documents to aspects. This has made the study of… 
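The abstract notes that evaluating diversification requires a set of aspects per query and a gold-standard assignment of documents to aspects, and the title suggests that hashtags play this role for microblogs. The truncated abstract does not say which metric the paper uses, so the following is only a minimal sketch under two assumptions: each tweet's hashtags are its aspects, and α-nDCG, one common aspect-based diversity metric, is the measure. The function names and the tweet IDs are hypothetical.

```python
import math
from typing import Dict, List, Set


def alpha_dcg(ranking: List[str], aspects: Dict[str, Set[str]],
              alpha: float = 0.5, depth: int = 10) -> float:
    """alpha-DCG: a document's hit on an aspect is discounted by (1 - alpha)^k,
    where k is how many higher-ranked documents already covered that aspect."""
    seen: Dict[str, int] = {}
    total = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        gain = 0.0
        # Assumption: a tweet's aspects are its hashtags; unknown docs have none.
        for aspect in aspects.get(doc, set()):
            gain += (1 - alpha) ** seen.get(aspect, 0)
            seen[aspect] = seen.get(aspect, 0) + 1
        total += gain / math.log2(rank + 1)
    return total


def ideal_order(aspects: Dict[str, Set[str]], alpha: float, depth: int) -> List[str]:
    """Greedy approximation of the ideal ranking (the exact ideal is hard to compute)."""
    remaining = sorted(d for d, a in aspects.items() if a)
    seen: Dict[str, int] = {}
    order: List[str] = []
    while remaining and len(order) < depth:
        # Pick the document with the largest marginal (discounted) aspect gain.
        best = max(remaining,
                   key=lambda d: sum((1 - alpha) ** seen.get(a, 0) for a in aspects[d]))
        order.append(best)
        remaining.remove(best)
        for a in aspects[best]:
            seen[a] = seen.get(a, 0) + 1
    return order


def alpha_ndcg(ranking: List[str], aspects: Dict[str, Set[str]],
               alpha: float = 0.5, depth: int = 10) -> float:
    """Normalize alpha-DCG by the greedily approximated ideal alpha-DCG."""
    ideal = alpha_dcg(ideal_order(aspects, alpha, depth), aspects, alpha, depth)
    return alpha_dcg(ranking, aspects, alpha, depth) / ideal if ideal > 0 else 0.0
```

For example, with `aspects = {"t1": {"#goal"}, "t2": {"#goal"}, "t3": {"#referee"}}` and `depth=3`, the ranking `["t1", "t3", "t2"]` scores 1.0, while the more redundant `["t1", "t2", "t3"]` scores about 0.965, because the second `#goal` tweet is discounted before the uncovered `#referee` aspect appears.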


Search Result Diversification in Short Text Streams
A dynamic Dirichlet multinomial mixture topic model, called D2M3, is proposed, together with a Gibbs sampling algorithm for inferring it; the resulting streaming diversification algorithm (SDA) statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, and streaming diversification methods that use other dynamic topic models.
Dynamic Clustering of Streaming Short Documents
A new dynamic clustering topic model (DCT) is proposed that tracks the time-varying distributions of topics over documents and of words over topics, and that overcomes the difficulty of handling short text by assigning a single topic to each short document.
Ranking ideas for diversity and quality
This paper provides an algorithm to rank design ideas such that the ranked list simultaneously maximizes the quality and diversity of recommended designs and reduces the time required to review diverse, high-quality ideas from around 25 hours to 90 minutes.
When Crowds Give You Lemons
DBLemons, a crowd-based idea filtering strategy, is introduced. It addresses issues in crowd-based idea filtering by asking voters to identify the worst rather than the best ideas using a “bag of lemons” voting approach, and by exposing voters to a wider spectrum of ideas through a dynamic diversity-based ranking system that balances idea quality and coverage.
Diversity and Novelty: Measurement, Learning and Optimization
This dissertation proposes new submodular and supermodular objective functions to measure diversity, develops multiple matching algorithms for diverse team formation in offline and online settings, and proposes an entropy-based diversity metric that is more accurate and sensitive than benchmark metrics.


Building a Microblog Corpus for Search Result Diversification
This paper addresses the lack of a microblog-based diversification corpus for search on microblogging platforms by building one, and shows that the resulting corpus fulfils a number of diversification criteria described in the literature.
On choosing an effective automatic evaluation metric for microblog summarisation
Summarisation systems are ranked under three automatic summarisation evaluation metrics from the literature; Fraction of Topic Words agrees better with what users report about the quality and effectiveness of microblog summaries than ROUGE-1, the measure most commonly reported in the literature.
Exploiting query reformulations for web search result diversification
A novel probabilistic framework for Web search result diversification is introduced that explicitly accounts for the various aspects associated with an underspecified query. It diversifies a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole.
Term level search result diversification
It is demonstrated that term-level diversification, with topic terms identified automatically from the search results using a simple greedy algorithm, significantly outperforms methods that attempt to create a full topic structure for diversification.
Result Diversification for Tweet Search
This paper addresses diversification of results in tweet search by adopting several methods from the text summarization and web search domains, and provides an exhaustive evaluation of all the methods using a standard dataset specifically tailored for this purpose.
Finding topic words for hierarchical summarization
This paper defines summarization in terms of a probabilistic language model and uses this definition to explore a new technique for automatically generating topic hierarchies by applying a graph-theoretic algorithm that approximates a solution to the Dominating Set Problem.
The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries
A method is presented for combining query relevance with information novelty in the context of text retrieval and summarization; preliminary results indicate some benefits of MMR diversity ranking in document retrieval and in single-document summarization.
Diversifying search results
This work proposes an algorithm that approximates its diversification objective well in general and is provably optimal for a natural special case; it also generalizes several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification.
Identifying entity aspects in microblog posts
This paper compares different IR techniques and opinion target identification methods for automatically identifying aspects of the entity of interest given a stream of microblog posts and finds that simple statistical methods such as TF.IDF are a strong baseline for the task, significantly outperforming opinion-oriented methods.
Diversity by proportionality: an election-based approach to search result diversification
The proposed framework, which optimizes proportionality for search result diversification, is shown empirically to significantly outperform the top-performing approach in the literature, not only on the proposed proportionality metric but also on several standard diversity measures.
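The MMR entry above describes combining query relevance with novelty. In its usual form, MMR greedily selects the next document $D_i$ from the unselected candidates $R \setminus S$ that maximizes $\lambda\,\mathrm{Sim}_1(D_i, Q) - (1-\lambda)\max_{D_j \in S}\mathrm{Sim}_2(D_i, D_j)$. Below is a minimal sketch that assumes bag-of-words cosine similarity for both $\mathrm{Sim}_1$ and $\mathrm{Sim}_2$ (MMR itself allows any similarity functions); `mmr_rerank`, `cosine`, and the sample tweets are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mmr_rerank(query: str, docs: List[str], lam: float = 0.5, k: int = 3) -> List[int]:
    """Greedy MMR: trade off relevance to the query against redundancy
    with already-selected documents. Returns indices into `docs`."""
    q = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in docs]
    candidates = list(range(len(docs)))
    selected: List[int] = []

    def score(i: int) -> float:
        relevance = cosine(vecs[i], q)
        # Redundancy is the max similarity to any already-selected document.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the earlier candidate
        selected.append(best)
        candidates.remove(best)
    return selected
```

For the query `"world cup final"` and the tweets `["world cup final tonight", "world cup final tonight", "world cup final ticket prices", "cooking pasta recipe"]`, `mmr_rerank(query, docs, lam=0.5, k=2)` returns `[0, 2]`, skipping the exact duplicate, whereas `lam=1.0` reduces to pure relevance ranking and returns `[0, 1]`.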