Benyah Shaparenko

We now have incrementally grown databases of text documents going back over a decade, in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one...
One goal of text mining is to provide readers with automatic methods for quickly finding the key ideas in individual documents and whole corpora. To this end, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research...
This paper considers the problem of analyzing the development of a document collection over time without requiring meaningful citation data. Given a collection of timestamped documents, we formulate and explore the following two questions. First, what are the main topics and how do these topics develop over time? Second, to gain insight into the dynamics...
Much search engine revenue comes from sponsored search ads displayed with algorithmic search results. To maximize revenue, it is essential to choose a good slate of ads for each query, which requires accurate prediction of whether users will click on an ad. Click prediction is relatively easy for ads that have been displayed many times, and have...