Learn More
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based method for metadata extraction from header part of research papers and show that it outperforms other(More)
Clickthrough and conversation rates estimation are two core predictions tasks in display advertising. We present in this article a machine learning framework based on logistic regression that is specifically designed to tackle the specifics of display advertising. The resulting system has the following characteristics: It is easy to implement and deploy, it(More)
Acknowledgements in research publications, like citations, indicate influential contributions to scientific work; however, large-scale acknowledgement analyses have traditionally been impractical due to the high cost of manual information extraction. In this paper we describe a mixture method for automatically mining acknowledgements from research documents(More)
The increasing amount of communication between individuals in e-formats (e.g. email, Instant messaging and the Web) has motivated computational research in social network analysis (SNA). Previous work in SNA has emphasized the social network (SN) topology measured by communication frequencies while ignoring the semantic information in SNs. In this paper, we(More)
We describe a machine learning approach for predicting sponsored search ad relevance. Our baseline model incorporates basic features of text overlap and we then extend the model to learn from past user clicks on advertisements. We present a novel approach using translation models to learn user click propensity from sparse click logs. Our relevance(More)
Text classification is still an important problem for unlabeled text; CiteSeer, a computer science document search engine, uses automatic text classification methods for document indexing. Text classification uses a document's original text words as the primary feature representation. However, such representation usually comes with high dimensionality and(More)
— We describe a novel maximum entropy (maxent) approach for generating online recommendations as a user navigates through a collection of documents. We show how to handle high-dimensional sparse data and represent it as a collection of ordered sequences of document requests. Our representation and the maxent approach have several advantages: (1) we can(More)
In on-line search and display advertising, the click-trough rate (CTR) has been traditionally a key measure of ad/campaign effectiveness. More recently, the market has gained interest in more direct measures of profitability, one early alternative is the conversion rate (CVR). CVRs measure the proportion of certain users who take a predefined, desirable(More)