• Corpus ID: 6518626

Rank Aggregation in Scientific Publication Databases Based on Logistic Regression

Martin Vesely, Martin Rajman
The goal of the d-Rank project is to study rank aggregation in scientific publication databases. Our work focuses in particular on document ranking in the domain of particle physics, using the collection of CERN publications called the CERN Document Server. In this report we present the main advances achieved within the second phase of the project. The most important achievements notably include the creation of an extended CDS referential as an IR evaluation resource… 

D-Rank: A Framework for Score Aggregation in Specialized Search
This paper presents an approach to score aggregation for specialized search systems, describes a rank aggregation framework with score normalization, and reports results obtained with aggregations based on logistic regression using both ranks and scores.


Perspectives for Rank Aggregation within Scientific Publication Databases
This document presents perspectives for rank aggregation within scientific publication databases, based on experience from work on the CERN Document Server collection of documents in the domain of particle and high energy physics.
Database merging strategy based on logistic regression
Merging Results From Isolated Search Engines
Two new techniques for merging search results are introduced: Feature Distance ranking algorithms and Reference Statistics, which are found to be more effective than existing methods in an isolated-server environment such as the World Wide Web.
From Fulltext Documents to Structured Citations: CERN's Automated Solution
This paper focuses on the specific work done within a collaboration between CERN, Geneva University and the University of Sunderland to achieve the automated acquisition of structured citations from fulltext documents.
Inferring probability of relevance using the method of logistic regression
  • F. Gey
  • Computer Science
    SIGIR '94
  • 1994
The model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties and is compared directly to the particular vector space model of retrieval which uses term-frequency/inverse-document-frequency weighting and the cosine similarity measure.
CERN Document Server Software: the integrated digital library
The design philosophy of CDSware and its modular, extensible architecture are discussed, and a flow chart presents the operational workflow of the system, depicting its module interactions.
CERN document server: Document management system for grey literature in a networked environment
The CERN Document Server Software suite is presented: a free software package maintained by CERN that provides an online digital library solution for mid- to large-sized document repositories, mainly with respect to federated data processing within the Open Archive Initiative framework.
Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course
A survey of about 10% of practitioners in the field reveals usage trends and information needs and offers an insight into the most important features that users require to optimize their research workflow.
Experiences in Automatic Keywording of Particle Physics Literature
A project being carried out at CERN for the development and integration of automatic keywording is described, which helps in the classification and retrieval of documents in the particle physics literature.
Logistic Regression Merging of Amberfish and Lucene Multisearch Results
A simple logistic-regression-based isolated data fusion algorithm was used to merge results from two free open-source text retrieval tools. Basic performance measures are reported and discussed, and future projects are outlined.