Modern Information Retrieval - the concepts and technology behind search, Second edition
This paper presents a meta-modelling architecture for search that automates the labor-intensive, and therefore slow and expensive, process of manually cataloging and querying documents.
A new approach to text searching
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with don't care symbols and complement symbols, and multiple patterns. In…
Information Retrieval: Data Structures and Algorithms
For programmers and students interested in parsing text and automated indexing, this is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.
FA*IR: A Fair Top-k Ranking Algorithm
- Meike Zehlike, F. Bonchi, C. Castillo, S. Hajian, Mohamed Megahed, R. Baeza-Yates
- Computer Science
- CIKM
- 20 June 2017
This work defines and solves the Fair Top-k Ranking problem, and presents an efficient algorithm, which is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.
Design and Implementation of Relevance Assessments Using Crowdsourcing
This work explores the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and reporting the results of a series of experiments using TREC 8 with a fixed budget.
Predicting The Next App That You Are Going To Use
This paper models the prediction of the next app as a classification problem and proposes an effective personalized method to solve it that takes full advantage of human-engineered features and automatically derived features.
Link analysis for Web spam detection
- L. Becchetti, C. Castillo, D. Donato, R. Baeza-Yates, S. Leonardi
- Computer Science
- TWEB
- 1 February 2008
After tenfold cross-validation, the best classifiers have a performance comparable to that of state-of-the-art spam classifiers that use content attributes, but are orthogonal to content-based methods.
Improved query difficulty prediction for the web
Improved Clarity is introduced, and it is demonstrated that it outperforms state-of-the-art predictors on three standard collections, including two large Web collections.
Using rank propagation and Probabilistic counting for Link-Based Spam Detection
This paper proposes spam detection techniques that consider only the link structure of the Web, regardless of page contents, and compute statistics of the links in the vicinity of every Web page by applying rank propagation and probabilistic counting over the Web graph.
Searching the Future
- R. Baeza-Yates
- Computer Science
A new retrieval problem, future retrieval, is defined. It involves using news information to obtain possible future events and then searching for events related to the authors' current (or future) information needs, and it includes time as a formal attribute of a document.