Learning Visual Features from Snapshots for Web Search

  • Authors: Yixing Fan, J. Guo, Yanyan Lan, Jun Xu, Liang Pang, Xueqi Cheng
  • Published 19 October 2017
  • Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
When applying learning to rank algorithms to Web search, a large number of features are usually designed to capture relevance signals. Most of these features are computed from extracted textual elements, link analysis, and user logs. However, Web pages are not solely linked texts but have a structured layout that organizes a large variety of elements in different styles. Such a layout can itself convey useful visual information indicating the relevance of a Web page. For example, the…
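To make the idea of adding visual signals to a learning-to-rank feature set concrete, the sketch below assumes each candidate document already has a textual feature vector and a visual feature vector (how visual features are learned from snapshots is not shown), concatenates them, and ranks with a plain linear scorer. The names `combine_features` and `rank` and the example weights are hypothetical; this is not the paper's model.

```python
from typing import List, Sequence


def combine_features(text_feats: Sequence[Sequence[float]],
                     visual_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append each document's visual features to its textual LTR features."""
    if len(text_feats) != len(visual_feats):
        raise ValueError("need one visual vector per document")
    return [list(t) + list(v) for t, v in zip(text_feats, visual_feats)]


def rank(features: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Score each document linearly and return document indices, best first."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])


# Usage with made-up numbers: three documents, two text features, one visual feature.
text = [[0.2, 0.1], [0.5, 0.3], [0.1, 0.9]]
visual = [[0.9], [0.1], [0.4]]
print(rank(combine_features(text, visual), [1.0, 0.5, 1.0]))  # [0, 2, 1]
```

A linear scorer keeps the example small; any learning to rank model that consumes a flat feature vector could take the concatenated features instead.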

Visual Features for Information Retrieval

It was proposed that exploiting a document's visual information can benefit Learning To Rank (LTR), and that this approach makes it possible to build a model that depends only on visual information, without handcrafted formulas and heuristics.

Relevance Estimation with Multiple Information Sources on Search Engine Result Pages

This paper proposes a novel framework, the Joint Relevance Estimation model (JRE), which learns visual patterns from screenshots of search results, explores presentation structures from HTML source code, and also uses the semantic information of textual contents.

ViTOR: Learning to Rank Webpages Based on Visual Features

This paper introduces the Visual learning TO Rank (ViTOR) model, which integrates state-of-the-art visual feature extraction methods and significantly improves the performance of LTR with visual features.

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

This paper assumes that vision bias exists in an image-oriented search engine as another crucial factor, besides position, that affects the examination probability, and proposes an extended click model that uses a regression-based EM algorithm to predict the vision bias from visual features extracted from candidate documents.
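The examination hypothesis behind such click models can be sketched as P(click) = P(examined) × P(relevant). This sketch assumes, hypothetically, that the examination probability factors into a per-position bias times a logistic function of the document's visual features; the paper's exact parameterization and its regression-based EM fitting are not reproduced, and all names are illustrative.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def click_probability(position: int,
                      visual_feats: Sequence[float],
                      relevance: float,
                      position_bias: Sequence[float],
                      vision_weights: Sequence[float],
                      vision_bias: float = 0.0) -> float:
    """P(click) = P(examined | position, visual features) * P(relevant).

    Hypothetical factorization: examination = position bias * vision factor,
    where the vision factor is a logistic function of the visual features.
    """
    vision = sigmoid(vision_bias + sum(w * x for w, x in zip(vision_weights, visual_feats)))
    examined = position_bias[position] * vision
    return examined * relevance
```

Estimating `position_bias`, `vision_weights`, and per-document relevance from click logs would require an EM-style procedure like the one the paper describes, which this sketch omits.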

DeepTileBars: Visualizing Term Distribution for Neural Information Retrieval

Although its design and implementation are lightweight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets, including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
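The TileBars idea of visualizing where each query term occurs across a document can be sketched as a query-term by segment count matrix. This sketch assumes fixed-size contiguous segments and raw term counts only; DeepTileBars' actual segmentation and per-cell signals may differ, and `tile_bars` is a hypothetical name.

```python
from typing import List, Sequence


def tile_bars(query_terms: Sequence[str], doc_tokens: Sequence[str],
              num_tiles: int) -> List[List[int]]:
    """Count each query term's occurrences in each of `num_tiles` segments.

    Row i corresponds to query_terms[i]; column j to the j-th contiguous,
    roughly equal-sized slice of the document.
    """
    if num_tiles <= 0:
        raise ValueError("num_tiles must be positive")
    row_of = {term: i for i, term in enumerate(query_terms)}
    matrix = [[0] * num_tiles for _ in query_terms]
    n = len(doc_tokens)
    for pos, token in enumerate(doc_tokens):
        row = row_of.get(token)
        if row is not None:
            # Map the token position to a tile index in [0, num_tiles).
            matrix[row][pos * num_tiles // n] += 1
    return matrix


doc = "visual features help rank pages visual layout rank rank".split()
print(tile_bars(["visual", "rank"], doc, 3))  # [[1, 1, 0], [0, 1, 2]]
```

The resulting matrix is image-like, which is what lets a convolutional model scan it for local patterns of term co-occurrence.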

Neural generative models and representation learning for information retrieval

Empirical results show that the proposed neural generative framework can effectively learn information representations and construct retrieval models that outperform the state-of-the-art systems in a variety of IR tasks.

A Deep Look into Neural Ranking Models for Information Retrieval

Search Result Reranking with Visual and Structure Information Sources

Experimental results show that the two proposed models achieve better performance than state-of-the-art ranking solutions as well as the original rankings of commercial search engines.

References

Learning block importance models for web pages

This paper uses a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure; spatial and content features are then extracted from each block to construct its feature vector.
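A per-block feature vector of the kind described can be sketched as follows, assuming each segmented block carries its pixel rectangle plus a few content counts. The specific fields (`text_length`, `link_count`, `image_count`) and the normalization by page size are illustrative assumptions, not the paper's exact feature list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A segmented page block (hypothetical fields for illustration)."""
    x: float          # left edge, pixels
    y: float          # top edge, pixels
    width: float      # pixels
    height: float     # pixels
    text_length: int  # characters of visible text
    link_count: int
    image_count: int


def block_features(block: Block, page_width: float, page_height: float) -> List[float]:
    """Spatial features normalized by page size, then raw content counts."""
    center_x = (block.x + block.width / 2) / page_width
    center_y = (block.y + block.height / 2) / page_height
    return [
        center_x, center_y,
        block.width / page_width, block.height / page_height,
        float(block.text_length), float(block.link_count), float(block.image_count),
    ]


print(block_features(Block(0, 0, 500, 200, 120, 3, 1), 1000, 1000))
```

Normalizing the spatial part makes blocks from pages of different sizes comparable, while the content counts are left raw here for simplicity.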

Quality-biased ranking of web documents

This paper presents a quality-biased ranking method that promotes documents containing high-quality content and penalizes low-quality documents; it consistently improves the retrieval performance of text-based and link-based retrieval methods that do not take the quality of document content into account.
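One common way to bias a ranking toward quality is to add a log document prior to a log-domain relevance score. The sketch assumes a hypothetical logistic quality prior over arbitrary quality features with hand-set weights; it only illustrates the general promote/penalize effect, not the specific model in the paper.

```python
import math
from typing import Sequence


def quality_prior(quality_feats: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Hypothetical logistic prior P(high quality | d), in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, quality_feats))
    return 1.0 / (1.0 + math.exp(-z))


def quality_biased_score(relevance_log_score: float, quality_feats: Sequence[float],
                         weights: Sequence[float], bias: float = 0.0) -> float:
    """Log-domain relevance plus log quality prior: low quality lowers the score."""
    return relevance_log_score + math.log(quality_prior(quality_feats, weights, bias))
```

Two documents with equal relevance scores are then separated only by their priors, so the one with the higher (assumed) quality estimate ranks first.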

The structure of broad topics on the web

It is proposed that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities, and that these measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.

AdaRank: a boosting algorithm for information retrieval

The proposed learning algorithm, AdaRank, repeatedly constructs 'weak rankers' on reweighted training data and finally combines them linearly to make ranking predictions. The paper proves that the training process of AdaRank is exactly that of enhancing the performance measure used.
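The boosting loop can be sketched in Python. Assumptions: each weak ranker is a single feature used directly as a score, the performance measure E is NDCG@k, the weak-ranker weight is α = ½ ln(Σ P(i)(1 + E_i) / Σ P(i)(1 − E_i)), and queries are re-weighted in proportion to exp(−E) of the current combined ranker. The guard for a perfect weak ranker is an addition of this sketch, not part of the paper.

```python
import math
from typing import List, Sequence, Tuple

# One query: (feature rows for its candidate documents, graded relevance labels).
Query = Tuple[List[List[float]], List[int]]


def ndcg_at_k(scores: Sequence[float], labels: Sequence[int], k: int = 10) -> float:
    """NDCG@k with 2^rel - 1 gains and log2 position discounts."""
    def dcg(order: List[int]) -> float:
        return sum((2 ** labels[i] - 1) / math.log2(rank + 2)
                   for rank, i in enumerate(order[:k]))

    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    ideal = dcg(sorted(range(len(labels)), key=lambda i: -labels[i]))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


def linear_scores(weights: Sequence[float], rows: List[List[float]]) -> List[float]:
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def adarank(queries: List[Query], num_features: int,
            rounds: int = 10, k: int = 10) -> List[float]:
    """Return per-feature weights of the combined linear ranker."""
    n = len(queries)
    p = [1.0 / n] * n              # distribution over training queries
    alphas = [0.0] * num_features  # weak ranker = one feature, so the model is linear
    for _ in range(rounds):
        # Pick the single-feature ranker with the best weighted NDCG@k.
        best_f, best_e, best_val = 0, [0.0] * n, -1.0
        for f in range(num_features):
            e = [ndcg_at_k([row[f] for row in rows], labels, k) for rows, labels in queries]
            val = sum(pi * ei for pi, ei in zip(p, e))
            if val > best_val:
                best_f, best_e, best_val = f, e, val
        num = sum(pi * (1 + ei) for pi, ei in zip(p, best_e))
        den = sum(pi * (1 - ei) for pi, ei in zip(p, best_e))
        if den <= 1e-12:
            # Guard (not from the paper): the weak ranker is perfect on every
            # weighted query, so alpha would be infinite; keep it and stop.
            alphas[best_f] += 1.0
            break
        alphas[best_f] += 0.5 * math.log(num / den)
        # Re-weight queries: those the combined model ranks poorly get more mass.
        perf = [ndcg_at_k(linear_scores(alphas, rows), labels, k) for rows, labels in queries]
        weights = [math.exp(-e) for e in perf]
        total = sum(weights)
        p = [w / total for w in weights]
    return alphas
```

Because every weak ranker here is one feature, the boosted model collapses to a single weight vector, which `linear_scores` can apply to new documents.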

A study of smoothing methods for language models applied to Ad Hoc information retrieval

This paper examines the sensitivity of retrieval performance to the smoothing parameters and compares several popular smoothing methods on different test collections.
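Two of the classic smoothing methods for query-likelihood retrieval, Jelinek-Mercer, p(w|d) = (1 − λ) p_ml(w|d) + λ p(w|C), and Dirichlet prior, p(w|d) = (c(w;d) + μ p(w|C)) / (|d| + μ), can be sketched as one scorer. The parameter names `mu` and `lam`, their defaults, and skipping query terms absent from the collection are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def collection_model(docs: List[Sequence[str]]) -> Dict[str, float]:
    """Maximum-likelihood unigram model p(w | C) over the whole collection."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def query_log_likelihood(query: Sequence[str], doc: Sequence[str],
                         p_coll: Dict[str, float], method: str = "dirichlet",
                         mu: float = 2000.0, lam: float = 0.1) -> float:
    """log p(q | d) under Dirichlet-prior or Jelinek-Mercer smoothing."""
    tf = Counter(doc)
    dlen = len(doc)
    score = 0.0
    for w in query:
        pc = p_coll.get(w, 0.0)
        if pc == 0.0:
            continue  # assumption: ignore terms never seen in the collection
        if method == "dirichlet":
            # (c(w; d) + mu * p(w | C)) / (|d| + mu)
            p = (tf[w] + mu * pc) / (dlen + mu)
        elif method == "jm":
            # (1 - lambda) * p_ml(w | d) + lambda * p(w | C)
            p_ml = tf[w] / dlen if dlen else 0.0
            p = (1 - lam) * p_ml + lam * pc
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        score += math.log(p)
    return score
```

Varying `mu` or `lam` over a grid and re-running retrieval is the kind of sensitivity analysis the summary refers to.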

Selective Search for Object Recognition

This paper introduces selective search, which combines the strengths of both exhaustive search and segmentation, and shows that selective search enables the use of the powerful Bag-of-Words model for recognition.

LETOR: A benchmark collection for research on learning to rank for information retrieval

This paper describes the LETOR collection in detail, shows how it can be used in different kinds of research, and compares several state-of-the-art learning to rank algorithms on it.
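LETOR data files store one query-document pair per line: a relevance label, a `qid:` field, `feature_id:value` pairs, and a trailing `#` comment. A minimal parser, assuming that layout (the example line below is illustrative, not taken from the dataset):

```python
from typing import Dict, NamedTuple


class LetorRow(NamedTuple):
    label: int                  # graded relevance judgment
    qid: str                    # query identifier
    features: Dict[int, float]  # feature id -> value
    comment: str                # text after '#', e.g. the document id


def parse_letor_line(line: str) -> LetorRow:
    """Parse '<label> qid:<id> <fid>:<value> ... #<comment>'."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a LETOR line: {line!r}")
    features = {}
    for token in tokens[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return LetorRow(int(tokens[0]), tokens[1][len("qid:"):], features, comment.strip())


row = parse_letor_line("2 qid:1 1:0.056537 2:0.000000 3:0.666667 #docid = example-doc-1")
print(row.label, row.qid, row.features)
```

Grouping parsed rows by `qid` yields the per-query candidate lists that listwise methods such as AdaRank train on.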

An Efficient Boosting Algorithm for Combining Preferences

This work describes and analyzes an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning, and gives theoretical results describing the algorithm's behavior both on the training data and on new test data not seen during training.
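A compact sketch of RankBoost over preference pairs (x0, x1), where x1 should rank above x0. It assumes binary threshold weak rankers h(x) = 1 if x[f] > θ else 0, chosen to maximize |r| with r = Σ D(x0, x1)(h(x1) − h(x0)), then sets α = ½ ln((1 + r)/(1 − r)) and multiplies each pair's weight by exp(α(h(x0) − h(x1))). Clamping r away from ±1 is an addition of this sketch to keep α finite.

```python
import math
from typing import List, Sequence, Tuple

# (x0, x1): feature vectors where x1 should be ranked above x0.
Pair = Tuple[Sequence[float], Sequence[float]]
# (alpha, feature index, threshold) for one boosting round.
Stump = Tuple[float, int, float]


def weak(x: Sequence[float], f: int, theta: float) -> float:
    """Binary threshold weak ranker: 1 if feature f exceeds theta, else 0."""
    return 1.0 if x[f] > theta else 0.0


def rankboost(pairs: List[Pair], rounds: int = 10, eps: float = 1e-6) -> List[Stump]:
    d = [1.0 / len(pairs)] * len(pairs)  # distribution over preference pairs
    model: List[Stump] = []
    num_features = len(pairs[0][0])
    for _ in range(rounds):
        best = (0.0, 0, 0.0)  # (r, feature, threshold)
        for f in range(num_features):
            for theta in sorted({x[f] for pair in pairs for x in pair}):
                r = sum(di * (weak(x1, f, theta) - weak(x0, f, theta))
                        for di, (x0, x1) in zip(d, pairs))
                if abs(r) > abs(best[0]):
                    best = (r, f, theta)
        r, f, theta = best
        r = max(-1 + eps, min(1 - eps, r))  # keep alpha finite
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, theta))
        # Pairs misordered by alpha * h gain weight; correctly ordered pairs lose it.
        d = [di * math.exp(alpha * (weak(x0, f, theta) - weak(x1, f, theta)))
             for di, (x0, x1) in zip(d, pairs)]
        z = sum(d)
        d = [di / z for di in d]
    return model


def score(model: List[Stump], x: Sequence[float]) -> float:
    """Final ranking function: weighted sum of the selected weak rankers."""
    return sum(alpha * weak(x, f, theta) for alpha, f, theta in model)
```

Exhaustively scanning every observed feature value as a threshold is the simplest choice for a sketch; it scales poorly, so a practical version would pre-sort or bin feature values.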