Web Image Context Extraction with Graph Neural Networks and Sentence Embeddings on the DOM tree

@inproceedings{Dang2021WebIC,
  title={Web Image Context Extraction with Graph Neural Networks and Sentence Embeddings on the DOM tree},
  author={Chen Dang and Hicham Randrianarivo and Rapha{\"e}l Fournier-S'niehotta and N. Audebert},
  booktitle={PKDD/ECML Workshops},
  year={2021}
}
Web Image Context Extraction (WICE) consists of obtaining the textual information that describes an image from the content of the surrounding webpage. A common preprocessing step before performing WICE is to render the webpage. Done at a large scale (e.g., for search engine indexing), this can become very computationally costly (up to several seconds per page). To avoid this cost, we introduce a novel WICE approach that combines Graph Neural Networks (GNNs) and Natural Language…
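The pipeline the abstract describes, i.e. sentence embeddings attached to DOM nodes and propagated by a GNN, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy DOM tree, the 4-dimensional placeholder embeddings, and the single GCN-style layer are all assumptions (a real system would parse an actual DOM and use a pretrained sentence encoder).

```python
import numpy as np

# Toy DOM tree: 0=<body>, 1=<img>, 2=<p> caption near the image, 3=unrelated <div>.
edges = [(0, 1), (0, 2), (0, 3)]

# Placeholder "sentence embeddings" for each node's text content.
X = np.array([
    [0.1, 0.2, 0.0, 0.0],   # <body>
    [0.0, 0.0, 0.0, 0.0],   # <img> (no text)
    [0.9, 0.8, 0.1, 0.0],   # <p> caption
    [0.0, 0.1, 0.0, 0.9],   # unrelated <div>
])

n = X.shape[0]

# Adjacency with self-loops, symmetrically normalized (D^{-1/2} A D^{-1/2}),
# as in a standard graph convolutional (GCN) layer.
A = np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # learnable weights in a trained model

# One message-passing step: aggregate neighbor embeddings, transform, ReLU.
# Each node's new representation now mixes in its DOM neighbors' text,
# which is what lets the model relate an <img> node to nearby captions.
H = np.maximum(A_hat @ X @ W, 0.0)
print(H.shape)  # (4, 4)
```

Stacking such layers (and ending with a classifier on text nodes) would score each DOM text block as context for the image, without ever rendering the page.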
