Spatially Conditioned Graphs for Detecting Human–Object Interactions

Frederic Z. Zhang, Dylan Campbell, and Stephen Gould. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

We address the problem of detecting human–object interactions in images using graph neural networks. Unlike conventional methods, where nodes send scaled but otherwise identical messages to each of their neighbours, we propose to condition the messages between pairs of nodes on their spatial relationships, resulting in different messages going to neighbours of the same node. To this end, we explore various ways of applying spatial conditioning under a multi-branch structure.
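The spatial-conditioning idea can be sketched in a few lines: rather than a single shared message function, the message a sender passes to each neighbour is modulated by an encoding of the pair's spatial relationship. The sketch below is a hypothetical minimal illustration (the gating form, the box encoding, and all names are assumptions, not the authors' implementation).

```python
import numpy as np

def spatial_encoding(box_i, box_j):
    """Hypothetical encoding of the spatial relation between two boxes
    given as (x, y, w, h): normalised centre offsets plus log size ratios."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box_i, box_j
    return np.array([(x2 - x1) / w1, (y2 - y1) / h1,
                     np.log(w2 / w1), np.log(h2 / h1)])

def conditioned_message(feat_j, box_i, box_j, W_feat, W_spatial):
    """Message from node j to receiver i: a sigmoid gate computed from the
    spatial encoding scales the feature message elementwise, so different
    neighbours of the same sender receive different messages."""
    gate = 1.0 / (1.0 + np.exp(-(W_spatial @ spatial_encoding(box_i, box_j))))
    return gate * (W_feat @ feat_j)
```

With random weights, two receivers at different positions get distinct messages from the same sender, which is the property the paper contrasts against conventional (identical-message) graph networks.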

Distance Matters in Human-Object Interaction Detection

A novel two-stage method for better handling distant interactions in HOI detection is proposed, which surpasses existing methods significantly and leads to new state-of-the-art results.

Decoupling Object Detection from Human-Object Interaction Recognition

This paper proposes DEFR, a DEtection-FRee method to recognize Human-Object Interactions (HOI) at image level without using object location or human pose, and proposes Log-Sum-Exp Sign (LSE-Sign) loss to facilitate multi-label learning on a long-tailed dataset by balancing gradients over all classes in a softmax format.
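One plausible reading of the LSE-Sign loss described above, sketched for illustration (this is an assumed formulation based on the summary, not the authors' reference code): with labels y_i in {+1, -1}, take L = log(1 + Σ_i exp(-y_i s_i)), which couples all classes in a single softmax-style normalisation so the gradient mass is balanced across the label set.

```python
import numpy as np

def lse_sign_loss(scores, labels):
    """Assumed LSE-Sign form: labels are +1/-1, scores are raw logits.
    L = log(1 + sum_i exp(-y_i * s_i)), computed in a numerically
    stable way via the max-shift trick."""
    signed = -labels * scores              # -y_i * s_i per class
    m = max(0.0, float(np.max(signed)))    # shift so exponents stay small
    return m + np.log(np.exp(-m) + np.sum(np.exp(signed - m)))
```

Confident predictions with the correct signs drive the loss toward zero, while any class predicted with the wrong sign dominates the sum, which is the long-tail balancing behaviour the summary describes.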

Interactiveness Field in Human-Object Interactions

This work introduces a previously overlooked interactiveness bimodal prior, and proposes new energy constraints based on the cardinality and difference in the inherent “interactiveness field” underlying interactive versus non-interactive pairs that can detect more precise pairs and significantly boost HOI detection performance.

Spatial-Net for Human-Object Interaction Detection

The proposed Spatial-Net outperforms many state-of-the-art HOI models with lower inference time; it uses Hungarian matching to assign human-object pairs to each action and a human-centric model to reject non-interacting human-object pairs based on the semantic co-occurrence between human and object.

QAHOI: Query-Based Anchors for Human-Object Interaction Detection

A transformer-based method, QAHOI (Query-Based Anchors for Human-Object Interaction detection), which leverages a multi-scale architecture to extract features from different spatial scales and uses query-based anchors to predict all the elements of an HOI instance.

Chairs Can be Stood on: Overcoming Object Bias in Human-Object Interaction Detection

A novel plug-and-play Object-wise Debiasing Memory (ODM) method for re-balancing the distribution of interactions under detected objects, which allows rare interaction instances to be more frequently sampled for training, thereby alleviating the object bias induced by the unbalanced interaction distribution.

Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows

This paper presents a new vision Transformer, named Iwin Transformer, which is specifically designed for human-object interaction (HOI) detection, a detailed scene-understanding task.

Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

The Unary-Pairwise Transformer is proposed, a two-stage detector that exploits unary and pairwise representations for HOIs and significantly outperforms state-of-the-art approaches.

What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions

We propose a novel one-stage Semantic and Spatial Refined Transformer (SSRT) to solve the Human-Object Interaction detection task, which requires localizing human-object pairs and recognizing the interactions between them.

The Overlooked Classifier in Human-Object Interaction Recognition

This paper encodes the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs, and proposes a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.

Learning Human-Object Interactions by Graph Parsing Neural Networks

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos by introducing the Graph Parsing Neural Network (GPNN) framework.

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

The proposed Visual-Spatial-Graph Network (VSGNet) architecture extracts visual features from the human-object pairs, refines the features with spatial configurations of the pair, and utilizes the structural connections between the pair via graph convolutions.

Detecting Human-Object Interactions via Functional Generalization

This work presents an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner, and demonstrates that using a generic object detector, the model can generalize to interactions involving previously unseen objects.

iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection

This paper proposes an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance and allows an attention-based network to selectively aggregate features relevant for recognizing HOIs.

Learning to Detect Human-Object Interactions

Experiments demonstrate that the proposed Human-Object Region-based Convolutional Neural Networks (HO-RCNN), by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.
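The "Interaction Pattern" idea of encoding human-object spatial relations can be illustrated as a two-channel binary mask rasterised inside the window enclosing both boxes. This is an illustrative reimplementation under assumed conventions (grid size, box format, aspect handling); details differ from HO-RCNN itself.

```python
import numpy as np

def interaction_pattern(human_box, object_box, size=64):
    """Sketch of a two-channel spatial pattern: channel 0 marks the human
    box, channel 1 the object box, both rasterised into a size x size grid
    covering the tight window that encloses the two boxes."""
    boxes = np.array([human_box, object_box], dtype=float)  # (x1, y1, x2, y2)
    wx1, wy1 = boxes[:, :2].min(axis=0)                     # joint window
    wx2, wy2 = boxes[:, 2:].max(axis=0)
    scale = size / max(wx2 - wx1, wy2 - wy1)
    pattern = np.zeros((2, size, size), dtype=np.float32)
    for ch, (x1, y1, x2, y2) in enumerate(boxes):
        c1, r1 = int((x1 - wx1) * scale), int((y1 - wy1) * scale)
        c2 = int(np.ceil((x2 - wx1) * scale))
        r2 = int(np.ceil((y2 - wy1) * scale))
        pattern[ch, r1:min(r2, size), c1:min(c2, size)] = 1.0
    return pattern
```

Feeding such a pattern to a small CNN branch, alongside appearance features, is the general two-stream recipe this line of work popularised.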

Detecting and Recognizing Human-Object Interactions

A novel model is proposed that learns to predict an action-specific density over target object locations based on the appearance of a detected person and efficiently infers interaction triplets in a clean, jointly trained end-to-end system the authors call InteractNet.

DRG: Dual Relation Graph for Human-Object Interaction Detection

The proposed dual relation graph effectively captures discriminative cues from the scene to resolve ambiguity from local predictions and leads to favorable results compared to the state-of-the-art HOI detection algorithms on two large-scale benchmark datasets.

Visual Relationship Detection with Language Priors

This work proposes a model that can scale to predict thousands of types of relationships from a few examples and improves on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.
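A toy sketch of the language-prior intuition: score a candidate <subject, predicate, object> triplet by the similarity of its concatenated word embeddings to triplets seen in training, so semantically close relationships (e.g. "person ride horse" vs. "person ride elephant") support each other. This is a hypothetical simplification; the paper learns a projection of the embeddings rather than using raw cosine similarity.

```python
import numpy as np

def relationship_prior(subj_vec, pred_vec, obj_vec, seen_triplets):
    """Score an unseen triplet by its maximum cosine similarity to seen
    triplets, each represented as concatenated word-embedding vectors
    (illustrative form only)."""
    query = np.concatenate([subj_vec, pred_vec, obj_vec])
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cos(query, np.concatenate(t)) for t in seen_triplets)
```

Because the prior lives entirely in embedding space, it can rank thousands of relationship types, including ones with few or no visual examples, which is the scaling property the summary highlights.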

Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection

This work develops a multi-branch deep network to learn a pose-augmented relation representation at three semantic levels, incorporating interaction context, object features and detailed semantic part cues, and demonstrates its efficacy in handling complex scenes.

Relation Parsing Neural Network for Human-Object Interaction Detection

Pengcheng Zhou and M. Chi. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
Experiments conducted on V-COCO and HICO-DET datasets confirm the effectiveness of the proposed RPNN network which significantly outperforms state-of-the-art methods.