DRG: Dual Relation Graph for Human-Object Interaction Detection

Chen Gao, Jiarui Xu, Yuliang Zou, Jia-Bin Huang
We tackle the challenging problem of human-object interaction (HOI) detection. Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features. In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph (one human-centric and one object-centric). Our proposed dual relation… 
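To make the dual-graph idea concrete, here is a minimal conceptual sketch (our own simplification, not the authors' implementation): each human-object pair carries a feature vector, the human-centric graph averages features over pairs that share the same human, the object-centric graph averages over pairs that share the same object, and the two contexts are mixed back into each pair's feature. The function name and uniform averaging (in place of learned attention) are hypothetical.

```python
from collections import defaultdict

def dual_graph_aggregate(pairs, feats):
    """Conceptual sketch of dual relation graph aggregation.

    pairs: list of (human_id, object_id) tuples, one per candidate pair
    feats: list of feature vectors (list of floats), one per pair

    Each pair's output feature is the average of: its own feature,
    the mean feature of pairs sharing its human (human-centric graph),
    and the mean feature of pairs sharing its object (object-centric graph).
    """
    by_human = defaultdict(list)
    by_object = defaultdict(list)
    for i, (h, o) in enumerate(pairs):
        by_human[h].append(i)
        by_object[o].append(i)

    def mean(indices):
        dim = len(feats[0])
        return [sum(feats[i][d] for i in indices) / len(indices)
                for d in range(dim)]

    out = []
    for i, (h, o) in enumerate(pairs):
        hc = mean(by_human[h])    # context from the human-centric graph
        oc = mean(by_object[o])   # context from the object-centric graph
        out.append([(a + b + c) / 3.0
                    for a, b, c in zip(feats[i], hc, oc)])
    return out
```

In the actual DRG model the aggregation weights are learned via attention over the abstract spatial-semantic representations; the uniform mean above merely illustrates how context flows between pairs through the shared human or object node.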

A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection

SGCN4HOI, a skeleton-aware graph convolutional network for human-object interaction detection, introduces a novel skeleton-based object keypoints representation; it outperforms state-of-the-art pose-based models and achieves competitive performance against other models.

GTNet: Guided Transformer Network for Detecting Human-Object Interactions

GTNet encodes spatial contextual information into human and object visual features via self-attention, achieving a 4%-6% improvement over previous state-of-the-art results on both the V-COCO and HICO-DET datasets.

Detecting Human-Object Relationships in Videos

It is found that applying attention mechanisms among features distributed spatio-temporally greatly improves the understanding of human-object relationships.

Effective Actor-centric Human-object Interaction Detection

Decoupling Object Detection from Human-Object Interaction Recognition

This paper proposes DEFR, a DEtection-FRee method that recognizes Human-Object Interactions (HOI) at the image level without using object locations or human pose, and introduces a Log-Sum-Exp Sign (LSE-Sign) loss that facilitates multi-label learning on a long-tailed dataset by balancing gradients over all classes in a softmax format.

Spatio-attentive Graphs for Human-Object Interaction Detection

This work detects human-object interactions in images using graph neural networks, fusing relative spatial information with appearance features within a single graphical model so that information conditioned on both modalities influences the prediction of interactions with neighboring nodes.

QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

A transformer-based feature extractor in which an attention mechanism and query-based detection play key roles; it successfully extracts contextually important features and thus outperforms existing methods by large margins.

Distance Matters in Human-Object Interaction Detection

A novel two-stage method for better handling distant interactions in HOI detection, featuring a Far Near Distance Attention module that enables information propagation between humans and objects while explicitly taking spatial distance into account.

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

This paper proposes to utilize a Verb Semantic Model (VSM) with semantic aggregation to benefit from an object-guided hierarchy, and to generate cross-modality-aware visual and semantic features via Cross-Modal Calibration (CMC).

HOTR: End-to-End Human-Object Interaction Detection with Transformers

This paper presents a novel framework, referred to as HOTR, which directly predicts a set of 〈human, object, interaction〉 triplets from an image using a transformer encoder-decoder architecture; it achieves state-of-the-art performance on two HOI detection benchmarks with an inference time under 1 ms after object detection.



Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection

This work develops a multi-branch deep network to learn a pose-augmented relation representation at three semantic levels, incorporating interaction context, object features and detailed semantic part cues, and demonstrates its efficacy in handling complex scenes.

Deep Contextual Attention for Human-Object Interaction Detection

This work proposes a contextual attention framework for human-object interaction detection that learns contextually-aware appearance features for human and object instances, and adaptively selects relevant instance-centric context information to highlight image regions likely to contain human-object interactions.

Learning to Detect Human-Object Interactions

Experiments demonstrate that the proposed Human-Object Region-based Convolutional Neural Networks (HO-RCNN), by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.

iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection

This paper proposes an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance and allows an attention-based network to selectively aggregate features relevant for recognizing HOIs.

Learning Human-Object Interactions by Graph Parsing Neural Networks

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos, introducing the Graph Parsing Neural Network (GPNN), a framework that incorporates…

Detecting Human-Object Interactions via Functional Generalization

This work presents an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner, and demonstrates that using a generic object detector, the model can generalize to interactions involving previously unseen objects.

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

We show that for human-object interaction detection, a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more…

Scaling Human-Object Interaction Recognition Through Zero-Shot Learning

This work introduces a factorized model for HOI detection that disentangles reasoning on verbs and objects, and at test-time can therefore produce detections for novel verb-object pairs through a zero-shot learning approach.

Detecting and Recognizing Human-Object Interactions

A novel model is proposed that learns to predict an action-specific density over target object locations based on the appearance of a detected person and efficiently infers interaction triplets in a clean, jointly trained end-to-end system the authors call InteractNet.

HCVRD: A Benchmark for Large-Scale Human-Centered Visual Relationship Detection

A webly-supervised approach to large-scale human-centered visual relationship detection is proposed, and the model is demonstrated to provide a strong baseline on the introduced HCVRD dataset.