Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection

@article{Wan2019PoseAwareMF,
  title={Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection},
  author={Bo Wan and Desen Zhou and Yongfei Liu and Rongjie Li and Xuming He},
  journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019},
  pages={9468--9477}
}
  • Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He
  • Published 2019
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories. To address these challenges, we propose a multi-level relation detection strategy that utilizes human pose cues to capture global spatial configurations of relations and as an…

Pose-based Modular Network for Human-Object Interaction Detection
TLDR
A Pose-based Modular Network (PMN) is contributed which explores the absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks.
Visual-Semantic-Pose Graph Mixture Networks for Human-Object Interaction Detection
TLDR
A dual graph attention network is proposed to dynamically aggregate the visual, instance spatial, and semantic cues from primary subject-object relations as well as subsidiary ones to enhance inference and outperforms state-of-the-art on the challenging HICO-DET dataset.
Human Object Interaction Detection via Multi-level Conditioned Network
TLDR
A novel multi-level conditioned network that fuses extra spatial-semantic knowledge with visual features to enhance the reasoning capability of CNNs is proposed and is superior to the state of the art.
Detecting human-object interaction with multi-level pairwise feature network
TLDR
It is argued that a paradigm of pairwise feature extraction and action inference can be applied not only at the whole human and object instance level, but also at the part level at which a body part interacts with an object, and at the semantic level by considering the semantic label of an object along with human appearance and human–object spatial configuration to infer the action.
DRG: Dual Relation Graph for Human-Object Interaction Detection
TLDR
The proposed dual relation graph effectively captures discriminative cues from the scene to resolve ambiguity from local predictions and leads to favorable results compared to the state-of-the-art HOI detection algorithms on two large-scale benchmark datasets.
PoSeg: Pose-Aware Refinement Network for Human Instance Segmentation
TLDR
A modular recurrent deep network that utilizes pose estimation to refine instance segmentation in an iterative manner and exceeds the state-of-the-art methods on OCHuman dataset by 3.0 mAP and on COCOPersons by 6.4 mAP, demonstrating the effectiveness of the approach.
GID-Net: Detecting Human-Object Interaction with Global and Instance Dependency
TLDR
The proposed GID-Net outperforms the existing best-performing methods on two public benchmarks, including V-COCO and HICO-DET, validating its efficacy in detecting human-object interactions.
Human-Centric Parsing Network for Human-Object Interaction Detection
  • Guanyu Chen, Chong Chen, Zhicheng Zhao, Fei Su
  • Computer Science
  • 2020 25th International Conference on Pattern Recognition (ICPR)
  • 2021
TLDR
A Human-Centric Parsing Network (HCPN) is proposed, which integrates global structural knowledge to infer human-object interactions and achieves a great improvement over state-of-the-art methods.
Learning Human-Object Interaction Detection Using Interaction Points
TLDR
This paper proposes a novel fully-convolutional approach that directly detects the interactions between human-object pairs and predicts interaction points, which directly localize and classify the interaction.
Cascaded Human-Object Interaction Recognition
TLDR
This work introduces a cascade architecture for a multi-stage, coarse-to-fine HOI understanding, and makes the framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling.

References

SHOWING 1-10 OF 33 REFERENCES
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection
TLDR
This paper proposes an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance and allows an attention-based network to selectively aggregate features relevant for recognizing HOIs.
Interact as You Intend: Intention-Driven Human-Object Interaction Detection
TLDR
The proposed human intention-driven HOI detection (iHOI) framework models human pose with the relative distances from body joints to the object instances and utilizes human gaze to guide the attended contextual regions in a weakly-supervised setting.
Learning to Detect Human-Object Interactions
TLDR
Experiments demonstrate that the proposed Human-Object Region-based Convolutional Neural Networks (HO-RCNN), by exploiting human-object spatial relations through Interaction Patterns, significantly improves the performance of HOI detection over baseline approaches.
Detecting and Recognizing Human-Object Interactions
TLDR
A novel model is proposed that learns to predict an action-specific density over target object locations based on the appearance of a detected person and efficiently infers interaction triplets in a clean, jointly trained end-to-end system the authors call InteractNet.
Scaling Human-Object Interaction Recognition Through Zero-Shot Learning
TLDR
This work introduces a factorized model for HOI detection that disentangles reasoning on verbs and objects, and at test-time can therefore produce detections for novel verb-object pairs through a zero-shot learning approach.
Visual Translation Embedding Network for Visual Relation Detection
TLDR
This work proposes a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass, and proposes the first end-to-end relation detection network.
Multi-context Attention for Human Pose Estimation
TLDR
This paper proposes to incorporate convolutional neural networks with a multi-context attention mechanism into an end-to-end framework for human pose estimation and designs novel Hourglass Residual Units (HRUs) to increase the receptive field of the network.
Cascaded Pyramid Network for Multi-person Pose Estimation
TLDR
A novel network structure called Cascaded Pyramid Network (CPN) is presented which aims to relieve the problem posed by these "hard" keypoints, achieving state-of-the-art results on the COCO keypoint benchmark with an average precision of 73.0.
Attentional Pooling for Action Recognition
TLDR
This work introduces a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks, and introduces a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification).
Learning Human-Object Interactions by Graph Parsing Neural Networks
This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates…