Probabilistic Graph Attention Network With Conditional Kernels for Pixel-Wise Prediction

Authors: Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, N. Sebe
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Multi-scale representations deeply learned via convolutional neural networks have shown tremendous importance for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e., structured multi-scale feature learning and fusion. In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing…

Variational Structured Attention Networks for Deep Visual Representation Learning

VISTA-Net outperforms the state-of-the-art in multiple continuous and discrete prediction tasks, thus confirming the benefit of the proposed approach in joint structured spatial-channel attention estimation for deep representation learning.

Transformer-Based Attention Networks for Continuous Pixel-Wise Prediction

This paper proposes TransDepth, an architecture that benefits from both convolutional neural networks and transformers. It applies transformers to pixel-wise prediction problems involving continuous labels and achieves state-of-the-art performance on three challenging datasets.

Transformers Solve the Limited Receptive Field for Monocular Depth Prediction

This is the first paper that applies transformers to pixel-wise prediction problems involving continuous labels (i.e., monocular depth prediction and surface normal estimation), and it achieves state-of-the-art performance on three challenging datasets.

CORNet: Context-Based Ordinal Regression Network for Monocular Depth Estimation

Experiments and results on two challenging datasets, KITTI and NYU Depth V2, demonstrate that the proposed CORNet can estimate monocular depth maps effectively and obtain superior performance in capturing geometric features over existing methods.

MANet: Multi-Scale Aware-Relation Network for Semantic Segmentation in Aerial Scenes

A novel multi-scale aware-relation network (MANet), which learns multi-scale (MS) features by collaboratively exploiting the correlation among different scales, is proposed to tackle the problem in remote sensing.

DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation

The proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods by prominent margins and achieves the most competitive result on the highly competitive KITTI depth estimation benchmark.

Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss

This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. It achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model that is 3.1 to 38.4 times smaller, in terms of the number of parameters, than baseline approaches.

BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation

This paper presents BinsFormer, a novel framework tailored for classification-regression-based depth estimation that adaptively generates bins and per-pixel probability distributions for accurate depth estimation. It also proposes an auxiliary scene understanding task and a multi-scale prediction refinement strategy that can be seamlessly integrated into the Transformer.

Zoomer: Boosting Retrieval on Web-scale Graphs by Regions of Interest

This work introduces Zoomer, a system deployed at Taobao, the largest e-commerce platform in China, for training and serving GNN-based recommendations over web-scale graphs. It achieves up to 14x speedup when downsizing sampling scales, with AUC performance comparable to (or even better than) baseline methods.

Variational Inference and Learning of Piecewise Linear Dynamical Systems

This article proposes a variational approximation of piecewise linear dynamical systems, and provides full details of the derivation of two variational expectation-maximization algorithms: a filter and a smoother.



Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

A hierarchical deep model is introduced that produces richer and more complementary representations, and novel Attention-Gated Conditional Random Fields (AG-CRFs) are proposed to refine and robustly fuse the representations learned at different scales.

Attention to Scale: Scale-Aware Semantic Image Segmentation

An attention mechanism that learns to softly weight the multi-scale features at each pixel location is proposed, which not only outperforms average- and max-pooling, but also allows us to diagnostically visualize the importance of features at different positions and scales.

Conditional Random Fields as Recurrent Neural Networks

A new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling is introduced, and top results are obtained on the challenging Pascal VOC 2012 segmentation benchmark.

Higher Order Conditional Random Fields in Deep Neural Networks

Two types of higher order potentials, based on object detections and superpixels, can be included in a CRF embedded within a deep network to allow inference with the differentiable mean field algorithm and are designed to achieve state-of-the-art segmentation performance on the PASCAL VOC benchmark.

Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation

This paper introduces a novel approach for monocular depth estimation which is competitive with previous methods on the KITTI benchmark and outperforms the state of the art on the NYU Depth V2 dataset.

Top-Down Learning for Structured Labeling with Convolutional Pseudoprior

A new method for structured labeling is proposed that develops a convolutional pseudo-prior (ConvPP) on the ground-truth labels, using a pseudo-likelihood approximation to the prior under a novel fixed-point network structure that facilitates an end-to-end learning process.

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection

This paper designs a novel cascade CRF architecture with a CNN to jointly refine deep features and predictions at each scale and progressively compute a final refined saliency map. It formulates a CRF graphical model that involves feature-feature, feature-prediction, and prediction-prediction message passing, from the coarse scale to the finer scale.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Towards unified depth and semantic prediction from a single image

This work proposes a unified framework for joint depth and semantic prediction that effectively leverages the advantages of both tasks and provides the state-of-the-art results.

The application of two-level attention models in deep convolutional neural network for fine-grained image classification

This paper proposes to apply visual attention to the fine-grained classification task using a deep neural network. It achieves the best accuracy under the weakest supervision condition and is competitive against other methods that rely on additional annotations.