Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks

  title={Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks},
  author={Jae Shin Yoon and François Rameau and Junsik Kim and Seokju Lee and Seunghak Shin and In-So Kweon},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information. Furthermore, we propose a feature… 

Figures and Tables from this paper

Efficient Video Object Segmentation via Network Modulation
This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object and is 70× faster than fine-tuning approaches and achieves similar accuracy.
Video Object Segmentation with 3D Convolution Network
This work explores a novel method to realize semi-supervised video object segmentation with special spatiotemporal feature extracting structure and shows better performance than most methods proposed in recent years and its meanIOU accuracy is comparable with state-of-art methods.
A temporal attention based appearance model for video object segmentation
A novel neural network is proposed that integrates a temporal attention based appearance model and a boundary-aware loss that assists the proposed method to learn a discriminative and robust target representation and avoid the drift problem of traditional propagation schemes.
VideoMatch: Matching based Video Object Segmentation
This work develops a novel matching based algorithm for video object segmentation that learns to match extracted features to a provided template without memorizing the appearance of the objects.
Attention-based video object segmentation algorithm
To improve the segmentation performance on videos with large object motion or deformation, a novel scheme is proposed which has two branches. In one branch, the attention mechanism is first utilized
Kernelized Memory Network for Video Object Segmentation
A kernelized memory network (KMN) is proposed that surpasses the state-of-the-art on standard benchmarks by a significant margin and uses the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction.
Cascaded ConvLSTMs Using Semantically-Coherent Data Synthesis for Video Object Segmentation
This paper uses a more effective and efficient cascade module to refine the model predictions and proposes a semantically-coherent data synthesis strategy to augment training sequences without any efforts.
Semi-supervised Video Object Segmentation with Recurrent Neural Network
Semi-supervised Video Object Segmentation with Recurrent Neural Network (SVOSR) has been proposed which combines convolutional gated recurrent unit (ConvGRU) to learn the temporal information between adjacent frames.
Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment
This paper proposes a novel solution for object-matching based semi-supervised video object segmentation, where the target object masks in the first frame are provided and an adaptive object alignment module by incorporating a region translation function that aligns object proposals towards templates in the feature space.
One-Shot Video Object Segmentation Using Attention Transfer
  • Omit ChandaYang Wang
  • Computer Science
    2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)
  • 2019
This work proposes an attention based knowledge transfer mechanism that transfers the object knowledge from the first frame to other frames in a video using a Siamese network with two streams.


Learning Video Object Segmentation from Static Images
It is demonstrated that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only, and a combination of offline and online learning strategies are used.
Visual Tracking with Fully Convolutional Networks
An in-depth study on the properties of CNN features offline pre-trained on massive image data and classification task on ImageNet shows that the proposed tacker outperforms the state-of-the-art significantly.
Fully Connected Object Proposals for Video Segmentation
A novel approach to video segmentation using multiple object proposals that combines appearance with long-range point tracks to ensure robustness with respect to fast motion and occlusions over longer video sequences is presented.
Hierarchical Convolutional Features for Visual Tracking
This paper adaptively learn correlation filters on each convolutional layer to encode the target appearance and hierarchically infer the maximum response of each layer to locate targets.
Key-segments for video object segmentation
The method first identifies object-like regions in any frame according to both static and dynamic cues and compute a series of binary partitions among candidate “key-segments” to discover hypothesis groups with persistent appearance and motion.
Learning Multi-domain Convolutional Neural Networks for Visual Tracking
A novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network using a large set of videos with tracking ground-truths to obtain a generic target representation.
A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.
Fully Convolutional Networks for Semantic Segmentation
It is shown that convolutional networks by themselves, trained end- to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Interactive video segmentation using occlusion boundaries and temporally coherent superpixels
An interactive video segmentation system built on the basis of occlusion and long term spatio-temporal structure cues that allows extracting multiple foreground objects together, saving user time if more than one object is present.