Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks

@article{Yoon2017PixelLevelMF,
  title={Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks},
  author={Jae Shin Yoon and François Rameau and Junsik Kim and Seokju Lee and Seunghak Shin and In-So Kweon},
  journal={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={2186--2195}
}
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information. Furthermore, we propose a feature… 

Fast Video Object Segmentation via Mask Transfer Network
TLDR
A novel mask transfer network (MTN) is proposed that greatly boosts the processing speed of VOS while achieving accuracy competitive with state-of-the-art methods.
Fast Video Object Segmentation via Dynamic Targeting Network
TLDR
Experimental results on two public datasets demonstrate that the proposed model significantly outperforms existing methods without online training in both accuracy and efficiency, and is comparable to online training-based methods in accuracy with an order of magnitude faster speed.
Efficient Video Object Segmentation via Network Modulation
TLDR
This work proposes a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object and is 70× faster than fine-tuning approaches and achieves similar accuracy.
Video Object Segmentation with 3D Convolution Network
TLDR
This work explores a novel method for semi-supervised video object segmentation that uses a dedicated spatiotemporal feature extraction structure; it performs better than most recently proposed methods, and its mean IoU is comparable with state-of-the-art methods.
A temporal attention based appearance model for video object segmentation
TLDR
A novel neural network is proposed that integrates a temporal attention based appearance model and a boundary-aware loss that helps the proposed method learn a discriminative and robust target representation and avoid the drift problem of traditional propagation schemes.
VideoMatch: Matching based Video Object Segmentation
TLDR
This work develops a novel matching based algorithm for video object segmentation that learns to match extracted features to a provided template without memorizing the appearance of the objects.
Kernelized Memory Network for Video Object Segmentation
TLDR
A kernelized memory network (KMN) is proposed that surpasses the state-of-the-art on standard benchmarks by a significant margin and uses the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and segment boundary extraction.
Cascaded ConvLSTMs Using Semantically-Coherent Data Synthesis for Video Object Segmentation
TLDR
This paper uses a more effective and efficient cascade module to refine the model predictions and proposes a semantically-coherent data synthesis strategy to augment training sequences without any effort.
Semi-supervised Video Object Segmentation with Recurrent Neural Network
TLDR
Semi-supervised Video Object Segmentation with Recurrent Neural Network (SVOSR) is proposed, which incorporates a convolutional gated recurrent unit (ConvGRU) to learn the temporal information between adjacent frames.
Fast Pixel-Matching for Video Object Segmentation
...
...

References

Learning Video Object Segmentation from Static Images
TLDR
It is demonstrated that highly accurate object segmentation in videos can be enabled by using a convolutional neural network (convnet) trained with static images only, and a combination of offline and online learning strategies is used.
One-Shot Video Object Segmentation
TLDR
One-Shot Video Object Segmentation (OSVOS) is based on a fully convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot).
Visual Tracking with Fully Convolutional Networks
TLDR
An in-depth study of the properties of CNN features pre-trained offline on massive image data and the ImageNet classification task is presented, and the proposed tracker is shown to significantly outperform the state of the art.
Fully Connected Object Proposals for Video Segmentation
TLDR
A novel approach to video segmentation using multiple object proposals that combines appearance with long-range point tracks to ensure robustness with respect to fast motion and occlusions over longer video sequences is presented.
Hierarchical Convolutional Features for Visual Tracking
TLDR
This paper adaptively learns correlation filters on each convolutional layer to encode the target appearance and hierarchically infers the maximum response of each layer to locate targets.
Key-segments for video object segmentation
TLDR
The method first identifies object-like regions in any frame according to both static and dynamic cues and then computes a series of binary partitions among candidate “key-segments” to discover hypothesis groups with persistent appearance and motion.
Learning Multi-domain Convolutional Neural Networks for Visual Tracking
TLDR
A novel visual tracking algorithm is proposed based on the representations from a discriminatively trained Convolutional Neural Network, which is pretrained using a large set of videos with tracking ground-truths to obtain a generic target representation.
Transferring Rich Feature Hierarchies for Robust Visual Tracking
TLDR
This work pre-trains a CNN offline and then transfers the learned rich feature hierarchies to online tracking, and proposes to generate a probability map instead of producing a simple class label to fit the characteristics of object tracking.
A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
TLDR
This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.
Fully Convolutional Networks for Semantic Segmentation
TLDR
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
...
...