Unsupervised Representation Learning by Predicting Image Rotations
TLDR
This work proposes to learn image features by training ConvNets to recognize the 2d rotation applied to their input image, and demonstrates both qualitatively and quantitatively that this apparently simple task provides a very powerful supervisory signal for semantic feature learning.
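The pretext task above needs no human labels: each image is rotated by 0, 90, 180, and 270 degrees, and the rotation index becomes the classification target. A minimal sketch of that data-generation step (the function name and (H, W, C) array layout are illustrative assumptions, not from the paper):

```python
import numpy as np

def make_rotation_batch(image):
    """Produce the four rotated copies of an image together with the
    rotation labels 0..3 used as self-supervised targets.
    Assumes a (H, W, C) numpy array; rotation k means k * 90 degrees."""
    rotations = [np.rot90(image, k=k) for k in range(4)]
    labels = np.arange(4)  # label k corresponds to a k * 90-degree rotation
    return rotations, labels
```

A ConvNet is then trained with an ordinary 4-way cross-entropy loss on these labels, so the supervisory signal comes entirely from the geometry of the transformation.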
Dynamic Few-Shot Visual Learning Without Forgetting
TLDR
This work proposes to extend an object recognition system with an attention-based few-shot classification weight generator, and to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors.
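The redesigned classifier scores a feature vector against each class weight by cosine similarity rather than a dot product, which keeps novel-class and base-class weights on a comparable scale. A minimal sketch (in the paper the scale is a learnable parameter; here it is a fixed illustrative constant):

```python
import numpy as np

def cosine_classifier_scores(features, weights, scale=10.0):
    """Cosine-similarity classifier: L2-normalize both the feature
    vectors (N, D) and the class weight vectors (C, D), then take
    scaled dot products. Returns an (N, C) score matrix."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return scale * f @ w.T  # each entry lies in [-scale, scale]
```

Because the magnitudes of the weight vectors no longer matter, freshly generated few-shot weights can be mixed with base-class weights without re-calibration.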
Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model
TLDR
An object detection system that relies on a multi-region deep convolutional neural network, which also encodes semantic segmentation-aware features, aiming to capture a diverse set of discriminative appearance factors and exhibiting the localization sensitivity essential for accurate object localization.
PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model
TLDR
The proposed PersonLab model tackles both semantic-level reasoning and object-part associations using part-based modeling, and employs a convolutional network that learns to detect individual keypoints and predict their relative displacements, allowing keypoints to be grouped into person pose instances.
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
TLDR
This work proposes a new approach to computing category-agnostic bounding box proposals based on an active generation strategy: starting from a set of seed boxes uniformly distributed over the image, it progressively moves its attention to the promising image areas where well-localized bounding box proposals are more likely to be discovered.
Boosting Few-Shot Visual Learning With Self-Supervision
TLDR
This work uses self-supervision as an auxiliary task in a few-shot learning pipeline, enabling feature extractors to learn richer and more transferable visual representations while still using few annotated samples.
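Concretely, the auxiliary self-supervised loss (e.g. rotation prediction) is added to the few-shot classification loss during training of the feature extractor. A minimal sketch of that multi-task objective; the weighting factor `alpha` and the function names are assumptions for illustration:

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def auxiliary_total_loss(cls_logits, cls_label, rot_logits, rot_label, alpha=1.0):
    """Few-shot classification loss plus a weighted self-supervised
    auxiliary loss (here: rotation prediction), trained jointly so the
    feature extractor learns richer representations."""
    return cross_entropy(cls_logits, cls_label) + alpha * cross_entropy(rot_logits, rot_label)
```

Setting `alpha` to zero recovers plain few-shot training, which makes the contribution of the auxiliary task easy to ablate.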
Detect, Replace, Refine: Deep Structured Prediction for Pixel Wise Labeling
TLDR
This work proposes a generic architecture that decomposes the label improvement task into three steps: detecting the initial label estimates that are incorrect, replacing the incorrect labels with new ones, and finally refining the renewed labels by predicting residual corrections w.r.t. them.
Generating Classification Weights With GNN Denoising Autoencoders for Few-Shot Learning
TLDR
This work proposes the use of a Denoising Autoencoder (DAE) network that takes as input a set of classification weights corrupted with Gaussian noise and learns to reconstruct the target-discriminative classification weights, and proposes to implement the DAE model as a Graph Neural Network (GNN).
LocNet: Improving Localization Accuracy for Object Detection
TLDR
This work proposes a novel object localization methodology based on a convolutional neural network architecture properly adapted for this task, called LocNet, and shows experimentally that LocNet achieves a very significant mAP improvement at high IoU thresholds on the PASCAL VOC2007 test set.
Learning Representations by Predicting Bags of Visual Words
TLDR
This work shows that the process of image discretization into visual words can provide the basis for very powerful self-supervised approaches in the image domain, thus allowing further connections to be made to related methods from the NLP domain that have been extremely successful so far.
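The discretization step maps each local feature to the id of its nearest entry in a visual-word vocabulary (e.g. k-means centroids of features from a pretrained network), and these ids serve as prediction targets. A minimal sketch of that quantization; the vocabulary is assumed to be given:

```python
import numpy as np

def visual_word_assignment(features, vocabulary):
    """Discretize local features into visual-word ids by
    nearest-centroid assignment.
    features: (N, D) feature vectors; vocabulary: (K, D) centroids.
    Returns an (N,) array of word ids in [0, K)."""
    # Pairwise Euclidean distances between every feature and every centroid.
    dists = np.linalg.norm(features[:, None, :] - vocabulary[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

Predicting the bag of such word ids from a perturbed image plays a role analogous to token prediction in NLP pretraining, which is the connection the work draws.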