A Task-Oriented Approach for Cost-Sensitive Recognition

R. Mottaghi, Hannaneh Hajishirzi, and Ali Farhadi. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
With the recent progress in visual recognition, we have already started to see a surge of vision-related real-world applications. These applications, unlike general scene understanding, are task-oriented and require specific information from visual data. Given the current growth in new sensory devices, feature designs, feature learning methods, and algorithms, the search over the space of features and models becomes combinatorial. In this paper, we propose a novel cost-sensitive task…
Who Let the Dogs Out? Modeling Dog Behavior from Visual Data
This model takes visual information as input and directly predicts the actions of the agent. The representation learned by the model encodes information distinct from representations trained on image classification, and it can generalize to other domains.
Contrasting Contrastive Self-Supervised Representation Learning Pipelines
In the past few years, we have witnessed remarkable breakthroughs in self-supervised representation learning. Despite the success and adoption of representations learned through this paradigm, much…
Contrasting Contrastive Self-Supervised Representation Learning Models
This paper analyzes contrastive approaches, one of the most successful and popular variants of self-supervised representation learning, and examines over 700 training experiments spanning 30 encoders, 4 pre-training datasets, and 20 diverse downstream tasks.
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions
Experiments show that the self-supervised representation that encodes interaction and attention cues outperforms MoCo, a visual-only state-of-the-art method, on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics), and walkable surface estimation (affordance).


Anytime Recognition of Objects and Scenes
A method for learning dynamic policies to optimize anytime performance in visual architectures. The method can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy, which provides a new view on the time course of human visual perception.
Modeling the influence of task on attention
The model's performance on search for single features and feature conjunctions is consistent with existing psychophysical data, and results suggest that the model may provide a reasonable approximation to many brain processes involved in complex task-driven visual behaviors.
Probabilistic learning of task-specific visual attention
This work proposes a unified Bayesian approach to modeling task-driven visual attention and shows that it predicts human attention and gaze better than the state of the art by a large margin.
DeViSE: A Deep Visual-Semantic Embedding Model
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. It shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
Learning to detect unseen object classes by between-class attribute transfer
The experiments show that, by using an attribute layer, it is indeed possible to build an object detection system that does not require any training images of the target classes. The work also assembles a new large-scale dataset, "Animals with Attributes", of over 30,000 animal images matching the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes.
Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
An approach for zero-shot learning of object categories in which the description of unseen categories comes in the form of typical text, such as an encyclopedia entry, without the need for explicitly defined attributes.
Dense Semantic Image Segmentation with Objects and Attributes
This paper formulates joint visual attribute and object class image segmentation as a dense multi-labelling problem, where each pixel in an image can be associated with both an object-class label and a set of visual attribute labels, and develops a hierarchical model to incorporate region-level object and attribute information.
Zero-Shot Learning Through Cross-Modal Transfer
This work introduces a model that can recognize objects in images even if no training data is available for the object class, and uses novelty detection methods to differentiate unseen classes from seen classes.
Decorrelating Semantic Visual Attributes by Resisting the Urge to Share
It is shown that accounting for structure in the visual attribute space is key to learning attribute models that preserve semantics, yielding improved generalizability that helps in the recognition and discovery of unseen object categories.
Efficient Match Kernel between Sets of Features for Visual Recognition
It is shown that bag-of-words representations commonly used in conjunction with linear classifiers can be viewed as special match kernels, which count 1 if two local features fall into the same region partitioned by visual words and 0 otherwise.
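The equivalence summarized above can be sketched numerically: the dot product of two bag-of-words histograms equals a match kernel that counts feature pairs assigned to the same visual word. This is an illustrative sketch, not the paper's code; the function names and the toy random codebook are assumptions for demonstration only.

```python
import numpy as np

def assign_words(features, codebook):
    # Nearest visual word (squared Euclidean distance) for each local feature.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def bow_histogram(features, codebook):
    # Bag-of-words histogram: count of local features per visual word.
    return np.bincount(assign_words(features, codebook), minlength=len(codebook))

def match_kernel(fx, fy, codebook):
    # Count 1 for every pair of local features falling in the same
    # visual-word region, 0 otherwise.
    wx = assign_words(fx, codebook)
    wy = assign_words(fy, codebook)
    return sum(int(a == b) for a in wx for b in wy)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 2))   # 5 visual words in a toy 2-D feature space
fx = rng.normal(size=(6, 2))         # local features of image x
fy = rng.normal(size=(4, 2))         # local features of image y

hx = bow_histogram(fx, codebook)
hy = bow_histogram(fy, codebook)
# The histogram dot product and the pairwise match count agree.
assert hx @ hy == match_kernel(fx, fy, codebook)
```

The agreement holds for any codebook and feature sets, since both sides count exactly the cross-image pairs sharing a visual word.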