Automatic Spatially-Aware Fashion Concept Discovery

  • Xintong Han, Zuxuan Wu, Phoenix X. Huang, Xiao Zhang, Menglong Zhu, Yuan Li, Yang Zhao, Larry S. Davis
  • Published 3 August 2017
  • Computer Science
  • 2017 IEEE International Conference on Computer Vision (ICCV)
This paper proposes an automatic spatially-aware concept discovery approach using weakly labeled image-text data from shopping websites. For each attribute (word), a spatially-aware representation is generated by combining its semantic word vector representation with a spatial representation derived from the convolutional maps of the fine-tuned network.
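The combination step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the attribute's spatial representation is obtained by averaging and normalizing per-image convolutional activation maps, then concatenating the flattened result with the attribute's word vector; all array shapes and names are illustrative.

```python
import numpy as np

def spatial_representation(activation_maps):
    """Average the activation maps of images containing an attribute
    and flatten into a fixed-length spatial descriptor (simplified)."""
    mean_map = activation_maps.mean(axis=0)        # (H, W) average over images
    mean_map = mean_map / (mean_map.sum() + 1e-8)  # normalize to a distribution
    return mean_map.ravel()                        # (H*W,) spatial descriptor

def spatially_aware_vector(word_vec, activation_maps):
    """Concatenate the attribute's semantic word vector with its
    spatial descriptor to form a spatially-aware representation."""
    return np.concatenate([word_vec, spatial_representation(activation_maps)])

# Toy example: a 300-d word vector and 7x7 conv maps from 5 images.
rng = np.random.default_rng(0)
word_vec = rng.normal(size=300)
maps = rng.random(size=(5, 7, 7))
rep = spatially_aware_vector(word_vec, maps)
print(rep.shape)  # (349,) = 300 semantic dims + 49 spatial dims
```

Attributes can then be clustered in this joint space, so that concepts group words that are both semantically related and spatially co-located.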


Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
This work proposes a novel Attribute-Mask R-CNN model to jointly perform instance segmentation and localized attribute recognition, and provides a novel evaluation metric for the task.
Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval
A unified Joint Visual Semantic Matching model is proposed that learns image-text compositional embeddings by jointly associating the visual and textual modalities in a shared discriminative embedding space via compositional losses.
Efficient Multi-attribute Similarity Learning Towards Attribute-Based Fashion Search
In this paper, we propose an attribute-based query-and-retrieval system designed for fashion products; it addresses the problem of carrying out fashion searches specified by a query image and attributes.
Learning Attribute Representations with Localization for Flexible Fashion Search
The FashionSearchNet is proposed, which uses a weakly supervised localization method to extract attribute regions so that irrelevant regions can be ignored, improving similarity learning; it outperforms the most recent fashion search techniques.
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
A novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery cloth by using both global and local representations in multiple scales, and achieves considerable improvements on all empirical settings.
Interpretable Multimodal Retrieval for Fashion Products
Deep learning methods have been successfully applied to fashion retrieval. However, the latent meaning of learned feature vectors hinders the explanation of retrieval results and the integration of user feedback.
FashionSearchNet-v2: Learning Attribute Representations with Localization for Image Retrieval with Attribute Manipulation
The proposed FashionSearchNet-v2 architecture is able to manipulate the desired attributes of the query image while maintaining its other attributes and outperforms the other state-of-the-art attribute manipulation techniques.
SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval
This work proposes a novel framework SAC that outperforms existing techniques by achieving state-of-the-art performance on 3 benchmark datasets: FashionIQ, Shoes, and Birds-to-Words, while supporting natural language feedback of varying lengths.
Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network
An Attribute-Specific Embedding Network (ASEN) is proposed to jointly learn multiple attribute-specific embeddings in an end-to-end manner, thus measuring fine-grained similarity in the corresponding space.
Cooperative Embeddings for Instance, Attribute and Category Retrieval
Experiments on image retrieval tasks show the benefits of the cooperative embeddings for modeling multiple image similarities, and for discovering style evolution of instances between- and within-categories.


Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network
This work proposes a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning, consisting of two sub-networks, one for each domain, whose retrieval feature representations are driven by semantic attribute learning.
Discovering localized attributes for fine-grained recognition
This approach uses a latent conditional random field model to discover candidate attributes that are detectable and discriminative, and then employs a recommender system that selects attributes likely to be semantically meaningful from image datasets annotated only with fine-grained category labels and object bounding boxes.
Discovering the Spatial Extent of Relative Attributes
A weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images, by developing a novel formulation that combines a detector with local smoothness to discover a set of coherent visual chains across the image collection.
Learning Fashion Compatibility with Bidirectional LSTMs
This paper proposes to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion and trains a bidirectional LSTM (Bi-LSTM) model to sequentially predict the next item conditioned on previous ones to learn their compatibility relationships.
Selecting Relevant Web Trained Concepts for Automated Event Retrieval
This work proposes an event retrieval algorithm that constructs pairs of automatically discovered concepts and then prunes those concepts that are unlikely to be helpful for retrieval, and demonstrates large improvements over other vision based systems on the TRECVID MED 13 dataset.
Automatic Attribute Discovery with Neural Activations
An automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web based on the relationship between attributes and neural activations in the deep network is proposed.
DeViSE: A Deep Visual-Semantic Embedding Model
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
WhittleSearch: Image search with relative attribute feedback
A novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought, which outperforms traditional binary relevance feedback in terms of search speed and accuracy.
Automatic Concept Discovery from Parallel Text and Visual Corpora
An automatic visual concept discovery algorithm using parallel text and visual corpora is proposed; it filters text terms based on the visual discriminative power of the associated images, groups them into concepts using visual and semantic similarities, and achieves state-of-the-art performance in the retrieval task.
Automatic Attribute Discovery and Characterization from Noisy Web Data
This work focuses on discovering attributes and their visual appearance while remaining as agnostic as possible about the textual description, characterizing attributes by their visual representation (global or local) and type (color, texture, or shape).