• Corpus ID: 85459620

Semantic Comparison of State-of-the-Art Deep Learning Methods for Image Multi-Label Classification

Adam Kubany, Shimon Ben Ishay, Ruben-sacha Ohayon, Armin Shmilovici, Lior Rokach, Tomer Doitshman
Image understanding relies heavily on accurate multi-label classification. In recent years, deep learning (DL) algorithms have become very successful tools for multi-label classification of image objects. With this set of tools, various implementations of DL algorithms have been released for public use in the form of application programming interfaces (APIs). In this study, we evaluate and compare 10 of the most prominent publicly available APIs in a best-of-breed challenge. The evaluation…
Multi-label Ranking: Mining Multi-label and Label Ranking Data
This work surveys developments over the last half-decade, with a special focus on state-of-the-art methods in deep learning multi-label mining, extreme multi-label classification, and label ranking, and offers a few future research directions.


CNN-RNN: A Unified Framework for Multi-label Image Classification
The proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework.
An extensive experimental comparison of methods for multi-label learning
The results of the analysis show that for multi-label classification the best-performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC).
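As a rough illustration of binary relevance (BR), one of the baselines named above, the sketch below splits a multi-label dataset into one binary dataset per label, so that any binary classifier could then be trained on each. The function name `binary_relevance_split` and the toy image ids are hypothetical and not taken from the paper; the actual classifier training is omitted.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance_split(
    samples: Sequence[Tuple[str, Set[str]]],
) -> Dict[str, List[Tuple[str, int]]]:
    """Turn one multi-label dataset into one binary dataset per label."""
    # Collect every label that appears anywhere in the data.
    all_labels = sorted({label for _, labels in samples for label in labels})
    per_label: Dict[str, List[Tuple[str, int]]] = {label: [] for label in all_labels}
    for item, labels in samples:
        for label in all_labels:
            # Binary target: 1 if the item carries this label, else 0.
            per_label[label].append((item, int(label in labels)))
    return per_label


# Hypothetical toy data: image ids paired with their label sets.
toy = [("img1", {"cat", "sofa"}), ("img2", {"dog"}), ("img3", {"cat", "dog"})]
datasets = binary_relevance_split(toy)
```

Each entry of `datasets` is an independent binary problem, which is why BR by itself does not model dependencies between labels; classifier chains address that by feeding earlier label predictions into later classifiers.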
A Literature Survey on Algorithms for Multi-label Learning
Multi-label Learning is a form of supervised learning where the classification algorithm is required to learn from a set of instances, each instance can belong to multiple classes and so after be
Learning Deep Latent Space for Multi-Label Classification
A novel deep neural network-based model, the Canonical Correlated AutoEncoder (C2AE), is proposed; it allows end-to-end learning and prediction with the ability to exploit label dependency, and can be easily extended to address the learning problem with missing labels.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
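The core idea of residual learning can be sketched in a few lines: instead of learning a mapping directly, a block learns a residual F(x) and adds the input back through an identity shortcut, giving F(x) + x. The code below is a minimal plain-Python sketch under that assumption; `residual_block` and `zero_layer` are hypothetical names, and real networks use convolutional layers and tensors rather than lists.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where F is the learned residual mapping."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        # The plain identity shortcut only works when shapes match.
        raise ValueError("identity shortcut needs matching dimensions")
    return [f + xi for f, xi in zip(fx, x)]


def zero_layer(v: Vector) -> Vector:
    """Hypothetical residual function whose weights are all zero."""
    return [0.0 for _ in v]


out = residual_block([1.0, -2.0, 3.0], zero_layer)
```

With a residual function that outputs zeros, the block reduces to the identity, which gives some intuition for why adding such blocks need not make a deeper network harder to optimize.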
Discriminative Methods for Multi-labeled Classification
A new technique is presented for combining text features with features indicating relationships between classes; it can be used with any discriminative algorithm and beats the accuracy of existing methods with statistically significant improvements.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Deep Visual-Semantic Alignments for Generating Image Descriptions
  • A. Karpathy, Li Fei-Fei
  • Computer Science, Medicine
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2017
A model is presented that generates natural language descriptions of images and their regions, based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.
Show and tell: A neural image caption generator
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Rethinking the Inception Architecture for Computer Vision
This work explores ways to scale up networks that aim to use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.