• Corpus ID: 195874348

Semantic Comparison of State-of-the-Art Deep Learning APIs for Image Multi-Label Classification

Adam Kubany, Shimon Ben Ishay, Ruben-sacha Ohayon, Armin Shmilovici, Lior Rokach, Tomer Doitshman

Image understanding relies heavily on accurate multi-label classification. In recent years, deep learning (DL) algorithms have become very successful tools for multi-label classification of image objects, and various implementations of DL algorithms have been released for public use in the form of application programming interfaces (APIs). In this study, we evaluate and compare 10 of the most prominent publicly available APIs in a best-of-breed challenge. The evaluation is performed on the… 

"A picture is worth a thousand words"? - From Project Inception to First Results: Describing Cross-disciplinary Collaboration in the Digital Humanities Project ChIA
The project aims at analysing the contents of the images with a combination of computer vision, natural language processing and manual curation to represent them with a more descriptive and representative controlled vocabulary.


CNN-RNN: A Unified Framework for Multi-label Image Classification
The proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework.
Learning Deep Latent Space for Multi-Label Classification
A novel model based on deep neural networks, the Canonical Correlated AutoEncoder (C2AE), is proposed. It allows end-to-end learning and prediction, can exploit label dependency, and can be easily extended to address the learning problem with missing labels.
A Literature Survey on Algorithms for Multi-label Learning
Multi-label learning is a form of supervised learning in which the classification algorithm is required to learn from a set of instances, each of which can belong to multiple classes and so after be…
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Deep Visual-Semantic Alignments for Generating Image Descriptions
  • A. Karpathy, Li Fei-Fei
  • Computer Science, Medicine
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2017
A model is presented that generates natural language descriptions of images and their regions. It is based on a novel combination of convolutional neural networks over image regions, bidirectional recurrent neural networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Discriminative Methods for Multi-labeled Classification
A new technique is presented for combining text features with features indicating relationships between classes. It can be used with any discriminative algorithm and beats the accuracy of existing methods with statistically significant improvements.
Show and tell: A neural image caption generator
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Caffe: Convolutional Architecture for Fast Feature Embedding
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.