Caffe: Convolutional Architecture for Fast Feature Embedding

Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, Trevor Darrell
Proceedings of the 22nd ACM International Conference on Multimedia

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a… 

Acceleration of image classification with Caffe framework using FPGA

The results showed that the mapping of Caffe on the FPGA-based Zynq takes advantage of the low-power, customizable and programmable fabric and ultimately reduces time and power consumption of image classification.

clCaffe: OpenCL Accelerated Caffe for Convolutional Neural Networks

This work presents OpenCL acceleration of the well-known deep learning framework Caffe, focusing on the convolution layer, which is optimized with three different approaches (GEMM, spatial domain, and frequency domain); this greatly enhances the ability to leverage deep learning use cases on all types of OpenCL devices.

FCLNN: A Flexible Framework for Fast CNN Prototyping on FPGA with OpenCL and Caffe

  • Xianchao Xu, Brian Liu
  • Computer Science
    2018 International Conference on Field-Programmable Technology (FPT)
  • 2018
This paper proposes a flexible HW/SW co-design framework for fast, high-throughput CNN prototyping using the commercial high-level OpenCL language and the standard open-source deep learning framework Caffe, and builds a parameterizable stream-architected convolution engine.


Experimental results demonstrate that NUMA-Caffe significantly outperforms the state-of-the-art Caffe designs in terms of both throughput and scalability.

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters

S-Caffe, a scalable and distributed Caffe adaptation for modern multi-GPU clusters, is proposed: a co-design of the Caffe framework and the MVAPICH2-GDR MPI runtime that scales up to 160 GPUs.

Using Supercomputer to Speed up Neural Network Training

  • Yue Yu, Jinrong Jiang, X. Chi
  • Computer Science
    2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
  • 2016
This paper develops Caffe-HPC, a framework based on Caffe that can use computing clusters with multiple GPUs to train large models, making it possible to train larger networks on larger training sets in a reasonable amount of time.

Efficient Multi-training Framework of Image Deep Learning on GPU Cluster

This work proposes a framework that organizes the training procedures of multiple deep learning models into a pipeline on a GPU cluster, where each stage is handled by a particular GPU with a partition of the training dataset.

Doubly Convolutional Neural Networks

This paper proposes doubly convolutional neural networks (DCNNs), which significantly improve the performance of CNNs, and shows that DCNNs can serve the dual purpose of building more accurate models and/or reducing the memory footprint without sacrificing accuracy.

Accelerating Deep Learning Frameworks with Micro-Batches

cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory…

TensorLayer: A Versatile Library for Efficient Deep Learning Development

TensorLayer is a versatile Python-based deep learning library that provides high-level modules abstracting sophisticated operations on neuron layers, network models, training data, and dependent training jobs. Its transparent module interfaces allow developers to flexibly embed low-level controls within a backend engine.



DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes; it employed a recently developed regularization method called "dropout" that proved to be very effective.

PANDA: Pose Aligned Networks for Deep Attribute Modeling

This paper proposes a new method that combines part-based models and deep learning by training pose-normalized CNNs to infer human attributes from images of people under large variation in viewpoint, pose, appearance, articulation, and occlusion.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.

Recognizing Image Style

This paper presents an approach to predicting the style of images and a thorough evaluation of different image features for these tasks, finding that features learned in a multi-layer network generally perform best, even when trained with object class (not style) labels.

Selective Search for Object Recognition

This paper introduces selective search, which combines the strengths of both exhaustive search and segmentation, and shows that selective search enables the use of the powerful Bag-of-Words model for recognition.

Torch7: A Matlab-like Environment for Machine Learning

Torch7 is a versatile numeric computing framework and machine learning library that extends Lua and can easily be interfaced with third-party software thanks to Lua’s light interface.

Open-vocabulary Object Retrieval

This paper introduces a novel object retrieval method that can combine category- and instance-level semantics in a common representation and shows that the approach can accurately retrieve objects based on extremely varied open-vocabulary queries.

Pylearn2: a machine learning research library

This paper gives a brief history of the library, an overview of its basic philosophy, a summary of the library's architecture, and a description of how the Pylearn2 community functions socially.