Best practices for convolutional neural networks applied to visual document analysis

@article{Simard2003BestPF,
  title={Best practices for convolutional neural networks applied to visual document analysis},
  author={P. Simard and David Steinkraus and John C. Platt},
  journal={Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.},
  year={2003},
  pages={958--963}
}
Neural networks are a powerful technology for classification of visual inputs arising from documents. [...] We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general…
Analysis of Convolutional Neural Networks for Document Image Classification
TLDR
A large empirical study is conducted to find what aspects of CNNs most affect performance on document images and exceeds the state-of-the-art on the RVL-CDIP dataset by using shear transform data augmentation and an architecture designed for a larger input image.
Best Practices for Convolutional Neural Networks Applied to Object Recognition in Images
TLDR
This research project explores different architectures and training configurations with the use of ReLUs, Nesterov's accelerated gradient, dropout and maxout networks, and 4 models of convolutional neural networks that explore characteristics such as depth, number of feature maps, size and overlap of kernels, pooling regions, and different subsampling techniques.
High Performance Convolutional Neural Networks for Document Processing
TLDR
Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (Basic Linear Algebra Subprograms), and c) using GPUs (graphics processing units).
HWNet v2: an efficient word image representation for handwritten documents
TLDR
The proposed framework for learning an efficient holistic representation for handwritten word images uses a deep convolutional neural network with traditional classification loss and leads to a state-of-the-art word spotting performance on standard handwritten datasets and historical manuscripts in different languages with minimal representation size.
Improved neural network OCR based on preprocessed blob classes
TLDR
The architecture of an OCR technology based on a multilayer neural network is presented, and it is argued that most of the characters that must be recognized have a similar layout, so processing performance can be improved by creating classes of similar characters (blobs) based on geometric similarities.
Reading Text in the Wild with Convolutional Neural Networks
TLDR
An end-to-end system for text spotting (localising and recognising text in natural scene images) and text-based image retrieval is presented, and a real-world application that allows thousands of hours of news footage to be instantly searchable via a text query is demonstrated.
Improved architecture of the feedforward neural network for image recognition
TLDR
A feedforward neural network, namely the scaled conjugate gradient backpropagation feedforward neural network with random connections (SCGBP-FNN-RC), is proposed to learn from big data by recognizing images from the widely known MNIST dataset, to which affine and elastic distortions are applied.
Opleiding Informatica Visual classification of e-discovery images with neural networks
We investigate whether transfer learning can be applied to images related to eDiscovery, using a deep convolutional network named inceptionV3, trained on a large dataset of general objects for…
Artificial neural networks for document analysis and recognition
  • S. Marinai, M. Gori, G. Soda
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2005
TLDR
This paper surveys the most significant problems in the area of offline document image processing where connectionist-based approaches have been applied, and depicts the most promising research guidelines in the field.
Image Captioning using Deep Learning
TLDR
A generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation is being used and will be trained to maximize the likelihood of the target description sentence given the training image.
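
The "unrolling convolution" idea named in the entry "High Performance Convolutional Neural Networks for Document Processing" above can be illustrated with a small sketch. This is not that paper's implementation; it is a minimal NumPy example, assuming a single-channel image, "valid" borders, and stride 1. The function names `unroll_patches`, `conv2d_unrolled`, and `conv2d_direct` are hypothetical and chosen only for illustration. Each image patch is copied into one row of a matrix, so that applying all kernels becomes a single matrix product, which NumPy typically hands to a BLAS routine for floating-point arrays.

```python
import numpy as np


def unroll_patches(image, kh, kw):
    """Copy every kh x kw patch of a 2-D image into one row of a matrix."""
    h, w = image.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    rows = np.empty((out_h * out_w, kh * kw), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            rows[i * out_w + j] = image[i:i + kh, j:j + kw].ravel()
    return rows, (out_h, out_w)


def conv2d_unrolled(image, kernels):
    """Apply a stack of kernels (n, kh, kw) with one matrix product.

    Computes cross-correlation (no kernel flip), as is usual in CNNs.
    Returns an array of shape (n, out_h, out_w).
    """
    n, kh, kw = kernels.shape
    rows, (out_h, out_w) = unroll_patches(image, kh, kw)
    weights = kernels.reshape(n, kh * kw)
    out = rows @ weights.T  # shape (out_h * out_w, n)
    return out.T.reshape(n, out_h, out_w)


def conv2d_direct(image, kernels):
    """Reference implementation with explicit loops, for comparison."""
    n, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1), dtype=image.dtype)
    for k in range(n):
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernels[k])
    return out
```

The unrolled matrix duplicates overlapping pixels, so this approach trades extra memory for the ability to use one large, well-optimized matrix multiplication in place of many small loops.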

References

Gradient-based learning applied to document recognition
TLDR
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
Neural Networks for Pattern Recognition
TLDR
The chapter discusses two important directions of research to improve learning algorithms: the dynamic node generation, which is used by the cascade correlation algorithm; and designing learning algorithms where the choice of parameters is not an issue.
An offline cursive handwritten word recognition system
TLDR
This paper describes an offline cursive handwritten word recognition system that combines hidden Markov models (HMM) and neural networks (NN) and presents the preprocessing and the recognition process as well as the training procedure for the NN-HMM hybrid system.
An improved recognition module for the identification of handwritten digits
TLDR
A new and improved recognition module designed to read the courtesy amount on Brazilian checks is proposed. It focuses on a neural network classifier paired with a structural system that verifies recognition, both of which have been designed and tailored in a unique way.
Effective Training of a Neural Network Character Classifier for Word Recognition
TLDR
Some innovations in the training and use of ANNs as character classifiers for word recognition are presented, including normalized output error, frequency balancing, error emphasis, negative training, and stroke warping.
Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing
In this paper, we discuss experiments applying machine learning techniques to the task of confusion set disambiguation, using three orders of magnitude more training data than has previously been…
Training Invariant Support Vector Machines
TLDR
This work reports the recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Proceedings Seventh International Conference on Document Analysis and Recognition
  • Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
  • 2003
The following topics are dealt with: document analysis and recognition; multiple classifiers; feature analysis; document understanding; hidden Markov models; text segmentation; character recognition;…
The mnist database of handwritten digits
Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks