Learning long-term dependencies with gradient descent is difficult
TLDR
This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Best practices for convolutional neural networks applied to visual document analysis
TLDR
A set of concrete best practices that document analysis researchers can use to get good results with neural networks, including a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.
Counterfactual reasoning and learning systems: the example of computational advertising
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such
Comparison of learning algorithms for handwritten digit recognition
TLDR
This comparison of several learning algorithms for handwritten digits considers not only raw accuracy, but also rejection, training time, recognition time, and memory requirements.
Learning algorithms for classification: A comparison on handwritten digit recognition
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and
Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation
TLDR
This chapter introduces the concept of tangent vectors, which compactly represent the essence of these transformation invariances, and two classes of algorithms, “Tangent distance” and “Tangent propagation”, which make use of these invariances to improve performance.
Comparison of classifier methods: a case study in handwritten digit recognition
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and
Efficient Pattern Recognition Using a New Transformation Distance
TLDR
A new distance measure which can be made locally invariant to any set of transformations of the input and can be computed efficiently is proposed.
Time is of the essence: a conjecture that hemispheric specialization arises from interhemispheric conduction delay.
TLDR
It is suggested that the large brains of mammals such as elephants and cetaceans will also manifest a high degree of hemispheric specialization if the neural apparatus necessary to perform each high-resolution, time-critical task is gathered in one hemisphere.
High Performance Convolutional Neural Networks for Document Processing
TLDR
Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphics processing units).
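As a rough illustration of what "unrolling convolution" can mean, the sketch below copies every image patch into one row of a matrix so that the whole convolution becomes a single matrix product, which is the step an optimized BLAS routine would normally handle. It is a minimal pure-Python sketch under stated assumptions, not the paper's implementation: the function names (`unroll_patches`, `matmul`, `conv2d_unrolled`, `conv2d_direct`) are hypothetical, the input is a single-channel image, there is no padding or stride, and the kernel is not flipped (cross-correlation, as is common in CNNs).

```python
def unroll_patches(image, kh, kw):
    """Copy each kh x kw patch of a 2-D image (list of lists) into one row."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows, out_h, out_w


def matmul(a, b):
    """Plain matrix product; a BLAS GEMM call would replace this in practice."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def conv2d_unrolled(image, kernels):
    """Apply every kernel to the image with one matrix product."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    rows, out_h, out_w = unroll_patches(image, kh, kw)
    # Flatten each kernel in the same order as the patches, one kernel per column.
    flat = [[k[di][dj] for di in range(kh) for dj in range(kw)] for k in kernels]
    weights = [list(col) for col in zip(*flat)]
    prod = matmul(rows, weights)  # shape: (out_h * out_w) x number of kernels
    return [[[prod[i * out_w + j][n] for j in range(out_w)]
             for i in range(out_h)]
            for n in range(len(kernels))]


def conv2d_direct(image, kernels):
    """Reference version with explicit nested loops, for comparison."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[[sum(image[i + di][j + dj] * k[di][dj]
                  for di in range(kh) for dj in range(kw))
              for j in range(out_w)]
             for i in range(out_h)]
            for k in kernels]
```

The unrolled form duplicates overlapping pixels, so it trades extra memory for the ability to do all the arithmetic in one dense matrix multiplication.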