Learning long-term dependencies with gradient descent is difficult
We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases.
  • 4,969
  • 230
Best practices for convolutional neural networks applied to visual document analysis
This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks.
  • 2,052
  • 134
Comparison of learning algorithms for handwritten digit recognition
This paper compares the relative merits of several classification algorithms developed at Bell Laboratories and elsewhere for the purpose of recognizing handwritten digits.
  • 582
  • 42
Counterfactual reasoning and learning systems: the example of computational advertising
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system.
  • 385
  • 40
Efficient Pattern Recognition Using a New Transformation Distance
We propose a new distance measure which (a) can be made locally invariant to any set of transformations of the input and (b) can be computed efficiently.
  • 576
  • 39
Learning algorithms for classification: A comparison on handwritten digit recognition
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits.
  • 366
  • 36
Comparison of classifier methods: a case study in handwritten digit recognition
This paper compares the performance of several classifier algorithms on a standard database of handwritten digits with respect to training time, recognition time, and memory requirements.
  • 541
  • 35
Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation
We introduce the concept of tangent vectors, which compactly represent the essence of these transformation invariances, and two classes of algorithms, “Tangent distance” and “Tangent propagation”, which make use of these invariances to improve performance.
  • 301
  • 32
High quality document image compression with "DjVu"
We present a new image compression technique called "DjVu" that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color.
  • 269
  • 30
Time is of the essence: a conjecture that hemispheric specialization arises from interhemispheric conduction delay.
Tomasch (1954) and Aboitiz et al. (1992) found the majority of the fibers of the human corpus callosum are under 1 micron in diameter. Electron microscopic studies of Swadlow et al. (1980) and the…
  • 511
  • 27