Just storing the Hessian H (the matrix of second derivatives 2E/wiwj of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. To calculate Hv, we first define a differential operator Rv{f(w)} = (/r)f(w rv)|r=0, note that Rv{w} = Hv and Rv{w} = v, and then apply Rv{} to the equations usedâ€¦Â CONTINUE READING