We consider feed-forward neural networks with one non-linear hidden layer and linear output units. The transfer functions in the hidden layer are either bell-shaped or sigmoid. In the bell-shaped case, we show how Bernstein polynomials on the one hand, and the theory of the heat equation on the other, are relevant for understanding the properties of the corresponding networks. In particular, these techniques yield simple proofs of universal approximation properties, i.e. of the fact that any reasonable function can be approximated to any degree of precision by a linear combination of bell-shaped functions. In addition, in this framework the problem of learning is equivalent to the problem of reversing the time course of a diffusion process. The results obtained in the bell-shaped case can then be applied to the case of sigmoid transfer functions in the hidden layer, yielding similar universality results. A conjecture related to the problem of generalization is briefly examined.
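As an illustrative sketch (not part of the paper's own development), the Bernstein-polynomial route to universal approximation can be seen numerically: the operator B_n f(x) = Σ_k f(k/n) C(n,k) x^k (1-x)^(n-k) converges uniformly to any continuous f on [0,1], and each basis term C(n,k) x^k (1-x)^(n-k) is itself a bell-shaped bump centered near k/n. The target function and grid below are arbitrary choices for demonstration.

```python
import numpy as np
from math import comb

def bernstein_approx(f, n, x):
    # B_n f(x) = sum_{k=0}^{n} f(k/n) * C(n,k) * x^k * (1-x)^(n-k):
    # a linear combination of bell-shaped basis functions with
    # coefficients given by samples of f on a uniform grid.
    x = np.asarray(x, dtype=float)
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

# Hypothetical smooth target on [0,1] for illustration.
f = lambda x: np.sin(2 * np.pi * x)
xs = np.linspace(0.0, 1.0, 201)

def max_error(n):
    # Uniform (sup-norm) error of the degree-n Bernstein approximant.
    return float(np.max(np.abs(bernstein_approx(f, n, xs) - f(xs))))

# The uniform error shrinks as the number of bumps grows,
# mirroring the universal approximation property in the abstract.
print(max_error(10), max_error(50))
```

Convergence here is slow (roughly O(1/n) for smooth targets), which is why Bernstein polynomials serve as a proof device for universality rather than a practical training scheme.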