We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. Compared to…

An acyclic set in a digraph is a set of vertices that induces an acyclic subgraph. In 2011, Harutyunyan conjectured that every planar digraph on n vertices without directed 2-cycles possesses an…

In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and…

In Theory III we characterize with a mix of theory and experiments the generalization properties of Stochastic Gradient Descent in overparametrized deep convolutional networks. We show that…

We ruminate with a mix of theory and experiments on the optimization and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A…

Moulin [1980] characterizes the single-facility, deterministic strategy-proof mechanisms for social choice with single-peaked preferences as the set of generalized median rules. In contrast, we have…

A linear equation is r-regular if, for every r-coloring of the positive integers, there exist positive integers of the same color that satisfy the equation. In 2005, Fox and Radoičić conjectured…

In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in…