
- Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
- ICML
- 2013

We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations are set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus…
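The contrast described in the abstract can be sketched in a few lines of numpy: Dropout masks a layer's output activations, while DropConnect masks individual entries of the weight matrix. The shapes, keep probability, and inverted-scaling convention here are illustrative, not the paper's exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weights of a small fully-connected layer
x = rng.standard_normal(3)        # input activations
p = 0.5                           # keep probability (illustrative)

# Dropout: zero a random subset of the layer's *activations* (outputs).
drop_mask = rng.random(4) < p
dropout_out = (W @ x) * drop_mask / p          # inverted-dropout scaling

# DropConnect: zero a random subset of the *weights* instead; every
# output unit still receives a (thinned) weighted sum of all inputs.
conn_mask = rng.random(W.shape) < p
dropconnect_out = ((W * conn_mask) @ x) / p
```

Note that DropConnect's mask has one entry per weight rather than per unit, so it samples from a strictly larger family of thinned sub-networks than Dropout.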

- Tom Schaul, Sixin Zhang, Yann LeCun
- ICML
- 2013

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase…
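As a toy illustration of the idea (not the paper's exact estimator), one can scale a base rate by the ratio of the squared mean gradient to the mean squared gradient, both tracked with running averages: the ratio approaches 1 when per-sample gradients agree and 0 when they are noise-dominated. The function name, hyperparameters, and test problem below are all hypothetical.

```python
import numpy as np

def adaptive_sgd(grad_fn, theta, steps=200, base_lr=0.5, decay=0.9):
    """Per-parameter rate eta = base_lr * mean(g)^2 / mean(g^2),
    estimated with exponential moving averages. By Cauchy-Schwarz the
    ratio never exceeds 1, and it shrinks automatically once gradients
    near the optimum are dominated by sampling noise."""
    g_bar = np.zeros_like(theta)   # running mean of gradients
    v_bar = np.ones_like(theta)    # running mean of squared gradients
    for _ in range(steps):
        g = grad_fn(theta)
        g_bar = decay * g_bar + (1 - decay) * g
        v_bar = decay * v_bar + (1 - decay) * g * g
        eta = base_lr * g_bar**2 / (v_bar + 1e-12)
        theta = theta - eta * g
    return theta

# Noisy 1-D quadratic with minimum at theta* = 3 (illustrative).
rng = np.random.default_rng(1)
grad = lambda th: (th - 3.0) + 0.1 * rng.standard_normal(th.shape)
theta = adaptive_sgd(grad, np.array([0.0]))
```

The effective rate rises while the descent direction is consistent and decays near the optimum, which is the qualitative behavior the abstract describes.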

- Sixin Zhang, Anna Choromanska, Yann LeCun
- NIPS
- 2015

We study the problem of stochastic optimization for deep learning in the parallel computing environment under communication constraints. A new algorithm is proposed in this setting where the communication and coordination of work among concurrent processes (local workers), is based on an elastic force which links the parameter vectors they compute with a…
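The elastic-force coupling can be sketched as follows: each worker takes a gradient step plus a pull toward a shared center variable, and the center moves toward the workers. The sequential loop standing in for concurrent processes, and the constants `eta` and `rho`, are illustrative; the coupling strength `alpha = eta * rho` follows the usual EASGD parameterization.

```python
import numpy as np

def easgd(grad_fn, dim, workers=4, rounds=300, eta=0.1, rho=0.5):
    """One synchronous round: every local worker i does
        x_i <- x_i - eta * (g_i + rho * (x_i - center))
    and the center variable moves toward the workers' average:
        center <- center + alpha * sum_i (x_i - center)."""
    rng = np.random.default_rng(2)
    x = rng.standard_normal((workers, dim))  # local parameter vectors
    center = np.zeros(dim)                   # shared center variable
    alpha = eta * rho
    for _ in range(rounds):
        for i in range(workers):
            g = grad_fn(x[i])
            x[i] -= eta * g + alpha * (x[i] - center)
        center += alpha * (x - center).sum(axis=0)
    return center

# Quadratic with minimum at 1; each call sees its own gradient noise.
rng = np.random.default_rng(3)
grad = lambda th: (th - 1.0) + 0.05 * rng.standard_normal(th.shape)
c = easgd(grad, dim=2)
```

The elastic term lets workers explore away from the center between communication rounds while still being pulled back, which is what relaxes the communication constraint.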

- Sixin Zhang
- ArXiv
- 2016

We study the problem of how to distribute the training of large-scale deep learning models in the parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with the…

- Tom Schaul, Sixin Zhang, Yann LeCun
- 2013

If we do gradient descent with $\eta^*(t)$, then almost surely, the algorithm converges (for the quadratic model). To prove that, we follow classical techniques based on Lyapunov stability theory (Bucy, 1965). Notice that the expected loss follows $\mathbb{E}\big[J(\theta^{(t+1)}) \mid \theta^{(t)}\big] = \tfrac{1}{2} h \cdot \mathbb{E}\big[\big((1-\eta^* h)(\theta^{(t)}-\theta^*) + \eta^* h \sigma \xi\big)^2 + \sigma^2\big] = \tfrac{1}{2} h \big[(1-\eta^* h)^2 (\theta^{(t)}-\theta^*)^2 + \ldots\big]$…
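The expected-loss step quoted above can be checked numerically for the noisy quadratic model $J(\theta) = \tfrac{h}{2}\big[(\theta-\theta^*)^2 + \sigma^2\big]$ with per-sample gradient $h(\theta - \theta^* - \sigma\xi)$, $\xi \sim \mathcal{N}(0,1)$. A fixed rate $\eta$ is used here (the excerpt substitutes the optimal $\eta^*$), and all constants are illustrative.

```python
import numpy as np

h, sigma, eta = 2.0, 0.5, 0.3
theta_star, theta = 0.0, 1.5

def J(th):
    # Expected loss of the noisy quadratic: (h/2) [ (th - th*)^2 + sigma^2 ].
    return 0.5 * h * ((th - theta_star) ** 2 + sigma ** 2)

rng = np.random.default_rng(0)
xi = rng.standard_normal(1_000_000)

# One SGD step with the noisy gradient h (theta - theta* - sigma xi),
# so theta' - theta* = (1 - eta h)(theta - theta*) + eta h sigma xi.
theta_next = theta - eta * h * (theta - theta_star - sigma * xi)

mc = J(theta_next).mean()  # Monte-Carlo estimate of E[ J(theta') | theta ]
closed = 0.5 * h * ((1 - eta * h) ** 2 * (theta - theta_star) ** 2
                    + eta ** 2 * h ** 2 * sigma ** 2 + sigma ** 2)
```

Expanding the square inside the expectation kills the cross term (since $\mathbb{E}[\xi]=0$) and leaves exactly the three terms in `closed`, which the simulation reproduces to Monte-Carlo accuracy.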

- Somshubra Majumdar, Ishaan Jain, +5 authors Jie Huang
- 2016

Recent developments in the field of deep learning have shown that convolutional networks with several layers can approach human level accuracy in tasks such as handwritten digit classification and object recognition. It is observed that the state-of-the-art performance is obtained from model ensembles, where several models are trained on the same data and…
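A common way to combine such an ensemble is to average the predicted class distributions of the member models and take the argmax. The sketch below uses hand-written logits for three hypothetical models on four inputs; it illustrates the combination rule only, not the paper's specific ensembling scheme.

```python
import numpy as np

# Logits from three hypothetical models for the same 4 inputs, 3 classes.
logits = np.array([
    [[2.0, 1.0, 0.1], [0.2, 2.1, 0.3], [0.1, 0.4, 1.9], [1.2, 1.1, 0.9]],
    [[1.8, 1.2, 0.2], [0.1, 1.9, 0.5], [0.3, 0.2, 2.1], [0.9, 1.4, 0.8]],
    [[2.2, 0.8, 0.3], [0.4, 2.3, 0.2], [0.2, 0.5, 1.8], [1.0, 1.2, 1.1]],
])

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)        # shape: (models, inputs, classes)
ensemble = probs.mean(axis=0)  # average the predicted distributions
pred = ensemble.argmax(axis=-1)
```

On the last input the three models disagree (classes 0, 1, 1), and averaging the distributions resolves the disagreement toward the majority while keeping a calibrated probability vector.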
