We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time O(ε^{-7/4} log(1/ε)) to find an ε-stationary point, meaning a point x such that ‖∇f(x)‖ ≤ ε. The method improves upon the O(ε^{-2}) complexity of gradient descent and provides the additional second-order…
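To make the ε-stationarity criterion concrete, here is a minimal sketch of the plain gradient descent baseline (not the accelerated method from the abstract) stopped as soon as ‖∇f(x)‖ ≤ ε. The test function f(x, y) = x⁴/4 − x²/2 + y²/2, the starting point, and the step size are illustrative assumptions, not taken from the paper.

```python
import math

def grad(p):
    # Gradient of the assumed test function f(x, y) = x**4/4 - x**2/2 + y**2/2,
    # which is non-convex (it has a saddle point at the origin).
    x, y = p
    return (x**3 - x, y)

def gradient_descent(p, step=0.1, eps=1e-3, max_steps=10_000):
    """Run plain gradient descent until the gradient norm is at most eps.

    Returns the final point and the number of steps taken.
    """
    for k in range(max_steps):
        g = grad(p)
        if math.hypot(*g) <= eps:  # eps-stationary: ||grad f(p)|| <= eps
            return p, k
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p, max_steps

point, steps = gradient_descent((0.1, 1.0))
print(point, steps)
```

From this starting point the iterates settle near the local minimum at (1, 0); the step count returned is the quantity that the complexity bounds in the abstract describe as a function of ε.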
We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is…
Partial matching of geometric structures is important in computer vision, pattern recognition and shape analysis applications. The problem consists of matching similar parts of shapes that may be dissimilar as a whole. Recently, it was proposed to consider partial similarity as a multi-criterion optimization problem trying to simultaneously maximize the…
We consider the minimization of non-convex quadratic forms regularized by a cubic term, which exhibit multiple saddle points and poor local minima. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within ε accuracy in O(ε^{-1} log(1/ε)) steps for large ε and O(log(1/ε)) steps for small ε (compared to a…
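As an illustration of this setting, the following sketch runs gradient descent on a small cubic-regularized quadratic f(x) = ½ xᵀAx + bᵀx + (ρ/3)‖x‖³ with an indefinite diagonal A. The specific A, b, ρ, step size, starting point, and iteration count are assumptions chosen for the example, not values from the paper. It relies on the standard characterization that a stationary point is a global minimizer when A + ρ‖x‖I is positive semidefinite, which here reduces to ρ‖x‖ ≥ 1.

```python
import math

# Assumed problem data: A = diag(-1, 2) is indefinite, so f is non-convex.
A_DIAG = (-1.0, 2.0)
B = (0.5, 0.5)
RHO = 1.0

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def grad(x):
    # Gradient of f(x) = 0.5 x^T A x + b^T x + (rho / 3) ||x||^3,
    # i.e. A x + b + rho * ||x|| * x, written out for a diagonal A.
    r = norm(x)
    return tuple(a * v + b + RHO * r * v for a, v, b in zip(A_DIAG, x, B))

def gradient_descent(x, step=0.05, steps=5000):
    # Fixed-step gradient descent; the step size is a hand-picked assumption.
    for _ in range(steps):
        g = grad(x)
        x = tuple(v - step * gv for v, gv in zip(x, g))
    return x

x = gradient_descent((0.0, 0.0))
print("x =", x)
print("gradient norm =", norm(grad(x)))
# Global optimality check: rho * ||x|| must be at least -lambda_min(A) = 1.
print("rho * ||x|| =", RHO * norm(x))
```

In this instance b has a nonzero component along the eigenvector of the negative eigenvalue, so the iterates started at the origin move away from it and settle at the stationary point that satisfies the global optimality condition.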
We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function were convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows…
We consider mean squared estimation with lookahead of a continuous-time signal corrupted by additive white Gaussian noise. We investigate the connections between lookahead in estimation and information under this model. We show that the mutual information rate function, i.e., the mutual information rate as a function of the signal-to-noise ratio (SNR), does…
We compare the maximum achievable rates in single-carrier (SC) and orthogonal frequency-division multiplexing (OFDM) modulation schemes, under the practical assumptions of independent identically distributed finite alphabet inputs and linear intersymbol interference with additive Gaussian noise. We show that the Shamai-Laroia approximation serves as a…
We consider the discrete-time intersymbol interference (ISI) channel model, with additive Gaussian noise and fixed independent identically distributed inputs. In this setting, we investigate the expression put forth by Shamai and Laroia as a conjectured lower bound for the input-output mutual information after application of a minimum mean-square error…
We investigate the existence of simple policies in finite discounted cost Markov Decision Processes, when the discount factor is not constant. We introduce a class called “exponentially representable” discount functions. Within this class we prove existence of optimal policies which are eventually stationary (from some time N onward), and provide an algorithm…