Bicheng Ying

This paper examines the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The equivalence result is established for all…
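As a rough illustration of the rescaled-step-size equivalence described above, the sketch below compares heavy-ball momentum SGD with step size mu and momentum beta against plain SGD with the enlarged step size mu/(1 - beta) on a toy least-squares problem. All problem data, parameter values, and variable names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem; all sizes and values are illustrative assumptions.
d, N = 5, 1000
X = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(N)

def stoch_grad(w):
    # Gradient of the per-sample loss (x_i @ w - y_i)^2 at a random sample.
    i = rng.integers(N)
    return 2 * X[i] * (X[i] @ w - y[i])

mu, beta, T = 0.005, 0.9, 20000

w_m, v = np.zeros(d), np.zeros(d)   # heavy-ball momentum SGD iterates
w_p = np.zeros(d)                   # plain SGD with the rescaled step size
for _ in range(T):
    v = beta * v + stoch_grad(w_m)
    w_m = w_m - mu * v                               # momentum update, step mu
    w_p = w_p - (mu / (1 - beta)) * stoch_grad(w_p)  # plain SGD, step mu/(1-beta)

# Both methods should reach a comparable error level around w_true.
print("momentum SGD error:", np.linalg.norm(w_m - w_true))
print("rescaled SGD error:", np.linalg.norm(w_p - w_true))
```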
This paper examines the learning mechanism of adaptive agents over weakly connected graphs and reveals an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower…
In this paper, we study diffusion social learning over weakly-connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this work, a scenario of total influence (or "mind-control") arises where a set of influential agents…
In this paper, we examine the learning mechanism of adaptive agents over weakly-connected graphs and reveal an interesting behavior on how information flows through such topologies. The results clarify how asymmetries in the exchange of data can mask local information at certain agents and make them totally dependent on other agents. A leader-follower…
This work examines the performance of stochastic sub-gradient learning strategies, for both cases of stand-alone and networked agents, under weaker conditions than usually considered in the literature. It is shown that these conditions are automatically satisfied by several important cases of interest, including support-vector machines and sparsity-inducing…
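Since the abstract names support-vector machines as one case covered by the weaker conditions, here is a minimal stochastic sub-gradient sketch for the regularized hinge loss, whose non-differentiability at the margin is exactly the kind of feature such analyses must handle. The data, regularization weight lam, and step size mu are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary classification data; all values are illustrative assumptions.
N, d = 500, 3
X = rng.standard_normal((N, d))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(N))

lam, mu, T = 0.01, 0.05, 5000  # regularization weight, step size, iterations
w = np.zeros(d)
for _ in range(T):
    i = rng.integers(N)
    margin = y[i] * (X[i] @ w)
    # Subgradient of lam/2 * ||w||^2 + max(0, 1 - margin): the hinge term is
    # non-differentiable at margin == 1, so any valid subgradient may be used.
    g = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
    w = w - mu * g

print("training accuracy:", np.mean(np.sign(X @ w) == y))
```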
In this paper, we study diffusion social learning over weakly connected graphs. We show that the asymmetric flow of information hinders the learning abilities of certain agents regardless of their local observations. Under some circumstances that we clarify in this paper, a scenario of total influence (or "mind-control") arises where a set of influential agents…
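A small simulation can make the total-influence effect concrete. In the sketch below, agents 0 and 1 are influential and assign no weight to anyone else, while agents 2 and 3 are followers; iterating the combination step drives the followers to a convex combination of the influential agents' states regardless of the followers' own initial beliefs. The topology and weights are assumptions, and this is a consensus-style caricature of the diffusion social-learning dynamics, not the paper's exact model.

```python
import numpy as np

# Left-stochastic combination matrix over a weakly connected network:
# column k holds the weights agent k assigns to its in-neighbors. Agents 0
# and 1 are influential (each listens only to itself); 2 and 3 are followers.
A = np.array([
    [1.0, 0.0, 0.3, 0.1],
    [0.0, 1.0, 0.3, 0.1],
    [0.0, 0.0, 0.2, 0.4],
    [0.0, 0.0, 0.2, 0.4],
])

x = np.array([1.0, -1.0, 5.0, -5.0])  # initial beliefs/states of the 4 agents
for _ in range(200):
    x = A.T @ x  # each agent replaces its state with a weighted neighbor average

# Agents 0 and 1 keep their states; agents 2 and 3 converge to a convex
# combination of those states, independent of their own initial beliefs.
print(x)
```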
This work develops a fully decentralized variance-reduced learning algorithm for on-device intelligence where nodes store and process the data locally and are only allowed to communicate with their immediate neighbors. In the proposed algorithm, there is no need for a central or master unit while the objective is to enable the dispersed…
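The sketch below is not the paper's exact algorithm; it is a simplified adapt-then-combine diffusion iteration in which each node applies a SAGA-style variance-reduced stochastic gradient to its own local data and then averages only with its immediate neighbors on a ring. All sizes, combination weights, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# K nodes on a ring, each holding its own local least-squares data.
K, n, d = 4, 50, 3
Xs = [rng.standard_normal((n, d)) for _ in range(K)]
w_true = rng.standard_normal(d)
ys = [X @ w_true + 0.1 * rng.standard_normal(n) for X in Xs]

A = np.zeros((K, K))  # doubly-stochastic combination matrix for the ring
for k in range(K):
    A[k, k], A[k, (k - 1) % K], A[k, (k + 1) % K] = 0.5, 0.25, 0.25

def grad(k, w, i):
    return 2 * Xs[k][i] * (Xs[k][i] @ w - ys[k][i])

mu, T = 0.01, 3000
W = np.zeros((K, d))                      # one iterate per node
G = [np.zeros((n, d)) for _ in range(K)]  # SAGA gradient table per node
avg = [np.zeros(d) for _ in range(K)]     # running average of each table

for _ in range(T):
    for k in range(K):  # adapt: local variance-reduced stochastic step
        i = rng.integers(n)
        gi = grad(k, W[k], i)
        g = gi - G[k][i] + avg[k]       # SAGA-style gradient estimate
        avg[k] += (gi - G[k][i]) / n    # keep the table average in sync
        G[k][i] = gi
        W[k] = W[k] - mu * g
    W = A @ W  # combine: average only with immediate ring neighbors

print(np.linalg.norm(W - w_true, axis=1))  # every node approaches w_true
```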
In empirical risk optimization, it has been observed that gradient descent implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data randomly and independently of each other. Recent works have pursued justifications for this behavior by examining the convergence rate of the…
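To see the two sampling schemes side by side, here is a minimal least-squares sketch contrasting per-epoch random reshuffling (each sample visited exactly once per epoch) with i.i.d. sampling with replacement. The problem data and step size are assumed purely for illustration; the difference studied in the literature shows up in the size of the steady-state error, not in whether either scheme converges.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy least-squares data; sizes and step size are illustrative assumptions.
N, d = 200, 4
X = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(N)

def run(epochs, mu, reshuffle):
    w = np.zeros(d)
    for _ in range(epochs):
        if reshuffle:
            order = rng.permutation(N)       # each sample exactly once per epoch
        else:
            order = rng.integers(N, size=N)  # i.i.d. sampling with replacement
        for i in order:
            w = w - mu * 2 * X[i] * (X[i] @ w - y[i])
    return np.linalg.norm(w - w_true)

print("random reshuffling error:", run(50, 0.005, True))
print("i.i.d. sampling error:   ", run(50, 0.005, False))
```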
This work examines the mean-square error performance of diffusion stochastic algorithms under a generalized coordinate-descent scheme. In this setting, the adaptation step by each agent is limited to a random subset of the coordinates of its stochastic gradient vector. The selection of coordinates varies randomly from iteration to iteration and from agent to agent…
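A minimal sketch of the scheme as described: each agent computes a stochastic gradient on its local data, updates only a random subset of m out of d coordinates, and then combines with its ring neighbors. The topology, combination weights, and all parameter values are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

# K agents on a ring with local least-squares data; every value is illustrative.
K, n, d = 3, 40, 6
Xs = [rng.standard_normal((n, d)) for _ in range(K)]
w_true = rng.standard_normal(d)
ys = [X @ w_true + 0.1 * rng.standard_normal(n) for X in Xs]

A = np.zeros((K, K))  # doubly-stochastic ring combination matrix
for k in range(K):
    A[k, k], A[k, (k - 1) % K], A[k, (k + 1) % K] = 0.5, 0.25, 0.25

mu, T, m = 0.02, 5000, 2  # each agent updates m of the d coordinates per step
W = np.zeros((K, d))
for _ in range(T):
    for k in range(K):
        i = rng.integers(n)
        g = 2 * Xs[k][i] * (Xs[k][i] @ W[k] - ys[k][i])  # stochastic gradient
        coords = rng.choice(d, size=m, replace=False)    # random coordinate subset
        W[k, coords] -= mu * g[coords]  # adapt only the selected coordinates
    W = A @ W                           # combine with immediate neighbors

print(np.linalg.norm(W - w_true, axis=1))
```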