- Published 1972 in IEEE Trans. Systems, Man, and Cybernetics

In the above paper [1], an optimal method of self-optimization of certain system parameters using noisy binary-valued performance feedback is extended, without losing optimality, to situations with many-valued performance feedback. The effect of time-varying feedback mechanisms is briefly considered.

In the above paper [1], Shapiro and Narendra considered the problem of on-line self-optimization of a set of parameters {a} contained in a given system, the performance of which was only available in a form corrupted by noise. Thus measurements g(a, z), where z is a random quantity, are available, and the aim is to find a value for a which maximizes I(a) = E[g(a, z)], the expected value of g. They called a self-optimization algorithm optimal if it eventually chose, with probability 1, the optimum value for the parameter set, and presented an optimal self-optimization algorithm for the case where measurements of the system's performance were limited to the values 0 and 1 (a penalty/nonpenalty situation).

It is clearly of interest to extend this algorithm to systems with more general performance functions, and although Shapiro and Narendra claim to have accomplished this without sacrificing optimality (see the Appendix of [1]), this correspondence will show that their extension is not, in fact, optimal, but that such an optimal extension can be made by introducing an additional random element. Some comments will also be made on the use of the optimal algorithm in situations where the performance evaluation function I varies with time.

The set of parameters {a} is assumed to have r possible values a_i. Restricting attention for the moment to the penalty/nonpenalty situation, let c_i = Pr[action a_i causes a penalty response] = Pr[g(a_i, z) = 0]. We assume that the c_i completely characterize the performance evaluation mechanism, so that successive performance measures are statistically independent.
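In the penalty/nonpenalty situation g takes only the values 0 and 1, so I(a_i) = E[g(a_i, z)] = Pr[g(a_i, z) = 1] = 1 - c_i, and maximizing I amounts to choosing the action with the smallest penalty probability. The following is a minimal Python sketch of that relationship, not part of the original correspondence. The penalty probabilities, the trial count, and the function names are hypothetical and chosen only for illustration.

```python
import random


def simulate_response(c_i, rng):
    """Return g = 0 (penalty) with probability c_i, otherwise g = 1."""
    return 0 if rng.random() < c_i else 1


def estimate_performance(penalty_probs, trials, seed=0):
    """Estimate I(a_i) = E[g(a_i, z)] for each action by averaging
    independent binary responses."""
    rng = random.Random(seed)
    estimates = []
    for c_i in penalty_probs:
        total = sum(simulate_response(c_i, rng) for _ in range(trials))
        estimates.append(total / trials)
    return estimates


# Hypothetical penalty probabilities c_i for r = 3 parameter values.
penalty_probs = [0.7, 0.2, 0.5]
estimates = estimate_performance(penalty_probs, trials=20000)

# The action with the largest estimated I is the one with the smallest c_i.
best = max(range(len(estimates)), key=estimates.__getitem__)
print("estimated I(a_i):", estimates)
print("index of best action:", best)
```

Each estimate should lie close to 1 - c_i. A self-optimization algorithm of the kind discussed here has no access to the c_i and must locate the best action from the noisy responses alone.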
The "linear reinforcement scheme" of Shapiro and Narendra chooses the parameter value a_i with probability p_i and updates the p_j according to the consequent performance measure g, in accordance with the following: 1) if g = 0, do not change any p_j; 2) if g = 1, then

Manuscript received August 9, 1971. The author is with the Department of Electrical Engineering Science, University of Essex, Colchester, Essex, England.

[1] I. J. Shapiro and K. S. Narendra, IEEE Trans. Syst. Sci. Cybern., vol. SSC-5, pp. 352-360, Oct. 1969.

@article{Witten1972CommentsO,
title={Comments on "Use of Stochastic Automata for Parameter Self-Optimization with Multimodal Performance Criteria"},
author={Ian H. Witten and R. Viswanathan and Kumpati S. Narendra and I. Joseph Shapiro},
journal={IEEE Trans. Systems, Man, and Cybernetics},
year={1972},
volume={2},
pages={289-290}
}