Ilya O. Ryzhov

Learn More
We derive a one-period look-ahead policy for finiteand infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against(More)
We derive a one-period look-ahead policy for online subset selection problems, where learning about one subset also gives us information about other subsets. The subset selection problem is treated as a multi-armed bandit problem with correlated prior beliefs. We show that our decision rule is easily computable, and present experimental evidence that the(More)
We consider a ranking and selection problem with independent normal observations, and analyze the asymptotic sampling rates of expected improvement (EI) methods in this setting. Such methods often perform well in practice, but a tractable analysis of their convergence rates is difficult due to the nonlinearity and nonconvexity of the functions used in the(More)
We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the(More)
We provide a tutorial overview of simulation optimization methods, including statistical ranking & selection (R&S) methods such as indifference-zone procedures, optimal computing budget allocation (OCBA), and Bayesian value of information (VIP) approaches; random search methods; sample average approximation (SAA); response surface methodology (RSM);(More)
In approximate dynamic programming, we can represent our uncertainty about the value function using a Bayesian model with correlated beliefs. Thus, a decision made at a single state can provide us with information about many states, making each individual observation much more powerful. We propose a new exploration strategy based on the knowledge gradient(More)
The goal of this panel was to discuss the state of the art in simulation optimization research and practice. The participants included representation from both academia and industry, where the latter was represented by participation from a leading software provider of optimization tools for simulation. This paper begins with a short introduction to(More)
We examine a newsvendor problem with two agents: a requesting agent that observes private demand information, and an oversight agent that must determine how to allocate resources upon receiving a bid from the requesting agent. Because the two agents have different cost structures, the requesting agent tends to bid higher than the amount that is actually(More)