How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate…

Figure 3: A simple illustration of Bayesian optimisation in one dimension. The goal is to maximise some true unknown function f (not shown). Information about this function is gained by making observations (circles, top panels), which are evaluations of the function at specific x values. These observations are used to infer a posterior distribution over the function values (shown as mean, blue line, and standard deviations, blue shaded area) representing the distribution of possible functions; note that uncertainty grows away from the observations. Based on this distribution over functions, an acquisition function is computed (green shaded area, bottom panels), which represents the gain from evaluating the unknown function f at different x values; note that the acquisition function is high where the posterior over f has both high mean and large uncertainty. Different acquisition functions can be used, such as “expected improvement” or “information gain”. The peak of the acquisition function (red line) is the best next point to evaluate, and is therefore chosen for evaluation (red dot, new observation). The left and right panels show an example of what could happen after three and four function evaluations, respectively.
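To make the loop described in the caption concrete, the following is a minimal sketch of one-dimensional Bayesian optimisation in Python (NumPy only): a zero-mean Gaussian-process posterior with a squared-exponential kernel supplies the mean and standard deviation over candidate x values, an expected-improvement acquisition is computed from them, and its peak is chosen as the next evaluation point. The kernel hyperparameters, noise level, and the toy objective `f` are illustrative assumptions, not specified in the figure.

```python
import math
import numpy as np

# Standard normal CDF via math.erf, vectorised over arrays.
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def rbf_kernel(a, b, length_scale=0.3, signal_var=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    # Posterior mean and standard deviation of a zero-mean GP
    # conditioned on the observations (the blue line / shaded area).
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_query)
    Kss_diag = np.diag(rbf_kernel(x_query, x_query))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = Kss_diag - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best_y):
    # EI is large where the posterior has high mean and/or high uncertainty.
    z = (mean - best_y) / std
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (mean - best_y) * _norm_cdf(z) + std * pdf

def f(x):
    # Hypothetical "true unknown function" (hidden from the optimiser in practice).
    return np.sin(3.0 * x) + 0.5 * x

# Start from three observations, as in the left panels of the figure.
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = f(x_obs)
grid = np.linspace(0.0, 1.0, 200)
for step in range(2):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    ei = expected_improvement(mean, std, y_obs.max())
    x_next = grid[np.argmax(ei)]  # peak of the acquisition function
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))
```

Each pass through the loop corresponds to moving from one panel of the figure to the next: the posterior is refit to the enlarged set of observations, the acquisition surface changes shape, and its new peak becomes the next query.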