Let H be a finite class of hypotheses (classification functions). In the HAL algorithm, at each iteration we divide H into two groups according to their predictions and go with the majority decision. If c_t ∈ H, then the HAL algorithm makes O(log |H|) errors. What happens when c_t ∉ H? We would then like to find the best h ∈ H; intuitively, we would like to bound the number of errors by the number of errors of the best h plus some additive constant. An equivalent model:
1. At each step t, the learner chooses a vector of weights w^t_1, ..., w^t_{|H|}, one weight per h ∈ H.
2. At each step t, the learner receives a loss vector l^t ∈ [0, 1]^{|H|}, where l^t_i ∈ [0, 1] is the loss of hypothesis h_i ∈ H.
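The majority-vote rule above can be sketched as follows. This is a minimal illustration, not the course's reference implementation; the threshold class used for the usage example and the function name `halving` are hypothetical. It shows why, when c_t ∈ H, every error shrinks the set of consistent hypotheses by at least half, giving the O(log |H|) bound.

```python
import math

def halving(H, stream):
    """Run the majority-vote (Halving) rule over a stream of (x, y) pairs
    with labels in {0, 1}; return the number of errors made."""
    version_space = list(H)  # hypotheses still consistent with the data
    errors = 0
    for x, y in stream:
        # Divide the remaining hypotheses into two groups by their
        # predicted label and go with the majority decision.
        votes_for_one = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes_for_one >= len(version_space) else 0
        if prediction != y:
            errors += 1
        # Keep only the hypotheses that predicted the true label; on an
        # error, at least half of the version space is eliminated.
        version_space = [h for h in version_space if h(x) == y]
    return errors

# Usage (hypothetical example): H = threshold functions on {0, ..., 8},
# target concept is the threshold at 5, so c_t ∈ H here.
H = [lambda x, t=t: 1 if x >= t else 0 for t in range(9)]
target = lambda x: 1 if x >= 5 else 0
stream = [(x, target(x)) for x in [3, 7, 0, 5, 4, 6, 2, 1]]
m = halving(H, stream)
assert m <= math.log2(len(H))  # error bound: at most log2 |H| errors
```

Note that eliminating inconsistent hypotheses only makes sense when c_t ∈ H; when it does not, the version space can become empty, which is exactly what motivates the weighted model above.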