Sangwoo Mo

We study contextual multi-armed bandit problems under linear realizability of rewards and uncertainty (or noise) in the features. For the case of identical feature noise across actions, we propose an algorithm, coined {\em NLinRel}, with a regret bound of $O\left(T^{\frac{7}{8}} \left(\log{(dT)}+K\sqrt{d}\right)\right)$ for $T$ rounds, $K$ actions, and …