Learn More
a r t i c l e i n f o a b s t r a c t We study a partial-information online-learning problem where actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). In contrast to conventional approaches that require the absolute reward of the chosen strategy to be quantifiable and observable, our setting assumes only that(More)
We consider a stylized dynamic pricing model in which a monopolist prices a product to a sequence of T customers, who independently make purchasing decisions based on the price offered according to a general parametric choice model. The parameters of the model are unknown to the seller, whose objective is to determine a pricing policy that minimizes the(More)
  • 1