Corpus ID: 62932094

Competing Bandits: The Perils of Exploration under Competition

@article{Aridor2019CompetingBT,
  title={Competing Bandits: The Perils of Exploration under Competition},
  author={Guy Aridor and Kevin Zhengcheng Liu and Aleksandrs Slivkins and Zhiwei Steven Wu},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.05590}
}
Most online platforms strive to learn from interactions with consumers, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition: how such platforms balance the exploration for learning and the competition for consumers. Here consumers play three distinct roles: they are customers that generate revenue, they are sources of data for learning, and they are self… 
How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms
Through repeated interactions, firms today refine their understanding of individual users' preferences adaptively for personalization. In this paper, we use a continuous-time multi-agent bandit model
Regret, stability, and fairness in matching markets with bandit learners
By modeling two additional components of competition—namely, costs and transfers—it is proved that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
It is proved that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if initialized with sufficiently many data points, and the performance loss due to incentives is limited to the initial rounds when these data points are collected.
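Thompson Sampling itself is a standard bandit algorithm; a minimal illustrative sketch for Bernoulli arms with Beta posteriors follows (the function name `thompson_sampling` and the example arm means are made up for illustration and are not the paper's construction):

```python
import random

def thompson_sampling(true_means, horizon, prior=(1, 1), seed=0):
    """Thompson Sampling for Bernoulli bandits with Beta priors."""
    rng = random.Random(seed)
    k = len(true_means)
    # Beta posterior parameters per arm, starting from a Beta(1, 1) prior
    alpha = [prior[0]] * k
    beta = [prior[1]] * k
    total_reward = 0
    for _ in range(horizon):
        # sample a mean estimate for each arm and play the argmax
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Example: two arms with means 0.3 and 0.7
# thompson_sampling([0.3, 0.7], horizon=1000)
```

Initializing the posteriors with extra data points, as the result above requires, amounts to starting `alpha`/`beta` from larger counts rather than the uniform prior.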
Learning Equilibria in Matching Markets from Bandit Feedback
This work designs an incentive-aware learning objective that captures the distance of a market outcome from equilibrium, and analyzes the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem.
Competing Bandits in Matching Markets
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.
Noble Deceit: Optimizing Social Welfare for Myopic Multi-Armed Bandits
In the information economy, consumer-generated information greatly informs the decisions of future consumers. However, myopic consumers seek to maximize their own reward with no regard for the
Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets
This work proposes a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets; the algorithms make decisions based solely on an agent's own history of play and require no foreknowledge of the agents' preferences.
The Effect of Privacy Regulation on the Data Industry: Empirical Evidence from GDPR
A selected set of consumers who substitute from pre-existing privacy means toward those offered as part of GDPR affects the ability to predict consumer behavior and preferences, and improves the ability of advertisers to measure the effectiveness of advertising.
The effects of competition and regulation on error inequality in data-driven markets
This work develops a high-level model that predicts unfairness in a monopoly setting, considers two avenues for regulating a machine-learning-driven monopolist (relative error inequality and absolute error bounds), and quantifies the price of fairness.
Beyond log2(T) Regret for Decentralized Bandits in Matching Markets
A phase-based algorithm is proposed: in each phase, besides deleting the globally communicated dominated arms, agents locally delete arms with which they collide often, a step pivotal in breaking deadlocks arising from rank heterogeneity of agents across arms.
...

References

SHOWING 1-10 OF 139 REFERENCES
Competing Bandits: Learning Under Competition
A study of the interplay between exploration and competition is initiated: how such systems balance the exploration for learning and the competition for users, a trade-off closely related to the "competition vs. innovation" relationship.
The Perils of Exploration under Competition: A Computational Modeling Approach
It is found that duopoly and monopoly tend to favor a primitive "greedy algorithm" that does not explore and leads to low consumer welfare, whereas a temporary monopoly (a duopoly with an early entrant) may incentivize better bandit algorithms and lead to higher consumer welfare.
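The contrast between a primitive greedy algorithm and one that explores can be illustrated with a toy simulation; this is a hypothetical sketch (the function `run_policy`, the arm means 0.2/0.8, and `eps=0.1` are all made up for illustration), not the paper's computational model:

```python
import random

def run_policy(policy, means, horizon=2000, eps=0.1, seed=1):
    """Toy Bernoulli bandit: 'greedy' always exploits the empirical best
    arm; 'eps_greedy' additionally explores a uniform arm w.p. eps."""
    rng = random.Random(seed)
    k = len(means)
    pulls, wins = [0] * k, [0] * k

    def draw(i):
        r = 1 if rng.random() < means[i] else 0
        pulls[i] += 1
        wins[i] += r
        return r

    # one initial sample per arm so empirical means are defined
    total = sum(draw(i) for i in range(k))
    for _ in range(horizon - k):
        if policy == "eps_greedy" and rng.random() < eps:
            arm = rng.randrange(k)  # explore a uniformly random arm
        else:
            # exploit the arm with the best empirical mean so far
            arm = max(range(k), key=lambda i: wins[i] / pulls[i])
        total += draw(arm)
    return total
```

Averaged over many seeds, the greedy variant often locks onto the inferior arm after an unlucky first sample and never recovers, which is the low-consumer-welfare behavior attributed to greedy algorithms above.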
Crowdsourcing Exploration
A decentralized multi-armed bandit framework where a forward-looking principal commits upfront to a policy that dynamically discloses information regarding the history of outcomes to a series of short-lived rational agents, demonstrating that consumer surplus is non-monotone in the accuracy of the designer's information-provision policy.
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
It is proved that Thompson Sampling, a standard bandit algorithm, is incentive-compatible if initialized with sufficiently many data points, and the performance loss due to incentives is limited to the initial rounds when these data points are collected.
Bayesian Exploration: Incentivizing Exploration in Bayesian Games
The goal is to design a recommendation policy for the principal which respects agents' incentives and minimizes a suitable notion of regret, and shows how the principal can identify (and explore) all explorable actions, and use the revealed information to perform optimally.
Competing with Big Data
This paper studies competition in data-driven markets, that is, markets where the cost of quality production is decreasing in the amount of machine-generated data about user preferences or
Learning-by-Doing and Market Performance
This article studies the implications of learning-by-doing for market conduct and performance. We use a general continuous-time model to show that output increases over time in the absence of
Data-enabled learning, network effects and competitive advantage∗
We provide a model of competition in which firms can improve their products through learning from the data they obtain on customers they serve. The model is used to explore the implications for
Price Dispersion and Learning in a Dynamic Differentiated-Goods Duopoly
We study the evolution of prices set by duopolists who are uncertain about the perceived degree of product differentiation. Customers sometimes view the products as close substitutes, sometimes as
Sample Complexity of Incentivized Exploration
We consider incentivized exploration: a version of multi-armed bandits where the choice of actions is controlled by self-interested agents, and the algorithm can only issue recommendations. The
...