Corpus ID: 189762337

Competing Bandits in Matching Markets

Lydia T. Liu, Horia Mania, Michael I. Jordan
Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to… 
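The model sketched in the abstract has one side of the market learn its preferences while a platform computes matchings. A minimal illustrative sketch of how bandit learning and stable matching can be combined, assuming a centralized platform that runs agent-proposing deferred acceptance over the agents' UCB-based rankings (the function names, noise model, and parameters here are assumptions for illustration, not the paper's exact algorithm):

```python
import math, random

def ucb_matching(true_means, arm_prefs, horizon, seed=0):
    """Illustrative centralized bandit-matching loop.

    true_means[i][a]: agent i's unknown mean reward for arm a.
    arm_prefs[a]    : arm a's (known) ranking of agents, best first.
    Each round, agents rank arms by UCB indices, the platform computes an
    agent-proposing deferred-acceptance matching against the arms' known
    preferences, and each agent observes a noisy reward from its match.
    Assumes at least as many arms as agents.
    """
    rng = random.Random(seed)
    n_agents, n_arms = len(true_means), len(arm_prefs)
    counts = [[0] * n_arms for _ in range(n_agents)]
    sums = [[0.0] * n_arms for _ in range(n_agents)]
    # rank[a][i] = position of agent i in arm a's preference list
    rank = [{i: pos for pos, i in enumerate(p)} for p in arm_prefs]

    def match(rankings):
        # agent-proposing deferred acceptance
        nxt = [0] * n_agents
        holder = [None] * n_arms          # arm -> tentatively held agent
        free = list(range(n_agents))
        while free:
            i = free.pop()
            a = rankings[i][nxt[i]]
            nxt[i] += 1
            if holder[a] is None:
                holder[a] = i
            elif rank[a][i] < rank[a][holder[a]]:
                free.append(holder[a])    # arm trades up
                holder[a] = i
            else:
                free.append(i)            # proposal rejected
        return holder

    for t in range(1, horizon + 1):
        def index(i, a):
            if counts[i][a] == 0:
                return float('inf')       # force initial exploration
            return sums[i][a] / counts[i][a] + math.sqrt(
                2 * math.log(t) / counts[i][a])
        rankings = [sorted(range(n_arms), key=lambda a: -index(i, a))
                    for i in range(n_agents)]
        for a, i in enumerate(match(rankings)):
            if i is None:
                continue
            r = true_means[i][a] + rng.gauss(0.0, 0.1)   # noisy reward
            counts[i][a] += 1
            sums[i][a] += r
    return counts
```

With two agents whose best arms differ, each agent's pull counts concentrate on its own best arm once the UCB indices separate.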

Bandit Learning in Decentralized Matching Markets

This model extends the standard stochastic multi-armed bandit framework to a decentralized multi-player setting with competition and introduces a new algorithm that attains stable regret when the arms' preferences over players are shared.

Decentralized Competing Bandits in Non-Stationary Matching Markets

This paper proposes and analyzes a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms.
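The entry above names successive-elimination-type learning as DNCB's building block. For reference, here is a minimal single-player sketch of successive elimination (the confidence radius is one standard choice; this is not DNCB itself, which layers the decentralized market protocol on top):

```python
import math, random

def successive_elimination(reward_fns, horizon, seed=0):
    """Sample all surviving arms in round-robin; drop any arm whose upper
    confidence bound falls below the best arm's lower confidence bound."""
    rng = random.Random(seed)
    k = len(reward_fns)
    active = list(range(k))
    counts, sums = [0] * k, [0.0] * k
    t = 0
    while t < horizon:
        for arm in list(active):          # one round-robin pass
            if t >= horizon:
                break
            t += 1
            counts[arm] += 1
            sums[arm] += reward_fns[arm](rng)
        if len(active) > 1:
            def mean(a): return sums[a] / counts[a]
            def radius(a): return math.sqrt(2 * math.log(max(t, 2)) / counts[a])
            best_lcb = max(mean(a) - radius(a) for a in active)
            active = [a for a in active if mean(a) + radius(a) >= best_lcb]
    return active, counts
```

On two well-separated Bernoulli arms, the worse arm is eliminated after a modest number of passes and the remaining budget goes to the better one.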

Competing Bandits in Time Varying Matching Markets

The Restart Competing Bandits (RCB) algorithm is proposed, which combines a simple restart strategy to handle the non-stationarity with the competing bandits algorithm (Liu et al., 2020) designed for the stationary case.
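The restart idea above can be sketched as a wrapper that discards all learning statistics on a fixed schedule, so stale estimates from an earlier market regime are forgotten. This is an illustrative single-player sketch, not the RCB algorithm itself (RCB additionally runs the competing-bandits matching protocol within each epoch):

```python
import math, random

def restart_ucb(reward_fns, horizon, epoch_length, seed=0):
    """UCB with periodic restarts: all counts and sums are reset every
    `epoch_length` rounds, a simple remedy for non-stationary rewards.
    Assumes epoch_length >= number of arms."""
    rng = random.Random(seed)
    k = len(reward_fns)
    pulls = []
    for start in range(0, horizon, epoch_length):
        counts, sums = [0] * k, [0.0] * k      # restart: forget everything
        for t in range(1, min(epoch_length, horizon - start) + 1):
            if t <= k:
                arm = t - 1                    # play each arm once first
            else:
                arm = max(range(k), key=lambda a:
                          sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = reward_fns[arm](rng)
            counts[arm] += 1
            sums[arm] += r
            pulls.append(arm)
    return pulls
```

The price of restarting is re-paying the exploration cost each epoch; the benefit is bounded sensitivity to drift between epochs.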

Learning in Multi-Stage Decentralized Matching Markets

This article proposes an efficient algorithm, built upon concepts of “lower uncertainty bound” and “calibrated decentralized matching,” for maximizing the participants’ expected payoff and shows that there exists a welfare-versus-fairness trade-off that is characterized by the uncertainty level of acceptance.

Thompson Sampling for Bandit Learning in Matching Markets

The first regret analysis for TS in the new setting of iterative matching markets is provided, demonstrating the practical advantages of the TS-type algorithm over the ETC and UCB-type baselines.
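As a point of reference for the TS-type algorithm discussed above, this is the standard single-player Thompson sampling loop for Bernoulli arms with Beta(1,1) priors; the matching-market version layers the market protocol on top of this posterior-sampling step:

```python
import random

def thompson_bernoulli(reward_fns, horizon, seed=0):
    """Thompson sampling: sample a mean from each arm's Beta posterior,
    play the argmax, and update the posterior with the observed reward."""
    rng = random.Random(seed)
    k = len(reward_fns)
    alpha = [1.0] * k   # 1 + observed successes
    beta = [1.0] * k    # 1 + observed failures
    counts = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        r = reward_fns[arm](rng)    # Bernoulli reward in {0.0, 1.0}
        alpha[arm] += r
        beta[arm] += 1 - r
        counts[arm] += 1
    return counts
```

Randomized posterior sampling explores automatically: uncertain arms occasionally produce large samples and get pulled, without an explicit confidence bonus.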

Bandit Learning in Many-to-One Matching Markets

This work develops algorithms in both centralized and decentralized settings, proves regret bounds of order O(log T) and O(log² T) respectively, and shows the convergence and effectiveness of the algorithms.

Learning Equilibria in Matching Markets from Bandit Feedback

This work designs an incentive-aware learning objective that captures the distance of a market outcome from equilibrium, and analyzes the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem.

Learning Strategies in Decentralized Matching Markets under Uncertain Preferences

An optimal strategy is derived that maximizes the agent's expected payoff and calibrates the uncertain state by taking opportunity costs into account; a fairness property is also proved, asserting that there exists no justified envy under the proposed strategy.

Double Matching Under Complementary Preferences

This work proposes the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm for matching markets with complementary preferences, where agents' preferences are unknown a priori and must be learned from data.

Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship

This work designs the first decentralized algorithm for the agents, UCB with Decentralized Dominant-arm Deletion (UCB-D3), which requires no knowledge of reward gaps or the time horizon, and proves both a new regret lower bound for the decentralized serial dictatorship model and that UCB-D3 is order-optimal.
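In the serial-dictatorship model above, a fixed priority order over agents pins down the matching once preferences are known: each agent in turn takes its favourite remaining arm. The offline rule is a few lines; the learning problem UCB-D3 addresses is that agents must discover their own rankings online while deleting arms already claimed by higher-priority agents (this sketch is the offline rule only, not UCB-D3):

```python
def serial_dictatorship(agent_rankings):
    """Assign arms under serial dictatorship.

    agent_rankings: one ranking of arms per agent, listed in priority
    order (highest-priority agent first). Each agent takes its favourite
    arm among those not yet taken.
    """
    taken = set()
    assignment = []
    for ranking in agent_rankings:
        arm = next(a for a in ranking if a not in taken)
        taken.add(arm)
        assignment.append(arm)
    return assignment
```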

Two-Sided Bandits and the Dating Market

We study the decision problems facing agents in repeated matching environments with learning, or two-sided bandit problems, and examine the dating market, in which men and women repeatedly go out on dates.

Competing Bandits: Learning Under Competition

This work initiates a study of the interplay between exploration and competition: how such systems balance exploration for learning against competition for users, a question closely related to the "competition vs. innovation" relationship.

Competing Bandits: The Perils of Exploration under Competition

It is found that stark competition induces firms to commit to a "greedy" bandit algorithm that leads to low consumer welfare, whereas weakening competition by providing firms with some "free" consumers incentivizes better exploration strategies and increases consumer welfare.

Communication Requirements and Informative Signaling in Matching Markets

This model modifies worker-proposing DA by having firms signal workers they especially like, while also broadcasting qualification requirements to discourage workers who have no realistic chance from applying; it has good incentive properties and gives insight into how to mediate large matching markets to reduce congestion.

Matching while Learning

We consider the problem faced by a service platform that needs to match supply with demand but also to learn attributes of new arrivals in order to match them better in the future. We introduce a…

Distributed Multi-Player Bandits - a Game of Thrones Approach

This is the first algorithm to achieve poly-logarithmic regret in this fully distributed scenario; it is proved to attain an expected sum of regrets of near-O(log² T).

Multi-armed bandits in multi-agent networks

This paper addresses the multi-armed bandit problem in a multi-player framework with a distributed variant of the well-known UCB1 algorithm that is optimal in the sense that in a complete network it scales down the regret of its single-player counterpart by the network size.

Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis

The marriage model, the labor market for medical interns, a simple model of one seller and many buyers, discrete models with money, and more complex preferences are examined.
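The marriage model above is solved by the Gale-Shapley deferred acceptance algorithm, which always produces a stable matching that is optimal for the proposing side. A compact dictionary-based sketch:

```python
def deferred_acceptance(men_prefs, women_prefs):
    """Men-proposing Gale-Shapley deferred acceptance.

    men_prefs[m]  : list of women in m's order of preference
    women_prefs[w]: list of men in w's order of preference
    Returns a dict mapping each woman to her matched man.
    """
    # rank[w][m] = position of m in w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_proposal = {m: 0 for m in men_prefs}   # next woman to propose to
    matched = {}                                # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_proposal[m]]
        next_proposal[m] += 1
        if w not in matched:
            matched[w] = m                      # first proposal: accept tentatively
        elif rank[w][m] < rank[w][matched[w]]:
            free_men.append(matched[w])         # w trades up; old partner freed
            matched[w] = m
        else:
            free_men.append(m)                  # w rejects m
    return matched
```

Each man proposes down his list and each woman holds her best proposal so far, so no blocking pair can survive termination.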

Distributed Learning in Multi-Armed Bandit With Multiple Players

It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
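For the i.i.d. case above, the classic index policy achieving logarithmic regret is UCB1: play the arm maximizing empirical mean plus an exploration bonus. A minimal sketch:

```python
import math, random

def ucb1(reward_fns, horizon, seed=0):
    """UCB1 for stochastic bandits: after playing each arm once, pull the
    arm maximizing  mean estimate + sqrt(2 ln t / pulls)."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                      # initialization: each arm once
        else:
            arm = max(range(k), key=lambda a:
                      sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_fns[arm](rng)
    return counts
```

The bonus shrinks as an arm is pulled more, so suboptimal arms are sampled only O(log T) times, matching the stochastic regret rate discussed above.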