Corpus ID: 229153577

Bandit Learning in Decentralized Matching Markets

@article{Liu2021BanditLI,
  title={Bandit Learning in Decentralized Matching Markets},
  author={Lydia T. Liu and Feng Ruan and Horia Mania and Michael I. Jordan},
  journal={J. Mach. Learn. Res.},
  year={2021},
  volume={22},
  pages={211:1-211:34}
}
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. We also assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multi-player setting with competition. We introduce a new algorithm for this setting that, over a time horizon $T…
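The model in the abstract can be illustrated with a small, self-contained simulation. The sketch below is not the paper's algorithm; it is a minimal example under common assumptions in this literature: arms have fixed, known preference rankings over players, a contested arm accepts its most-preferred bidder while colliding losers observe zero reward, and each player runs plain UCB on its own observations with no communication. All numerical values and names are illustrative.

```python
import math
import random

random.seed(0)

N_PLAYERS, N_ARMS, HORIZON = 2, 3, 2000

# True mean rewards: MEANS[i][j] = player i's (unknown) mean reward for arm j.
MEANS = [[0.9, 0.6, 0.3],
         [0.8, 0.7, 0.2]]

# Each arm ranks the players, best first (the arms' preferences are fixed).
ARM_PREF = [[0, 1], [1, 0], [0, 1]]

counts = [[0] * N_ARMS for _ in range(N_PLAYERS)]   # pulls per (player, arm)
sums = [[0.0] * N_ARMS for _ in range(N_PLAYERS)]   # cumulative observed reward

def ucb(player, t):
    """UCB score of each arm for one player; unpulled arms get +infinity."""
    scores = []
    for j in range(N_ARMS):
        n = counts[player][j]
        if n == 0:
            scores.append(float("inf"))
        else:
            scores.append(sums[player][j] / n + math.sqrt(2 * math.log(t + 1) / n))
    return scores

for t in range(HORIZON):
    # Each player independently attempts the arm with its highest UCB score.
    attempts = [max(range(N_ARMS), key=lambda j: ucb(i, t)[j])
                for i in range(N_PLAYERS)]
    for j in range(N_ARMS):
        bidders = [i for i in range(N_PLAYERS) if attempts[i] == j]
        if not bidders:
            continue
        # The arm accepts its most-preferred bidder; colliding losers observe 0.
        winner = min(bidders, key=ARM_PREF[j].index)
        for i in bidders:
            reward = float(random.random() < MEANS[i][j]) if i == winner else 0.0
            counts[i][j] += 1
            sums[i][j] += reward

# Each player's empirical best arm after the horizon.
matching = {
    i: max(range(N_ARMS),
           key=lambda j: sums[i][j] / counts[i][j] if counts[i][j] else -1.0)
    for i in range(N_PLAYERS)
}
print(matching)
```

In this instance player 0 is preferred by arms 0 and 2 and so always wins collisions there; player 1's estimate for arm 0 is driven down by collisions, nudging it toward arm 1, which illustrates how competition shapes what each player can learn without any communication.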

Citations

Regret, stability, and fairness in matching markets with bandit learners

By modeling two additional components of competition, namely costs and transfers, it is proved that four desiderata can be guaranteed simultaneously: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.

Beyond log2(T) Regret for Decentralized Bandits in Matching Markets

A phase-based algorithm is proposed in which, in each phase, agents delete the globally communicated dominated arms and additionally locally delete arms with which they collide often; this local deletion is pivotal in breaking deadlocks arising from the rank heterogeneity of agents across arms.

Decentralized Learning in Online Queuing Systems

Cooperative queues are considered, and the first decentralized learning algorithm is proposed that guarantees stability of the system as long as the ratio of rates is larger than 1, thus reaching performance comparable to centralized strategies.

Matching in Multi-arm Bandit with Collision

Using a subtle communication protocol, the matching problem in multi-agent multi-armed bandits is considered; the proposed algorithm achieves a state-of-the-art O(log T) regret in the decentralized matching market and outperforms existing baselines in experiments.

Bandit Learning in Many-to-One Matching Markets

This work develops algorithms in both centralized and decentralized settings, proves regret bounds of order O(log T) and O(log² T) respectively, and shows the convergence and effectiveness of the algorithms.

Decentralized Competing Bandits in Non-Stationary Matching Markets

This paper proposes and analyzes a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms.

Double Auctions with Two-sided Bandit Feedback

Double auctions enable decentralized transfer of goods between multiple buyers and sellers, thus underpinning the functioning of many online marketplaces. Buyers and sellers compete in these markets…

A survey on multi-player bandits

This survey contextualizes and organizes the rich multi-player bandits literature; its authors argue that further study of these different directions might lead to theoretical algorithms adapted to real-world situations.

Competing Bandits in Time Varying Matching Markets

The Restart Competing Bandits (RCB) algorithm is proposed, which combines a simple restart strategy to handle the non-stationarity with the competing bandits algorithm (Liu et al., 2020) designed for the stationary case.

Statistical Inference for Fisher Market Equilibrium

Statistical inference under market equilibrium effects has attracted increasing attention recently. In this paper we focus on the specific case of linear Fisher markets. They have been widely used in…

References

Showing 1–10 of 51 references

Competing Bandits in Matching Markets

This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.

Learning Strategies in Decentralized Matching Markets under Uncertain Preferences

An optimal strategy is derived that maximizes the agent's expected payoff and calibrates the uncertain state by taking opportunity costs into account; a fairness property is also proved, asserting that there is no justified envy under the proposed strategy.

Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation

The first decentralized algorithm for matching bandits under uniform valuation that does not require any knowledge of reward gaps or the time horizon is designed, partially resolving an open question in matching-bandit models.

Two-Sided Bandits and the Dating Market

We study the decision problems facing agents in repeated matching environments with learning, or two-sided bandit problems, and examine the dating market, in which men and women repeatedly go out on…

Thickness and Information in Dynamic Matching Markets

A model of dynamic matching in networked markets is presented, in which agents arrive and depart stochastically and the composition of the trade network depends endogenously on the matching algorithm, together with conditions under which local algorithms that choose the right time to match agents are close to optimal.

Communication Requirements and Informative Signaling in Matching Markets

This model modifies worker-proposing deferred acceptance (DA) by having firms signal workers they especially like, while also broadcasting qualification requirements to discourage workers who have no realistic chance from applying; it has good incentive properties and gives insights on how to mediate large matching markets to reduce congestion.

Distributed Multi-Player Bandits - a Game of Thrones Approach

It is proved that the algorithm achieves an expected sum of regrets of near-O(log² T); this is the first algorithm to achieve poly-logarithmic regret in this fully distributed scenario.

My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits

This work designs a distributed algorithm that learns the matching between players and arms achieving max-min fairness while minimizing regret, and proves that it is regret-optimal up to a log log T factor.

Distributed Learning in Multi-Armed Bandit With Multiple Players

It is shown that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart where players act collectively as a single entity by exchanging observations and making decisions jointly.

Matching while Learning

We consider the problem faced by a service platform that needs to match supply with demand but also to learn attributes of new arrivals in order to match them better in the future. We introduce a…
...