Matching while Learning

@inproceedings{Johari2017MatchingWL,
  title={Matching while Learning},
  author={Ramesh Johari and Vijay Kamble and Yashodhan Kanoria},
  booktitle={Proceedings of the 2017 ACM Conference on Economics and Computation},
  year={2017}
}
We consider the problem faced by a service platform that needs to match supply with demand but also to learn attributes of new arrivals in order to match them better in the future. We introduce a benchmark model with heterogeneous workers and jobs that arrive over time. Job types are known to the platform, but worker types are unknown and must be learned by observing match outcomes. Workers depart after performing a certain number of jobs. The payoff from a match depends on the pair of types… 
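
The setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's policy: it uses a naive explore-then-commit rule for a single worker, assumes Bernoulli match outcomes, and ignores any constraints that link workers to each other. The type names, the payoff matrix `A`, and the function `simulate_worker` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical payoff matrix: A[worker_type][job_type] is the assumed
# probability that a match succeeds. The values are illustrative only.
A = {
    "novice": {"easy": 0.8, "hard": 0.2},
    "expert": {"easy": 0.9, "hard": 0.7},
}
JOB_TYPES = ["easy", "hard"]  # job types are known to the platform


def simulate_worker(true_type, n_jobs, n_explore, rng):
    """Match one worker to n_jobs jobs, after which the worker departs.

    The platform never reads true_type directly; it only observes match
    outcomes. Naive explore-then-commit (an assumption, not the paper's
    policy): try each job type n_explore times in round-robin order, then
    pick the job type with the best empirical success rate.
    """
    successes = {j: 0 for j in JOB_TYPES}
    trials = {j: 0 for j in JOB_TYPES}
    total = 0
    for t in range(n_jobs):
        if t < n_explore * len(JOB_TYPES):
            # Exploration phase: cycle through the job types.
            job = JOB_TYPES[t % len(JOB_TYPES)]
        else:
            # Exploitation phase: best empirical mean observed so far.
            job = max(JOB_TYPES, key=lambda j: successes[j] / max(trials[j], 1))
        # The outcome depends on the (worker type, job type) pair.
        outcome = 1 if rng.random() < A[true_type][job] else 0
        successes[job] += outcome
        trials[job] += 1
        total += outcome
    return total, trials


if __name__ == "__main__":
    rng = random.Random(0)
    total, trials = simulate_worker("expert", n_jobs=10, n_explore=2, rng=rng)
    print(total, trials)
```

Every job spent exploring is a job not spent on the best-known match, and because the worker departs after `n_jobs` jobs, the platform has only a short window in which to learn the worker's type and then use what it learned.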

Know Your Customer: Multi-armed Bandits with Capacity Constraints
TLDR
This work constructs a policy that has provably optimal regret (to leading order as $N$ grows large) and employs the shadow prices of the capacity constraints in the assignment problem with known types as "externality prices" on the servers' capacity.
Bandit Labor Training
TLDR
This work analyzes a novel objective within the stochastic multi-armed bandit framework, and designs an explore-then-commit policy featuring exploration based on appropriately tuned confidence bounds on the mean reward and an adaptive stopping criterion, which adapts to the problem difficulty and achieves these bounds.
Learning Equilibria in Matching Markets from Bandit Feedback
TLDR
This work designs an incentive-aware learning objective that captures the distance of a market outcome from equilibrium, and analyzes the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem.
Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets
TLDR
This work proposes a class of decentralized, communication- and coordination-free algorithms that agents can use to reach their stable match in structured matching markets; the algorithms make decisions based solely on an agent’s own history of play and require no foreknowledge of the agents’ preferences.
Decentralized Competing Bandits in Non-Stationary Matching Markets
TLDR
This paper proposes and analyzes a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB), where the agents play (restrictive) successive elimination type learning algorithms to learn their preference over the arms.
Learning Proportionally Fair Allocations with Low Regret
TLDR
The properties of the so-called Restricted-PF (RPF) allocation, obtained by assuming that each task can only use a single server, are provided; in particular, it is shown to be very close to the PF allocation.
Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation
TLDR
The first decentralized algorithm for matching bandits under uniform valuation that does not require any knowledge of reward gaps or time horizon is designed, which partially resolves an open question in matching bandit models.
Dynamic Bipartite Matching Market with Arrivals and Departures
TLDR
It is shown that an algorithm that waits to thicken the market, called the Patient algorithm, is exponentially better than the Greedy algorithm, i.e., an algorithm that matches agents greedily, which means that waiting has substantial benefits for maximizing a matching over a bipartite network.
Integrate Learning and Control in Queueing Systems with Uncertain Payoff
TLDR
The analysis shows that the payoff gap of the proposed algorithm decreases as $O(1/V) + O(\sqrt{\log N / N})$, as a parameter $V$ of the algorithm and the average number of tasks per client $N$ increase.
Competing Bandits in Matching Markets
TLDR
This work proposes a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards.

References

SHOWING 1-10 OF 54 REFERENCES
Truthful incentives in crowdsourcing tasks using regret minimization mechanisms
TLDR
This paper designs a novel, no-regret posted price mechanism, BP-UCB, for budgeted procurement in stochastic online settings, proves strong theoretical guarantees about the mechanism, and extensively evaluates it in simulations as well as on real data from the Mechanical Turk platform.
Dynamic matching market design
TLDR
The main insight of the analysis is that waiting to thicken the market can be substantially more important than increasing the speed of transactions, and this is quite robust to the presence of waiting costs.
A dynamic model of barter exchange
TLDR
A platform can achieve the smallest waiting times by using a greedy policy, and by facilitating three cycles, if possible, which is consistent with empirical and computational observations which compare batching policies in the context of kidney exchange programs.
Dynamic Pricing with Limited Supply (extended abstract)
We consider the problem of designing revenue maximizing online posted-price mechanisms when the seller has limited supply. A seller has k identical items for sale and is facing n potential buyers
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
TLDR
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.
Learning on a budget: posted price mechanisms for online procurement
TLDR
This work presents a constant-competitive posted price mechanism when agents are identically distributed and the buyer has a symmetric submodular utility function and gives a truthful mechanism that is O(1)-competitive but uses bidding rather than posted pricing.
Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms
TLDR
This work considers a single-product revenue management problem where the objective is to dynamically adjust prices over a finite sales horizon to maximize expected revenues. The proposed algorithms develop policies that learn the demand function “on the fly” and optimize prices based on what they learn.
Dynamic Pricing with Limited Supply
TLDR
This work presents a detail-free online posted-price mechanism whose revenue is at most $O((k \log n)^{2/3})$ less than the offline benchmark, for every distribution that is regular, and proves a matching lower bound.
Optimal Dynamic Assortment Planning with Demand Learning
TLDR
A family of dynamic policies is developed that judiciously balances the trade-off between exploration and exploitation, and it is proved that their performance cannot be improved upon in a precise mathematical sense.
Thickness and Information in Dynamic Matching Markets
TLDR
This work studies a model of dynamic matching in networked markets, where agents arrive and depart stochastically and the composition of the trade network depends endogenously on the matching algorithm, and identifies conditions under which local algorithms that choose the right time to match agents are close to optimal.