Online Second Price Auction with Semi-bandit Feedback Under the Non-Stationary Setting

@inproceedings{Zhao2020OnlineSP,
  title={Online Second Price Auction with Semi-bandit Feedback Under the Non-Stationary Setting},
  author={Haoyu Zhao and Wei Chen},
  booktitle={AAAI},
  year={2020}
}
In this paper, we study the non-stationary online second price auction problem. We assume that the seller sells the same type of item over T rounds via second-price auctions, and she can set the reserve price in each round. In each round, the bidders draw their private values from a joint distribution unknown to the seller. The seller then announces the reserve price for that round. Next, bidders with private values higher than the announced reserve price in that round will report their…
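The mechanics described in the abstract can be sketched in a few lines: bidders above the reserve participate, the highest bidder wins, and the price paid is the larger of the reserve and the second-highest qualifying bid. This is a minimal illustrative sketch of one auction round; the function name and return convention are assumptions, not from the paper.

```python
def second_price_with_reserve(bids, reserve):
    """One round of a second-price auction with a reserve price.

    Bidders whose private values meet the reserve submit bids (truthful
    bidding is a dominant strategy in this format); the highest bidder
    wins and pays the larger of the reserve and the second-highest
    qualifying bid. Returns (winner_index, revenue).
    """
    qualifying = [(b, i) for i, b in enumerate(bids) if b >= reserve]
    if not qualifying:
        return None, 0.0  # item unsold, seller collects nothing
    qualifying.sort(reverse=True)
    winner = qualifying[0][1]
    price = qualifying[1][0] if len(qualifying) > 1 else reserve
    return winner, max(price, reserve)

# Bidder 0 wins and pays the second-highest qualifying bid, 0.6.
print(second_price_with_reserve([0.9, 0.6, 0.3], 0.5))  # -> (0, 0.6)
```

Note that raising the reserve can increase revenue (a lone high bidder pays the reserve instead of a low second bid) but risks leaving the item unsold, which is exactly the trade-off the seller must learn online.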

Setting Reserve Prices in Second-Price Auctions with Unobserved Bids

TLDR
The proposed approach applies to any seller who sells items via second-price auctions and wants to optimize the reserve price over the course of these auctions, and it outperforms state-of-the-art bandit algorithms designed for non-stationary environments.

Learning to Bid Optimally and Efficiently in Adversarial First-price Auctions

TLDR
This paper develops the first minimax optimal online bidding algorithm that achieves an $\widetilde{O}(\sqrt{T})$ regret when competing with the set of all Lipschitz bidding policies, a strong oracle that contains a rich set of bidding strategies.

Online Causal Inference for Advertising in Real-Time Bidding Auctions

Real-time bidding (RTB) systems, which utilize auctions to allocate user impressions to competing advertisers, continue to enjoy success in digital advertising. Assessing the effectiveness of such

Combinatorial Semi-Bandit in the Non-Stationary Environment

TLDR
A parameter-free algorithm is designed that achieves nearly optimal regret both in the switching case and in the dynamic case without knowing the parameters in advance.

Optimal Tracking in Prediction with Expert Advice

TLDR
These algorithms are the first to produce such universally optimal, adaptive and truly online guarantees with no prior knowledge for the prediction with expert advice problem.

Optimal No-regret Learning in Repeated First-price Auctions

TLDR
By exploiting the structural properties of first-price auctions, this paper develops the first learning algorithm that achieves near-optimal regret when the bidder's private values are stochastically generated, and establishes an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for this problem.

References


Regret Minimization for Reserve Prices in Second-Price Auctions

TLDR
A regret minimization algorithm for setting the reserve price in a sequence of second-price auctions, under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution, achieves a regret of Õ(√T) in a sequence of T auctions.
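The reserve-price learning problem above can be illustrated with a simple bandit baseline: discretize the candidate reserves onto a grid, treat each grid point as an arm, and use the per-round revenue as the reward. This is a hedged sketch of that reduction, not the cited paper's algorithm; the grid, bid model, and function names are assumptions.

```python
import math
import random

def learn_reserve_ucb(bid_stream, grid):
    """Run UCB1 over a discretized grid of candidate reserve prices.

    Each candidate reserve is an arm; the reward of pulling an arm in a
    round is the seller's second-price revenue under that reserve.
    Returns the total revenue collected over the stream.
    """
    counts = [0] * len(grid)
    sums = [0.0] * len(grid)
    total_revenue = 0.0
    for t, bids in enumerate(bid_stream, start=1):
        if 0 in counts:
            arm = counts.index(0)  # play every arm once first
        else:
            arm = max(range(len(grid)),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reserve = grid[arm]
        qualifying = sorted((b for b in bids if b >= reserve), reverse=True)
        revenue = 0.0
        if qualifying:
            # Winner pays max(reserve, second-highest qualifying bid).
            revenue = max(reserve,
                          qualifying[1] if len(qualifying) > 1 else reserve)
        counts[arm] += 1
        sums[arm] += revenue
        total_revenue += revenue
    return total_revenue

# Example: 500 rounds of 3 i.i.d. uniform bids, 4 candidate reserves.
rng = random.Random(0)
stream = [[rng.random() for _ in range(3)] for _ in range(500)]
print(learn_reserve_ucb(stream, [0.1, 0.3, 0.5, 0.7]))
```

Discretization introduces an approximation error on top of the bandit regret, which is why algorithms tailored to the auction structure (as in the cited work) can do better than generic bandits over a grid.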

Stochastic One-Sided Full-Information Bandit

In this paper, we study the stochastic version of the one-sided full information bandit problem, where we have $K$ arms $[K] = \{1, 2, \ldots, K\}$, and playing arm $i$ would gain reward from an

Minimizing Regret with Multiple Reserves

TLDR
The hardness result for the MMR problem implies that computationally efficient online learning requires approximation, even in the special case of single-item auction environments.

Revenue Optimization against Strategic Buyers

TLDR
The notion of an ε-strategic buyer is introduced, a more natural notion of strategic behavior than what has been considered in the past, and an optimal regret bound is achieved when the seller selects prices from a finite set.

Optimal Auction Design

TLDR
Optimal auctions are derived for a wide class of auction design problems when the seller has imperfect information about how much the buyers might be willing to pay for the object.

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

TLDR
The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.

Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards

TLDR
This paper fully characterizes the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable regret, and by establishing a connection between the adversarial and the stochastic MAB frameworks.

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

TLDR
This work designs the first explicit algorithm achieving the minimax regret rate (up to log factors) and obtains algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback.

Finite-time Analysis of the Multiarmed Bandit Problem

TLDR
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
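The policy this reference analyzes is UCB1: play each arm once, then always pull the arm maximizing the empirical mean plus the exploration bonus $\sqrt{2\ln t / n_i}$. A minimal sketch, assuming rewards in [0, 1] and a caller-supplied `pull(i)` reward function (both assumptions of this sketch):

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: empirical mean plus sqrt(2*ln(t)/pulls) exploration bonus.

    `pull(i)` returns a reward in [0, 1] for arm i. Returns the pull
    counts per arm after `horizon` rounds.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return counts

# Bernoulli arms with success probabilities 0.2 and 0.8; UCB1 should
# concentrate its pulls on the better arm.
rng = random.Random(1)
probs = [0.2, 0.8]
pulls = ucb1(lambda i: float(rng.random() < probs[i]), 2, 2000)
print(pulls)
```

The bonus shrinks as an arm is pulled more, so suboptimal arms are sampled only logarithmically often, which is the finite-time guarantee this reference establishes.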

Tracking the Best Expert in Non-stationary Stochastic Environments

TLDR
A new parameter $\Lambda$ is introduced, which measures the total statistical variance of the loss distributions over $T$ rounds of the process; the paper characterizes how this quantity affects the regret, proposes algorithms with upper-bound guarantees, and proves matching lower bounds.