Finite-time analysis of the multi-armed bandit problem with known trend
  • Djallel Bouneffouf
  • Computer Science
    IEEE Congress on Evolutionary Computation (CEC)
  • 1 July 2016
TLDR
By adapting the standard multi-armed bandit algorithms, this work studies the regret upper bounds of three algorithms: the first two assume a stochastic model, and the last is based on a Bayesian approach.
Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration
TLDR
A novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards, and a contextual bandit-based orchestrator that allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy.
Following the User's Interests in Mobile Context-Aware Recommender Systems: The Hybrid-ε-greedy Algorithm
TLDR
This paper describes ongoing work on the implementation of a mobile context-aware recommender system (MCRS) based on the hybrid-ε-greedy algorithm, which combines the standard ε-greedy algorithm with both content-based filtering and case-based reasoning techniques.
A Neural Networks Committee for the Contextual Bandit Problem
TLDR
A new contextual bandit algorithm, NeuralBandit, which requires no assumption of stationarity in contexts or rewards, is presented, and two variants based on a multi-expert approach are proposed to choose the parameters of the multi-layer perceptrons online.
An ADMM Based Framework for AutoML Pipeline Configuration
TLDR
A novel AutoML scheme is proposed by leveraging the alternating direction method of multipliers (ADMM) to decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories.
Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
TLDR
This work presents a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering
TLDR
A scalable Nystrom-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS), is proposed and a theoretical analysis of the upper error bound of the algorithm is provided.
Beyond Backprop: Alternating Minimization with co-Activation Memory
TLDR
A novel online algorithm for training deep feedforward neural networks that employs alternating minimization (block-coordinate descent) between the weights and activation variables and improves over stochastic gradient descent (SGD) with backpropagation in several ways.
A Contextual-Bandit Algorithm for Mobile Context-Aware Recommender System
TLDR
This paper introduces an algorithm based on dynamic exploration/exploitation that adaptively balances the two by deciding which of the user's situations is most relevant for exploration or exploitation.