This paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model-based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a condition, LSE-RMax provably guarantees a return very close to the optimal return, with high certainty, without requiring any prior knowledge of the in-degree of the…
In the Trading Agent Competition Ad Auctions Game, agents compete to sell products by bidding to have their ads shown in a search engine's sponsored search results. We report on the winning agent from the first (2009) competition, TacTex. TacTex operates by estimating the full game state from limited information, using these estimates to make predictions,…
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision…
version that appeared in the official ICML proceedings. The only substantive change is due to the fact that, based on subsequent discussions with peers, we identified a technical flaw in the way that our MLeS and CMLeS algorithms were guaranteeing safety. Specifically, it was possible that MLeS (and also CMLeS) could converge to modeling an arbitrary…
Knowledge transfer between expert and novice agents is a challenging problem given that the knowledge representation and learning algorithms used by the novice learner can be fundamentally different from and inaccessible to the expert trainer. We are particularly interested in team tasks, robotic or otherwise, where new teammates need to replace currently…
In recent years, great strides have been made towards creating autonomous agents that can learn via interaction with their environment. When considering just an individual agent, it is often appropriate to model the world as being stationary, meaning that the same action from the same state will always yield the same (possibly stochastic) effects. However,…
In the modern era of electronics and communication, decoding and encoding data using VLSI technology must meet low-power, small-area, and high-speed constraints. The Viterbi decoder using a survivor path with the necessary parameters for wireless communication is an attempt to reduce power and cost while increasing speed compared to…
The problem of decentralized control occurs frequently in realistic domains where agents have to cooperate to achieve a universal goal. Planning for a domain-level joint strategy takes into account the uncertainty of the underlying environment in computing near-optimal joint strategies that can handle the intrinsic domain uncertainty. However, uncertainty…