Learn More
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value(More)
A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows(More)
In this paper, Boolean functions in ,4C0 are studied using harmonic analysis on the cube. The main result is that an ACO Boolean function has almost all of its " power spectrum " on the low-order coefficients. An important ingredient of the proof is Hastad's switching lemma [8]. This result implies several new properties of functions in-4C[': Functions in(More)
The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O(n 2 log 1 δ) times to find an-optimal arm with probability of at least 1 − δ. This is in contrast to the naive bound of O(n 2 log n δ). We derive another algorithm whose complexity depends on the(More)
This work gives a polynomial time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables(More)
Multi-agent games are becoming an increasingly prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces makes the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that incrementally(More)
We present new results on the well-studied problem of learning DNF expressions. We prove that an algorithm due to Kushilevitz and Mansour [13] can be used to weakly learn DNF formulas with membership queries with respect to the uniform distribution. This is the first positive result known for learning general DNF in polynomial time in a nontrivial model.(More)
This paper addresses the general problem of domain adaptation which arises in a variety of applications where the distribution of the labeled sample available somewhat differs from that of the test data. Building on previous work by Ben-David et al. (2007), we introduce a novel distance between distributions, discrepancy distance, that is tailored to(More)
1 Introduction and History We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples 24], in the sense that we emphasize eecient and approximate learning, and we study the(More)