Yishay Mansour

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value…
A critical issue for the application of Markov decision processes (MDPs) to realistic problems is how the complexity of planning scales with the size of the MDP. In stochastic environments with very large or infinite state spaces, traditional planning and reinforcement learning algorithms may be inapplicable, since their running time typically grows…
In this paper, Boolean functions in $AC^0$ are studied using harmonic analysis on the cube. The main result is that an $AC^0$ Boolean function has almost all of its “power spectrum” on the low-order coefficients. An important ingredient of the proof is Håstad’s switching lemma [8]. This result implies several new properties of functions in $AC^0$: Functions in…
Multi-agent games are becoming an increasingly prevalent formalism for the study of electronic commerce and auctions. The speed at which transactions can take place and the growing complexity of electronic marketplaces make the study of computationally simple agents an appealing direction. In this work, we analyze the behavior of agents that incrementally…
We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions that label the internal nodes of the decision tree can weakly approximate the unknown target…
The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given $n$ arms, it suffices to pull the arms $O\left(\frac{n}{\epsilon^2}\log\frac{1}{\delta}\right)$ times to find an $\epsilon$-optimal arm with probability of at least $1-\delta$. This is in contrast to the naive bound of $O\left(\frac{n}{\epsilon^2}\log\frac{n}{\delta}\right)$. We derive another algorithm whose complexity depends…
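The naive bound mentioned in the abstract above can be illustrated with a minimal sketch: pull every arm the same number of times and return the arm with the best empirical mean. The per-arm sample size below comes from a standard Hoeffding-plus-union-bound argument for rewards in [0, 1]; the constant 2 and the function names `pulls_per_arm` and `naive_pac_arm` are assumptions made for illustration and are not taken from the paper.

```python
import math


def pulls_per_arm(n_arms, epsilon, delta):
    """Samples per arm so that, by Hoeffding's inequality and a union bound,
    every empirical mean is within epsilon/2 of its true mean with
    probability at least 1 - delta (rewards assumed to lie in [0, 1])."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))


def naive_pac_arm(arms, epsilon, delta):
    """Uniformly sample every arm and return (index of best empirical arm,
    total number of pulls). `arms` is a list of zero-argument callables,
    each returning one reward in [0, 1]; this interface is hypothetical."""
    m = pulls_per_arm(len(arms), epsilon, delta)
    means = [sum(pull() for _ in range(m)) / m for pull in arms]
    best = max(range(len(arms)), key=means.__getitem__)
    return best, m * len(arms)
```

The total number of pulls is the arm count times the per-arm sample size, which grows as (n / ε²) · log(n / δ). The improved bound stated in the abstract removes the n inside the logarithm; this sketch does not implement that improved algorithm.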
We suggest a scheme for a block cipher which uses only one randomly chosen permutation, $F$. The key, consisting of two blocks, $K_1$ and $K_2$, is used in the following way. The message block is XORed with $K_1$ before applying $F$, and the outcome is XORed with $K_2$, to produce the cryptogram block. We show that the resulting cipher is secure (when the permutation is…
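The construction described in the abstract above (XOR with the first key block, apply the permutation, XOR with the second key block) can be sketched on a toy scale. The 8-bit block size, the seeded shuffle standing in for a randomly chosen permutation, and the function names are all assumptions made for illustration; a table this small offers no real security.

```python
import random

BLOCK_BITS = 8  # toy block size, chosen only so the permutation fits in a table


def make_permutation(seed, bits=BLOCK_BITS):
    """Build a pseudo-random permutation F on `bits`-bit blocks and its inverse.
    A seeded shuffle stands in for the randomly chosen public permutation."""
    rng = random.Random(seed)
    table = list(range(1 << bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt(block, k1, k2, perm):
    """Cryptogram = F(message XOR K1) XOR K2."""
    return perm[block ^ k1] ^ k2


def decrypt(cipher, k1, k2, inv_perm):
    """Message = F^{-1}(cryptogram XOR K2) XOR K1."""
    return inv_perm[cipher ^ k2] ^ k1
```

For example, with `perm, inv = make_permutation(seed=1)` and any two 8-bit key blocks, `decrypt(encrypt(b, k1, k2, perm), k1, k2, inv)` returns `b` for every block `b`, because XOR is its own inverse and `inv` undoes `perm`.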
We consider two types of buffering policies that are used in network switches supporting QoS (Quality of Service). In the FIFO type, packets must be released in the order they arrive; the difficulty in this case is the limited buffer space. In the bounded-delay type, each packet has a maximum delay time by which it must be…