Nahum Shimkin

We consider a communication network shared by several selfish users. Each user seeks to optimize its own performance by controlling the routing of its given flow demand, giving rise to a noncooperative game. We investigate the Nash equilibrium of such systems. For a two-node multiple links system, uniqueness of the Nash equilibrium is proven under …
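To make the routing game in this abstract concrete, here is a minimal sketch of two selfish users splitting their flow over two parallel links between the same pair of nodes. The linear latency model, the coefficients `a1` and `a2`, and the use of best-response iteration are all assumptions made for illustration; the abstract does not state the cost model or the conditions under which uniqueness is proven, and this is not the paper's method.

```python
def best_response(demand, other_on_1, other_on_2, a1, a2):
    """Flow a user places on link 1 to minimise its own total cost.

    Assumed cost model (not from the source): link l has per-unit latency
    a_l * (total flow on l), so the user's cost is
        x * a1 * (x + other_on_1)
        + (demand - x) * a2 * (demand - x + other_on_2).
    This is convex in x; setting the derivative to zero and clipping the
    result to [0, demand] gives the minimiser.
    """
    x = (a2 * (2 * demand + other_on_2) - a1 * other_on_1) / (2 * (a1 + a2))
    return min(max(x, 0.0), demand)


def find_equilibrium(demands, a1, a2, rounds=200):
    """Iterate best responses for two users until the flows stop changing."""
    on_1 = [0.0, 0.0]  # flow each user currently routes over link 1
    for _ in range(rounds):
        previous = list(on_1)
        for i in (0, 1):
            j = 1 - i
            on_1[i] = best_response(
                demands[i], on_1[j], demands[j] - on_1[j], a1, a2
            )
        if max(abs(p - q) for p, q in zip(previous, on_1)) < 1e-12:
            break
    return on_1


if __name__ == "__main__":
    # Two users with unit demand; link 2 is twice as costly per unit of load.
    print(find_equilibrium([1.0, 1.0], a1=1.0, a2=2.0))
```

At the returned flows, neither user can lower its own cost by shifting flow between the links, which is the Nash equilibrium condition for this small instance. Best-response iteration happens to settle here; it is not claimed to converge for general cost functions.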
We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used to accelerate the Q-Learning algorithm. The learning agent creates an on-line map of the process history, and uses an efficient Max-Flow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching …
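The bottleneck idea in this abstract can be illustrated with a standard minimum-cut computation on a small state-transition graph. The sketch below uses the Edmonds-Karp max-flow algorithm and reads the cut off the final residual graph; the graph, the state names, and the use of transition counts as capacities are hypothetical, and the abstract does not specify which Max-Flow/Min-Cut variant or cut-quality criterion Q-Cut itself uses.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Edmonds-Karp max flow; returns (flow value, sorted list of cut edges)."""
    # Residual capacities, with reverse edges initialised to 0.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search over edges with spare residual capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            break  # no augmenting path left
        # Walk back from the sink to recover the augmenting path.
        path = []
        v = sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

    # States still reachable from the source form one side of the min cut.
    reachable = set(parents)
    cut = sorted(
        (u, v) for (u, v) in capacity if u in reachable and v not in reachable
    )
    return flow, cut


if __name__ == "__main__":
    # Hypothetical map: two well-connected regions joined by one rarely used edge.
    transitions = {
        ("s", "a"): 3, ("s", "b"): 3, ("a", "c"): 3, ("b", "c"): 3,
        ("c", "d"): 1,  # the narrow passage between the regions
        ("d", "e"): 3, ("d", "f"): 3, ("e", "t"): 3, ("f", "t"): 3,
    }
    print(min_cut(transitions, "s", "t"))
```

In this toy graph the only cut edge is the narrow passage from `c` to `d`, so its endpoints are natural candidates for sub-goal states.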
We study a class of noncooperative general topology networks shared by users. Each user has a given flow which it has to ship from a source to a destination. We consider a class of polynomial link cost functions adopted originally in the context of road traffic modeling, and show that these costs have appealing properties that lead to predictable and …
We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well, in hindsight, as every stationary policy. This generalizes the classical …
We examine methods for on-line optimization of the basis functions for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman …
This paper studies the performance impact of making delay announcements to arriving customers who must wait before starting service in a many-server queue with customer abandonment. The queue is assumed to be invisible to waiting customers, as in most customer contact centers, when contact is made by telephone, email or instant messaging. Customers who must …
We consider a wireless collision channel, shared by a finite number of users who transmit to a common base station. Each user wishes to minimize its average transmission rate (or power investment), subject to a minimum throughput demand. The channel quality between each user and the base station is randomly time-varying, and partially observed by the user …
We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, actions that are observed but cannot be predicted beforehand. We capture this situation using a …