Managing risk in dynamic decision problems is of cardinal importance in many fields, such as finance and process control. The most common approach to defining risk is through variance-related criteria such as the Sharpe ratio or the standard-deviation-adjusted reward. It is known that optimizing many of these variance-related risk criteria is NP-hard. …
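For reference, the two criteria named above are commonly defined as follows (standard textbook definitions, with illustrative notation: $R^\pi$ is the cumulative reward under policy $\pi$ and $\lambda > 0$ a risk-aversion weight):

```latex
% Sharpe ratio: mean reward per unit of risk (standard deviation).
\mathrm{SR}(\pi) = \frac{\mathbb{E}[R^\pi]}{\sqrt{\mathrm{Var}(R^\pi)}}

% Standard-deviation-adjusted reward: mean penalized by dispersion.
J_\lambda(\pi) = \mathbb{E}[R^\pi] - \lambda\,\sqrt{\mathrm{Var}(R^\pi)}
```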
When analyzing data that originated from a dynamical system, a common practice is to cast the problem in the well-known frameworks of Markov Decision Processes (MDPs) and Reinforcement Learning (RL). The state space in these solutions is usually chosen in some heuristic fashion, and the formed MDP can then be used to simulate and predict data, as well …
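As a minimal sketch of this framing (all names and interfaces here are illustrative, not from the paper): a finite MDP can be fitted from observed state sequences via empirical transition counts, then used to simulate new trajectories.

```python
import numpy as np

def fit_mdp(trajectories, n_states):
    """Estimate empirical transition probabilities from observed state
    sequences (a heuristic discretization is assumed upstream)."""
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        for s, s_next in zip(traj[:-1], traj[1:]):
            counts[s, s_next] += 1
    # Normalize rows to probabilities; uniform where a state was never seen.
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

def simulate(P, s0, horizon, rng=np.random.default_rng(0)):
    """Roll out a trajectory from the fitted transition matrix."""
    states = [s0]
    for _ in range(horizon):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states
```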
In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management and are important in domains such as finance and process control. We propose both TD(0) and LSTD(λ) variants with linear function approximation, prove their …
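A sketch of what a TD(0)-style evaluation of such criteria can look like, assuming (as is standard) that the variance is recovered from jointly estimated first and second moments of the discounted return; the interface env_step(s, rng) -> (r, s_next, done) and all parameter values are assumptions, not the paper's exact algorithm:

```python
import numpy as np

def td0_mean_and_second_moment(env_step, phi, n_features, episodes,
                               gamma=0.95, alpha=0.05, seed=0):
    """TD(0)-style updates with linear function approximation for the
    value J(s) ~ w_j . phi(s) and the second moment M(s) ~ w_m . phi(s)
    of the discounted return."""
    rng = np.random.default_rng(seed)
    w_j = np.zeros(n_features)
    w_m = np.zeros(n_features)
    for _ in range(episodes):
        s, done = 0, False  # assumed initial state
        while not done:
            r, s_next, done = env_step(s, rng)
            f, f_next = phi(s), phi(s_next)
            j_next = 0.0 if done else w_j @ f_next
            m_next = 0.0 if done else w_m @ f_next
            # TD errors for the mean and for the second moment of the return.
            delta_j = r + gamma * j_next - w_j @ f
            delta_m = r**2 + 2 * gamma * r * j_next + gamma**2 * m_next - w_m @ f
            w_j += alpha * delta_j * f
            w_m += alpha * delta_m * f
            s = s_next
    return w_j, w_m
```

The variance at a state s is then estimated as w_m @ phi(s) - (w_j @ phi(s))**2.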
Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological …
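For readers unfamiliar with the family, a generic one-step actor-critic update looks roughly as follows (a sketch with a linear critic and a softmax actor; all names and step sizes are illustrative, and this is not the specific algorithm of the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def actor_critic_step(theta, w, phi_s, phi_s_next, a, r,
                      gamma=0.99, alpha_actor=0.01, alpha_critic=0.1):
    """One generic one-step actor-critic update with linear critic
    v(s) ~ w . phi(s) and a softmax actor over per-action scores
    theta[a] . phi(s)."""
    # Critic: one-step TD error evaluates the actor's current policy.
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    w = w + alpha_critic * delta * phi_s
    # Actor: policy-gradient step, with the TD error as the advantage signal.
    pi = softmax(theta @ phi_s)
    grad = -np.outer(pi, phi_s)  # grad of log pi(a|s) w.r.t. theta, part 1
    grad[a] += phi_s             # extra term for the action actually taken
    theta = theta + alpha_actor * delta * grad
    return theta, w
```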
Learning in multilayer neural networks (MNNs) relies on continuous updating of large matrices of synaptic weights by local rules. Such locality can be exploited for massive parallelism when implementing MNNs in hardware. However, these update rules require a multiply-and-accumulate operation for each synaptic weight, which is challenging to implement …
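To see where the per-weight multiply-and-accumulate arises, consider a standard local gradient-style rule for a single layer (an illustrative sketch, not the scheme proposed in the paper):

```python
import numpy as np

def local_weight_update(W, pre, post_err, lr=0.01):
    """Local learning rule for one layer: each synapse W[i, j] is updated
    from only its presynaptic activity pre[j] and postsynaptic error
    post_err[i]. The outer product performs one multiply per synaptic
    weight, plus an accumulate into W -- the per-weight operation the
    abstract notes is costly to implement in hardware."""
    return W + lr * np.outer(post_err, pre)
```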
With email traffic increasing, leading Web mail services have started to offer features that assist users in reading and processing their inboxes. One approach is to identify "important" messages, while a complementary one is to bundle messages, especially machine-generated ones, into pre-defined categories. We propose instead to go back to the task at …
In reinforcement learning, an agent uses on-line feedback from the environment together with prior knowledge in order to adaptively select an effective policy. Model-free approaches address this task by directly mapping external and internal states to actions, while model-based methods attempt to construct a model of the environment, followed by a selection of optimal …
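The two families contrasted above can be summarized by two representative update rules (a generic sketch; tabular Q-learning and one value-iteration sweep against a learned model are standard illustrations, and all names here are assumptions):

```python
import numpy as np

# Model-free: map states directly to action values via sampled updates
# (one-step Q-learning as a representative rule).
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

# Model-based: first estimate the environment's dynamics and rewards,
# then plan against the learned model (one value-iteration sweep shown).
def value_iteration_sweep(P_hat, R_hat, V, gamma=0.95):
    # P_hat: (S, A, S) estimated transitions; R_hat: (S, A) estimated rewards.
    Q = R_hat + gamma * np.einsum('sat,t->sa', P_hat, V)
    return Q.max(axis=1)
```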