Q-Learning for Reward Machines is presented, an algorithm that decomposes the reward machine and uses off-policy Q-learning to simultaneously learn subpolicies for its components; it is guaranteed to converge to an optimal policy in the tabular case.
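The decomposition idea can be illustrated with a minimal tabular sketch, assuming a toy reward machine and a single counterfactual update step; the `RewardMachine` class, event labels, and environment states below are illustrative, not the paper's exact formalization:

```python
from collections import defaultdict

# Illustrative reward machine: transitions and rewards are keyed by
# (RM state, event label). An assumed encoding for this sketch.
class RewardMachine:
    def __init__(self, transitions, rewards, initial=0):
        self.delta = transitions          # (u, label) -> u'
        self.rewards = rewards            # (u, label) -> reward
        self.initial = initial
        self.states = {u for (u, _) in transitions} | set(transitions.values())

    def step(self, u, label):
        # Self-loop with zero reward when no transition is defined.
        return self.delta.get((u, label), u), self.rewards.get((u, label), 0.0)

def qrm_update(Q, rm, s, a, s_next, label, actions, alpha=0.5, gamma=0.9):
    # Counterfactual update: one environment transition trains the
    # subpolicy of *every* RM state, not just the current one.
    for u in rm.states:
        u_next, r = rm.step(u, label)
        best_next = max(Q[(u_next, s_next, b)] for b in actions)
        Q[(u, s, a)] += alpha * (r + gamma * best_next - Q[(u, s, a)])

Q = defaultdict(float)
rm = RewardMachine({(0, 'goal'): 1}, {(0, 'goal'): 1.0})
qrm_update(Q, rm, s=1, a='right', s_next=2, label='goal', actions=['left', 'right'])
# Q[(0, 1, 'right')] is now 0.5 (one step toward the reward of 1.0).
```

Because the update is off-policy, experience gathered while pursuing one RM state's subgoal is reused for all the others, which is where the sample efficiency comes from.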

This paper uses Linear Temporal Logic (LTL) as a language for specifying multiple tasks in a manner that supports the composition of learned skills, and proposes a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees.
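The progression operation being exploited can be sketched for a small fragment of LTL (propositions, and/or, next, until, eventually); the tuple-based formula encoding below is an assumed representation, not the paper's:

```python
# Formulas are nested tuples: ('prop', p), ('and', f, g), ('or', f, g),
# ('next', f), ('until', f, g), ('eventually', f); True/False are literals.

def _and(a, b):
    if a is False or b is False: return False
    if a is True: return b
    if b is True: return a
    return ('and', a, b)

def _or(a, b):
    if a is True or b is True: return True
    if a is False: return b
    if b is False: return a
    return ('or', a, b)

def prog(phi, sigma):
    """Rewrite phi after observing sigma, the set of propositions true now."""
    if phi is True or phi is False:
        return phi
    op = phi[0]
    if op == 'prop':
        return phi[1] in sigma
    if op == 'and':
        return _and(prog(phi[1], sigma), prog(phi[2], sigma))
    if op == 'or':
        return _or(prog(phi[1], sigma), prog(phi[2], sigma))
    if op == 'next':
        return phi[1]
    if op == 'until':
        # f U g holds now if g holds now, or f holds now and f U g holds next.
        return _or(prog(phi[2], sigma), _and(prog(phi[1], sigma), phi))
    if op == 'eventually':
        return _or(prog(phi[1], sigma), phi)
    raise ValueError(f'unknown operator: {op}')

task = ('until', ('prop', 'safe'), ('prop', 'goal'))
# While only 'safe' holds, the task formula is returned unchanged;
# once 'goal' holds, progression returns True: the task is complete.
```

Progressing the formula after each observation keeps an up-to-date record of what remains to be done, which is what lets partially completed tasks share experience across policies.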

A non-trivial lower bound is given on the number of derived characters necessary for k-wise independence with the tabulation-based hash classes presented, and a variant in which the d derived characters are a + b·i, for i = 0, …, d − 1, is shown to yield (2d − 1)-wise independence.
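For context, simple tabulation hashing, the base scheme that such classes extend with derived characters, can be sketched as follows; the 8-bit characters and 32-bit hash values are illustrative parameters, and the derived-character construction itself is not reproduced here:

```python
import random

# Simple tabulation: split the key into NUM_CHARS characters, look each up
# in its own table of random 32-bit values, and XOR the results together.
# Simple tabulation alone is 3-wise independent; higher independence is
# what the derived-character schemes are designed to achieve.
CHAR_BITS = 8
NUM_CHARS = 4                      # hashes 32-bit keys
CHAR_MASK = (1 << CHAR_BITS) - 1

def make_tables(seed=0):
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(1 << CHAR_BITS)]
            for _ in range(NUM_CHARS)]

def tab_hash(key, tables):
    h = 0
    for i in range(NUM_CHARS):
        c = (key >> (i * CHAR_BITS)) & CHAR_MASK
        h ^= tables[i][c]
    return h
```

Each lookup table is filled with fully random words, so the cost of higher independence is measured in how many extra (derived) characters, and hence tables, the scheme needs — the quantity the lower bound constrains.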

It is shown that reward machines (RMs) can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems.

This work proposes using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions, to ease the burden of complex reward function specification.

Experimental results on deterministic grid worlds demonstrate the potential for good advice to reduce the amount of exploration required to learn a satisficing or optimal policy, while maintaining robustness in the face of incomplete or misleading advice.

A new epistemic logic is introduced, based on a three-valued version of neighborhood semantics, which allows for talking about the effort used in making inferences; it is suggested that the ideas used in it could also find a role in autoepistemic reasoning.

This paper proposes reward machines (RMs), a type of finite-state machine that supports the specification of reward functions while exposing reward function structure, and describes different methodologies for exploiting such structure, including automated reward shaping, task decomposition, and counterfactual reasoning for data augmentation.
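One of the listed techniques, automated reward shaping, can be sketched as potential-based shaping computed by value iteration over the RM's small state graph; the tiny RM, discount factor, and helper names below are assumptions for illustration:

```python
# Value iteration over the RM's state graph alone (which is small),
# treating the best achievable RM reward as a potential for each state.
def rm_potentials(states, transitions, rewards, gamma=0.9, iters=100):
    # transitions: (u, label) -> u'; rewards: (u, label) -> r
    phi = {u: 0.0 for u in states}
    for _ in range(iters):
        for u in states:
            outgoing = [(lab, v) for (w, lab), v in transitions.items() if w == u]
            if outgoing:
                phi[u] = max(rewards.get((u, lab), 0.0) + gamma * phi[v]
                             for lab, v in outgoing)
    return phi

def shaped_reward(r, u, u_next, phi, gamma=0.9):
    # Potential-based shaping preserves optimal policies (Ng et al., 1999).
    return r + gamma * phi[u_next] - phi[u]

states = {0, 1, 2}
transitions = {(0, 'a'): 1, (1, 'b'): 2}     # see event 'a', then 'b'
rewards = {(1, 'b'): 1.0}
phi = rm_potentials(states, transitions, rewards)
# phi[1] = 1.0 and phi[0] = 0.9: shaping rewards progress through the RM.
```

Because the potential comes from the RM's structure rather than the environment, it can be computed before any environment interaction, giving the agent a dense signal for making progress toward the reward.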

This work formalizes the notion of Epistemic Plan Recognition, which builds on two growing areas of research, epistemic planning and plan recognition, and casts the epistemic plan recognition problem as an epistemic planning problem whose solutions can be generated using existing epistemic planners.

This paper provides an account of explanation in terms of the beliefs of agents and the mechanism by which agents revise their beliefs given possible explanations, and identifies a set of desiderata for explanations that utilize Theory of Mind.