Recent work has defined an optimal reward problem (ORP) in which an agent designer, with an objective reward function that evaluates an agent's behavior, has a choice of what reward function to build into a learning or planning agent to guide its behavior. Existing results on ORP show weak mitigation of limited computational resources, i.e., the existence…
