In this paper, a novel Q-learning algorithm is proposed to solve the Linear Quadratic Output Tracking (LQOT) control problem for a linear time-invariant system with completely unknown system and reference dynamics. We first define an action-dependent value function for the LQOT problem after we augment the system and the reference states and pick…
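To illustrate what "augmenting the system and the reference states" and an "action-dependent value function" mean, here is a minimal sketch. It rests on several assumptions that the truncated abstract does not confirm. The plant is taken as x_{k+1} = A x_k + B u_k with output y_k = C x_k. The reference is assumed to come from a generator r_{k+1} = F r_k. The cost is assumed to be a discounted sum of gamma^k [(y_k - r_k)^T Qy (y_k - r_k) + u_k^T R u_k]. Under these assumptions, the Q-function is quadratic in the augmented state X = [x; r] and the input u, with kernel H. The sketch computes H by value iteration *using the model*, only to show its structure. The Q-learning algorithm described above would instead estimate H from measured data without knowing A, B, C, or F. All function names (`augment`, `kernel`, `split`, `solve_q_kernel`) are hypothetical.

```python
import numpy as np


def augment(A, B, C, F):
    """Stack plant state x and reference state r into X = [x; r] (assumed form)."""
    n, m = B.shape
    p = F.shape[0]  # reference dimension, assumed equal to the output dimension
    T = np.block([[A, np.zeros((n, p))],
                  [np.zeros((p, n)), F]])
    B1 = np.vstack([B, np.zeros((p, m))])
    C1 = np.hstack([C, -np.eye(p)])  # tracking error e = C x - r = C1 X
    return T, B1, C1


def kernel(P, T, B1, C1, Qy, R, gamma):
    """Kernel H of Q(X, u) = [X; u]^T H [X; u], given the value matrix P."""
    Q1 = C1.T @ Qy @ C1
    return np.block([[Q1 + gamma * T.T @ P @ T, gamma * T.T @ P @ B1],
                     [gamma * B1.T @ P @ T, R + gamma * B1.T @ P @ B1]])


def split(H, nX):
    """Return the blocks Hxx, Hxu, Hux, Huu of H."""
    return H[:nX, :nX], H[:nX, nX:], H[nX:, :nX], H[nX:, nX:]


def solve_q_kernel(T, B1, C1, Qy, R, gamma, iters=1000):
    """Model-based value iteration P <- Hxx - Hxu Huu^{-1} Hux (illustration only)."""
    nX = T.shape[0]
    P = np.zeros((nX, nX))
    for _ in range(iters):
        Hxx, Hxu, Hux, Huu = split(kernel(P, T, B1, C1, Qy, R, gamma), nX)
        P = Hxx - Hxu @ np.linalg.solve(Huu, Hux)
    H = kernel(P, T, B1, C1, Qy, R, gamma)
    _, _, Hux, Huu = split(H, nX)
    K = np.linalg.solve(Huu, Hux)  # greedy policy u = -K X minimizes Q(X, u)
    return H, P, K


# Example: scalar plant tracking a constant reference (F = 1), discount 0.8.
A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[1.0]])
F = np.array([[1.0]])
Qy = np.array([[1.0]]); R = np.array([[0.1]]); gamma = 0.8
T, B1, C1 = augment(A, B, C, F)
H, P, K = solve_q_kernel(T, B1, C1, Qy, R, gamma)
print("H =\n", H)
print("K =", K)  # control law u_k = -K @ [x_k; r_k]
```

A discount factor is assumed because a non-decaying reference (here F = 1) would otherwise make the undiscounted cost unbounded. Because the greedy action depends only on the blocks Huu and Hux, a learner that estimates H directly never needs the plant or reference matrices to compute its policy.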