This paper introduces a novel approach to direct interaction with large display systems. Monocular computer vision is utilised to avoid the restraints imposed by input devices. Tracking the user’s head and determining the view frustum in real time are key processes in our proposed human-computer interaction system. We also propose using a view frustum to model the user’s interaction volume, allowing flexible interaction with the display. Finally, we demonstrate the feasibility of this new concept and provide an accuracy analysis of our prototype system.