Real-Time Human Tracking through Single-Channel Video Feeds


Processing, locating, and tracking human forms in a video feed in real time requires not only a strong hardware base but also adaptive, intelligent software. Locating objects in 3D space from a single-channel video feed demands still more of that software. This project strives to develop a system that does both by means of attention-specific systems, camera calibration techniques, and facial bounding-box detection algorithms.

Introduction – Real-time video processing and automated surveillance have been of great interest in many fields. They are most often thought of as practical for intelligence and security, but they have many more potential applications. This document attempts to develop a practical and inexpensive solution for a somewhat “interactive” video monitoring system. In particular, it will allow the performing arts to track the location of a performer on stage and program an automated response, such as a light following a dancer on stage. Traditional methods require either personnel to operate a spotlight manually, which has its own set of limitations, or the dancer to match pre-recorded movements. This project attempts to create a solution that can follow the dancer in three dimensions, allowing for error or improvisation where movement is pre-recorded. It also holds an advantage over manual lighting control, as it can operate any number of systems simultaneously and account for depth, which most manual systems lack. This document outlines the anticipated methods of research and the hypothesis for the general detection of human forms.

Objectives – The objective of this project is to design and implement a system that tracks human forms on a stage. It will also be designed to provide a user with the three-dimensional coordinates of any given dancer on the stage.
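As an illustration of how three-dimensional coordinates might be recovered from a single camera, the sketch below applies the standard pinhole camera model to a detected bounding box. It is not the project's stated method. The focal length, principal point, and the assumed real-world face height of 0.2 m are hypothetical values, and the names `PinholeCamera` and `locate` are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Intrinsics that a camera calibration step would supply (assumed values)."""
    focal_px: float  # focal length expressed in pixels
    cx: float        # principal point, x (pixels)
    cy: float        # principal point, y (pixels)


def locate(cam, box_cx, box_cy, box_height_px, real_height_m):
    """Estimate (x, y, z) in metres, in the camera frame, for a detected box.

    Depth follows from similar triangles: z = f * H / h, where H is the
    assumed real height of the object and h is its height in pixels.
    """
    if box_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    z = cam.focal_px * real_height_m / box_height_px
    # Back-project the box centre through the pinhole at depth z.
    x = (box_cx - cam.cx) * z / cam.focal_px
    y = (box_cy - cam.cy) * z / cam.focal_px
    return x, y, z


if __name__ == "__main__":
    cam = PinholeCamera(focal_px=800.0, cx=320.0, cy=240.0)  # hypothetical
    # A face box 80 px tall, centred 80 px right of the image centre.
    print(locate(cam, 400.0, 240.0, 80.0, 0.2))
```

The depth estimate is only as good as the assumed real-world height and the calibrated focal length, so errors in either scale the z coordinate directly.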
A major component of this design is the ability to interface this data with additional applications that use the geometric coordinates of the detected people to produce artistic effects. This will require an external API to be implemented.

Hazards – Several hazards exist in this project. The first is camera data rate. A typical USB camera has a small lag between the moment its sensor captures an image and the moment transmission to the PC is complete. This lag could be a serious problem if the performer is moving quickly. Research will have to be done prior to implementation to determine whether this lag is acceptable, since the delay has a less pronounced effect when the camera is far from the subject. That research will conclude either that the USB camera is acceptable or that a faster camera must be found. Because the goal of this project is a result that is not only simple but also very cost-effective, an Ethernet- or FireWire-based camera would be acceptable but not ideal.

The second hazard is accuracy. At this point in the design phase, and in keeping with the project goal of being both cheap and simple, a single camera is ideal. This presents a significant technical challenge in accuracy: the camera will require additional calibration in order to determine a performer's position along the z-axis accurately. While this should work fairly cleanly in theory, the accuracy may prove insufficient in practice. At that point a design decision will have to be made about adding a second camera to provide two reference frames.

A final hazard is the resolution and frame rate of the camera, as these will determine hardware requirements. For the system to be fast enough to track a performer reliably, an estimated frame rate of 15 to 30 FPS will be needed. If the camera cannot sustain this frame rate at full resolution, a lower resolution will have to be selected at the price of accuracy.
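To make the lag and frame-rate hazards concrete, the sketch below estimates how far a performer moves during the camera delay, how large that offset appears from the camera position, and how much uncompressed data a given resolution and frame rate produce. The speed, latency, and distances are hypothetical figures rather than measurements from this project, the function names are illustrative, and the data-rate estimate assumes 24-bit colour (3 bytes per pixel).

```python
import math


def lag_error(speed_m_s, latency_s, distance_m):
    """Return (offset in metres, apparent angular error in degrees).

    The performer moves speed * latency during the delay; seen from a
    camera distance_m away, that offset subtends atan(offset / distance).
    """
    offset_m = speed_m_s * latency_s
    angle_deg = math.degrees(math.atan2(offset_m, distance_m))
    return offset_m, angle_deg


def raw_data_rate_mbit(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video data rate in megabits per second."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


if __name__ == "__main__":
    # Hypothetical: performer at 3 m/s, 100 ms end-to-end camera lag.
    for distance in (3.0, 10.0):
        offset, angle = lag_error(3.0, 0.1, distance)
        print(f"{distance:4.1f} m away: {offset:.2f} m offset, {angle:.2f} deg")
    for fps in (15, 30):
        print(f"640x480 @ {fps} FPS: {raw_data_rate_mbit(640, 480, fps):.1f} Mbit/s")
```

The same physical offset produces a smaller angular error at the larger distance, which is why the delay matters less when the camera is far from the performer.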
As discussed in the hardware section, potentially lossy frame rates or low resolution would require additional hardware.

Hardware Components – The hardware design required for this project depends largely on the specifications of the camera used. If the camera tends to have a low frame rate or resolution, a converter will have to be built or purchased so that a generic composite signal from a video camera is recognized as a webcam by the PC. This will allow much higher frame rates and resolution, and thus much higher accuracy. A second necessary component is an RF link between the video capture device and the PC. The preferred protocol will be either Wireless USB or Bluetooth, as both are fairly cheap to implement, range


Cite this paper

@inproceedings{Martin2009RealTimeHT, title={Real-Time Human Tracking through Single-Channel Video Feeds}, author={Benjamin Martin}, year={2009} }