Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
This document summarizes the technology, procedures, and database organization of the CMU Multi-Modal Activity Database (CMU-MMAC). The CMU-MMAC database contains multimodal measures of the human activity of subjects performing the tasks involved in cooking and food preparation. The CMU-MMAC database was collected in Carnegie Mellon University’s Motion Capture Lab. A kitchen was built and, to date, five subjects have been recorded cooking five different recipes: brownies, pizza, sandwich, salad, and scrambled eggs.

The following modalities were recorded:
• Video: (1) Three high spatial resolution (1024 × 768) color video cameras at low temporal resolution (30 Hertz). (2) Two low spatial resolution (640 × 480) color video cameras at high temporal resolution (60 Hertz). (3) One wearable low spatial resolution (640 × 480) camera at low temporal resolution (12 Hertz).
• Audio: (1) Five balanced microphones. (2) Wearable watch.
• Motion capture: A Vicon motion capture system with 12 infrared MX-40 cameras. Each camera records images of 4 megapixel resolution at 120 Hertz.
• Five 3-axis accelerometers and gyroscopes.

Several computers were used for recording the various modalities. The computers were synchronized using the Network Time Protocol (NTP).
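Because the modalities are sampled at different rates on a shared NTP-synchronized clock, relating them comes down to mapping a common timestamp to a per-stream sample index. The following is a minimal sketch of that arithmetic, not the database's actual synchronization procedure. It assumes that every stream starts at the same instant, samples at a constant rate, and drops no frames. The names `Stream`, `frame_index`, and `aligned_indices`, as well as the stream labels, are hypothetical. Only the sampling rates (30, 60, 12, and 120 Hertz) come from the text above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """One recorded modality (hypothetical structure, for illustration only)."""
    name: str
    rate_hz: float        # nominal sampling rate stated in the text
    start_time_s: float   # assumed start time on the shared clock, in seconds


def frame_index(stream: Stream, t_s: float) -> int:
    """Return the index of the sample nearest to time t_s on the shared clock."""
    if t_s < stream.start_time_s:
        raise ValueError(f"{t_s} s precedes the start of stream {stream.name!r}")
    elapsed = t_s - stream.start_time_s
    # Round half up to the nearest sample (avoids Python's round-half-to-even).
    return math.floor(elapsed * stream.rate_hz + 0.5)


def aligned_indices(streams: list[Stream], t_s: float) -> dict[str, int]:
    """Map one shared-clock timestamp to the nearest sample index in each stream."""
    return {s.name: frame_index(s, t_s) for s in streams}


# Rates taken from the modality list; a common start time of 0.0 s is an assumption.
STREAMS = [
    Stream("static_high_res_video", 30.0, 0.0),
    Stream("static_low_res_video", 60.0, 0.0),
    Stream("wearable_video", 12.0, 0.0),
    Stream("motion_capture", 120.0, 0.0),
]

if __name__ == "__main__":
    for name, index in aligned_indices(STREAMS, 2.5).items():
        print(f"{name}: sample {index}")
```

Under these assumptions, a moment 2.5 seconds into a recording corresponds to frame 75 of the 30 Hertz cameras, frame 150 of the 60 Hertz cameras, frame 30 of the wearable camera, and frame 300 of the motion capture stream. Real recordings would need per-stream start offsets and handling for dropped frames.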