From pose to activity: Surveying datasets and introducing CONVERSE
A good data corpus lies at the heart of progress in both perceptual/cognitive science and in computer vision. While there are a few datasets that deal with simple actions, creating a realistic corpus for complex, long action sequences that contains also human-human interactions has so far not been attempted to our knowledge. Here, we introduce such a corpus for (inter)action understanding that contains six everyday scenarios taking place in a kitchen / living-room setting. Each scenario was acted out several times by different pairs of actors and contains simple object interactions as well as spoken dialogue. In addition, each scenario was first recorded with several HD cameras and also with motion-capturing of the actors and several key objects. Having access to the motion capture data allows not only for kinematic analyses, but also allows for the production of realistic animations where all aspects of the scenario can be fully controlled. We also present results from a first series of perceptual experiments that show how humans are able to infer scenario classes, as well as individual actions and objects from computer animations of everyday situations. These results can serve as a benchmark for future computational approaches that begin to take on complex action understanding.