Action Recognition with Improved Trajectories

Abstract

Recently, dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking camera motion into account to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches; to improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow, which significantly improves motion-based descriptors such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.

DOI: 10.1109/ICCV.2013.441
