- Published 2014

In this supplementary material, we provide details on the following: (1) multi-scale video segmentation and refinement, (2) the STR feature, (3) the dynamic programming solution for video re-timing, and (4) the computational cost of the proposed system. In addition, we include a video showing the results on the saliency benchmark dataset and the high-speed video dataset.

1. Multi-scale video segmentation and refinement

To capture regions of interest at different spatial scales, we construct an l-level segmentation pyramid. Each level of the pyramid is the result of running the algorithm of [5] with a different scale parameter (i.e., the min parameter of the gbh_stream command provided by the LIBSVX¹ toolbox). In our work, we set l = 4 and min ∈ {100, 1400, 2700, 4000}. The second column of Fig. 1 illustrates an example of segmenting one video at two scales.

The initial segmentation results generated by [5] contain many small and redundant spatio-temporal regions (STRs)² that affect the smoothness of the computed saliency map. Therefore, we further refine the segmentation by merging adjacent STRs with similar colors. To do so, we compute the χ² distance between each pair of adjacent STRs. Any pair whose distance is smaller than 0.1 is merged into a single new STR. The second and third columns of Fig. 1 compare the segmentation results without and with refinement, respectively.

¹ http://www.cse.buffalo.edu/~jcorso/r/supervoxels/
² The term “voxel” has been used to represent a segmented spatial-temporal region. Since “voxel” is a fundamental term in graphics with a different meaning, we avoid it to prevent confusion.

2. STR feature

For each STR, denoted as $r_{c,t}$, we compute three feature vectors: one color histogram $x^{col}_{c,t}$ and two flow-based descriptors $x^{mag}_{c,t}$ and $x^{ori}_{c,t}$.
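
Looking back at the refinement step of Section 1, the merging rule can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that each STR is summarized by an L1-normalized color histogram, that the distance is the common χ² histogram distance with a 1/2 factor, and that merging is done in a single pass without recomputing the histograms of merged regions. The names `chi2_distance` and `merge_similar_strs` are hypothetical; only the 0.1 threshold comes from the text.

```python
def chi2_distance(p, q, eps=1e-12):
    """Chi-squared distance between two L1-normalized histograms.

    The 1/2 factor is an assumed convention; the text does not give the
    exact definition.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def merge_similar_strs(histograms, adjacency, threshold=0.1):
    """Merge adjacent STRs whose histogram distance is below `threshold`.

    histograms: dict mapping an integer STR id to its normalized histogram.
    adjacency:  iterable of (id_a, id_b) pairs of neighboring STRs.
    Returns a dict mapping every STR id to the id of its merged region.
    """
    # Union-find forest: every STR starts as its own region.
    parent = {k: k for k in histograms}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in adjacency:
        if chi2_distance(histograms[a], histograms[b]) < threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the smaller id as the representative of the region.
                parent[max(ra, rb)] = min(ra, rb)

    return {k: find(k) for k in histograms}


# Toy example: STRs 0 and 1 have similar colors, STR 2 does not.
hists = {
    0: [0.50, 0.50, 0.00],
    1: [0.45, 0.55, 0.00],
    2: [0.00, 0.10, 0.90],
}
labels = merge_similar_strs(hists, adjacency=[(0, 1), (1, 2)])
print(labels)  # {0: 0, 1: 0, 2: 2}
```

A union-find structure is used here so that chains of similar neighbors collapse into one region regardless of the order in which pairs are visited. Whether the original system recomputes distances after each merge is not stated.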
To compute $x^{col}_{c,t} \in \mathbb{R}^{d_{col}}$, we quantize the four color channels (L, A, B and hue) into 8, 16, 16 and 4 bins, respectively. The dimension of the color histogram is $d_{col} = 8 \cdot 16 \cdot 16 \cdot 4 = 8192$. To compute $x^{mag}_{c,t} \in \mathbb{R}^{d_{mag}}$, we uniformly quantize the flow magnitude into $d_{mag} = 16$ bins. To compute $x^{ori}_{c,t} \in \mathbb{R}^{d_{ori}}$, we follow the idea of the HoF descriptor [4]: we quantize the flow orientation into 8 bins and weight each pixel's contribution by its flow magnitude. An additional zero bin accounts for pixels whose optical flow magnitude is below a threshold. The final dimension of the descriptor is $d_{ori} = 9$.

3. Dynamic programming solution for video re-timing

In the paper, we formalize the re-timing problem as minimizing the following sum of least-square errors:

@inproceedings{Zhou2014FileF,
title={File for “Time-mapping Using Space-Time Saliency”},
author={Feng Zhou and Sing Bing Kang and Michael F. Cohen},
year={2014}
}