MMToC: A Multimodal Method for Table of Content Creation in Educational Videos


In this paper, we propose MMToC, a multimodal method for automatically creating a table of content for educational videos. MMToC defines and quantifies word saliency for visual words extracted from the slides and for spoken words obtained from the speech transcript. The saliency scores from these two modalities are combined to obtain a ranked list of salient words. These ranked words, along with their saliency scores, are used to formulate a topic segmentation cost function. The cost function is optimized in a dynamic programming framework to obtain the topic segments of the video. These segments are then labelled with their corresponding topic names to create the table of content. We perform experiments on 24 hours of lectures spread across 23 videos, each 20 to 75 minutes long. We compare the proposed method with LDA-based video segmentation approaches and show that MMToC performs significantly better (F-score improvements of 0.19 and 0.24 on two datasets). We also perform a user study to demonstrate the effectiveness of MMToC for navigating educational videos.
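The pipeline described above (combine per-word saliency from two modalities, then segment by minimizing a cost with dynamic programming) can be illustrated with a minimal sketch. The abstract does not give the actual saliency definitions or the cost function, so everything specific here is an assumption made for illustration: the linear mixing weight `alpha`, the saliency-weighted scatter used as `segment_cost`, the fixed `num_segments`, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def combine_saliency(visual, spoken, alpha=0.5):
    """Mix per-word saliency from slides and speech.

    Assumption: a simple linear combination with weight alpha.
    Returns (word, score) pairs sorted from most to least salient.
    """
    words = set(visual) | set(spoken)
    scores = {
        w: alpha * visual.get(w, 0.0) + (1 - alpha) * spoken.get(w, 0.0)
        for w in words
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def segment_cost(blocks, saliency):
    """Hypothetical cost of treating consecutive blocks as one topic.

    Each block is a list of words. The cost is the scatter of word counts
    around the segment mean, weighted by word saliency, so a segment is
    cheap when its salient words are used uniformly throughout.
    """
    counts = [Counter(b) for b in blocks]
    vocab = set().union(*counts)
    cost = 0.0
    for w in vocab:
        mean = sum(c[w] for c in counts) / len(counts)
        cost += saliency.get(w, 0.0) * sum((c[w] - mean) ** 2 for c in counts)
    return cost


def segment(blocks, saliency, num_segments):
    """Split blocks into num_segments contiguous segments of minimal total cost.

    best[k][j] is the minimal cost of covering blocks[:j] with k segments.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    n = len(blocks)
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + segment_cost(blocks[i:j], saliency)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i
    # Walk the back-pointers from the end to recover the boundaries.
    bounds, j = [], n
    for k in range(num_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


# Toy input: four short "blocks" of words from a made-up lecture.
visual = {"sorting": 1.0, "graph": 1.0}
spoken = {"sorting": 0.8, "graph": 0.6, "array": 0.2}
ranked = combine_saliency(visual, spoken)
blocks = [["sorting", "array"], ["sorting", "merge"],
          ["graph", "edge"], ["graph", "node"]]
print(segment(blocks, dict(ranked), num_segments=2))
```

On this toy input the cheapest two-way split separates the two "sorting" blocks from the two "graph" blocks. In a real system the number of segments would not normally be known in advance, and each segment would still need a topic label, for example its highest-ranked salient word; both are left out of this sketch.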

DOI: 10.1145/2733373.2806253

Cite this paper

@inproceedings{Biswas2015MMToCAM,
  title={MMToC: A Multimodal Method for Table of Content Creation in Educational Videos},
  author={Arijit Biswas and Ankit Gandhi and Om Deshmukh},
  booktitle={ACM Multimedia},
  year={2015}
}