Makarand Tapaswi

Learn More
We address the problem of person identification in TV series. We propose a unified learning framework for multi-class classification which incorporates labeled and unlabeled data, and constraints between pairs of features in the training. We apply the framework to train multinomial logistic regression classifiers for multi-class face recognition. The method(More)
We introduce the MovieQA dataset which aims to evaluate automatic story comprehension from both video and text. The dataset consists of 14,944 questions about 408 movies with high semantic diversity. The questions range from simpler "Who" did "What" to "Whom", to "Why" and "How" certain events occurred. Each question comes with a set of five possible(More)
We describe a probabilistic method for identifying characters in TV series or movies. We aim at labeling every character appearance, and not only those where a face can be detected. Consequently, our basic unit of appearance is a person track (as opposed to a face track). We model each TV series episode as a Markov Random Field, integrating face(More)
The goal of this paper is unsupervised face clustering in edited video material – where face tracks arising from different people are assigned to separate clusters, with one cluster for each person. In particular we explore the extent to which faces can be clustered automatically without making an error. This is a very challenging problem given the(More)
Collecting training images for all visual categories is not only expensive but also impractical. Zero-shot learning (ZSL), especially using attributes, offers a pragmatic solution to this problem. However, at test time most attribute-based methods require a full description of attribute associations for each unseen class. Providing these associations is(More)
This paper presents the results of our content–based video genre classification system on the 2012 MediaEval Tagging Task. Our system utilizes several low–level visual cues to achieve this task. The purpose of this evaluation is to assess our content–based system’s performance on the large amount of blip.tv web–videos and high number of genres. The task and(More)
We propose a method to facilitate search through the storyline of TV series episodes. To this end, we use human written, crowdsourced descriptions—plot synopses—of the story conveyed in the video. We obtain such synopses from websites such as Wikipedia and propose various methods to align each sentence of the plot to shots in the video. Thus, the semantic(More)
The Repere challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcast. In this paper, we describe, evaluate and discuss QCompere consortium submissions to the 2012 Repere evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when(More)
Video face recognition is a very popular task and has come a long way. The primary challenges such as illumination, resolution and pose are well studied through multiple data sets. However there are no video-based data sets dedicated to study the effects of aging on facial appearance. We present a challenging face track data set, Harry Potter Movies Aging(More)
Film adaptations of novels often visually display in a few shots what is described in many pages of the source novel. In this paper we present a new problem: to align book chapters with video scenes. Such an alignment facilitates finding differences between the adaptation and the original source, and also acts as a basis for deriving rich descriptions from(More)