Xuanchong Li

Most state-of-the-art action feature extractors involve differential operators, which act as high-pass filters and tend to attenuate low-frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature-enhancing technique that encodes ...
In multi-person tracking scenarios, gaining access to the identity of each tracked individual is crucial for many applications such as long-term surveillance video analysis. Therefore, we propose a long-term multi-person tracker which utilizes face recognition information to not only enhance tracking performance, but also assign identities to tracked ...
Recent improvements in content-based video search have led to systems with promising accuracy, thus opening up the possibility for interactive content-based video search to the general public. We present an interactive system based on a state-of-the-art content-based video search pipeline which enables users to do multimodal text-to-video and video-to-video ...
Users are often interested in finding video segments that contain further information about the content of a segment of interest. To help users find and browse related video content, video hyperlinking aims at constructing links among video segments with relevant information in a large video collection. In this study, we ...
Historically, researchers in the field have spent a great deal of effort to create image representations that have scale invariance and retain spatial location information. This paper proposes to encode equivalent temporal characteristics in video representations for action recognition. To achieve temporal scale invariance, we develop a method called ...
We report on our system used in the TRECVID 2013 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. For MED, it consists of four main steps: extracting features, representing features, training detectors and fusion. In the feature extraction part, we extract more than 10 low-level, high-level, and text features. Those features are ...
Massive Open Online Courses (MOOCs) enable everyone to receive high-quality education. However, current MOOC creators cannot provide an effective, economical, and scalable method to detect cheating on tests, which would be required for any certification. In this paper, we propose a Massive Open Online Proctoring (MOOP) framework, which combines both ...
We report on our system used in the TRECVID 2014 Semantic Indexing (SIN) task. We highlight the following new components: 1) a self-paced learning pipeline for concept training, 2) dense trajectories with Fisher vector encoding, 3) multi-modal pseudo relevance feedback for final results reranking and 4) deep convolutional neural networks directly trained on SIN ...
We propose a method for representing motion information for video classification and retrieval. We improve upon local-descriptor-based methods, which have been among the most popular and successful models for representing videos. The desired local descriptors need to satisfy two requirements: 1) to be representative, 2) to be discriminative. Therefore, they ...
We report on our system used in the TRECVID 2014 Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) tasks. On the MED task, the CMU team achieved leading performance in the Semantic Query (SQ), 000Ex, 010Ex and 100Ex settings. Furthermore, SQ and 000Ex runs are significantly better than the submissions from the other teams. We attribute ...