Yujia Huo

In this paper we summarize our experiments in the ImageCLEF 2015 Scalable Concept Image Annotation challenge. The RUC-Tencent team participated in all subtasks: concept detection and localization, and image sentence generation. For concept detection, we experiment with automated approaches to gather high-quality training examples from the Web, in …
This paper summarizes our efforts for our first-time participation in the Violent Scene Detection subtask of the MediaEval 2015 Affective Impact of Movies Task. We build violent scene detectors using both audio and visual cues. In particular, the audio cue is represented by bag-of-audio-words with Fisher vector encoding. The visual cue is exploited by …
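The abstract above mentions bag-of-audio-words with Fisher vector encoding. As an illustration only (not the paper's actual pipeline), the standard Fisher vector for a diagonal-covariance GMM can be sketched as follows; the MFCC-like descriptors here are random stand-ins:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode local descriptors (N x D) as a Fisher vector using
    gradients w.r.t. the GMM means and (diagonal) variances."""
    X = np.atleast_2d(descriptors)
    N, D = X.shape
    K = gmm.n_components
    gamma = gmm.predict_proba(X)       # (N, K) soft assignments
    pi = gmm.weights_                  # (K,)
    mu = gmm.means_                    # (K, D)
    sigma = np.sqrt(gmm.covariances_)  # (K, D) for 'diag' covariance

    parts = []
    for k in range(K):
        diff = (X - mu[k]) / sigma[k]  # normalized deviations, (N, D)
        g_mu = (gamma[:, k:k + 1] * diff).sum(0) / (N * np.sqrt(pi[k]))
        g_sig = (gamma[:, k:k + 1] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * pi[k]))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    # Power- and L2-normalization, the usual post-processing for FVs
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)

# Usage: fit a small GMM on "audio word" descriptors, encode one clip
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))   # stand-in for MFCC frames
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(train)
clip = rng.normal(size=(200, 13))    # frames of one video clip
fv = fisher_vector(clip, gmm)        # dimension 2 * K * D = 104
```

The resulting fixed-length vector (2·K·D dimensions) can then be fed to any standard classifier, e.g. a linear SVM.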
This paper describes our solution for the MSR Video to Language Challenge. We start from the popular ConvNet + LSTM model, which we extend with two novel modules. One is early embedding, which enriches the current low-level input to LSTM by tag embeddings. The other is late reranking, for re-scoring generated sentences in terms of their relevance to a …
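The late-reranking idea described above re-scores candidate sentences by their relevance to the video. A minimal sketch, assuming a simple linear interpolation of a language-model score and a relevance score (the paper's actual fusion rule may differ):

```python
import numpy as np

def late_rerank(candidates, lm_scores, relevance, alpha=0.5):
    """Re-order candidate sentences by a weighted sum of their
    language-model score and a video-relevance score.
    (Hypothetical fusion; illustrative only.)"""
    lm = np.asarray(lm_scores, dtype=float)
    rel = np.asarray(relevance, dtype=float)
    final = alpha * lm + (1.0 - alpha) * rel
    order = np.argsort(-final)  # descending by combined score
    return [candidates[i] for i in order]

# Usage: three candidate captions with toy scores
cands = ["a man plays guitar", "a dog runs", "music performance"]
ranked = late_rerank(cands, lm_scores=[0.9, 0.2, 0.5],
                     relevance=[0.1, 0.9, 0.8], alpha=0.5)
```

Here the caption most relevant to the video can overtake a fluent but off-topic one, which is the point of reranking after generation.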
This paper attacks the challenging problem of violence detection in videos. Different from existing works focusing on combining multi-modal features, we go one step further by adding and exploiting subclasses visually related to violence. We enrich the MediaEval 2015 violence dataset by manually labeling violence videos with respect to the subclasses. Such …
This abstract sketches our research towards Structured Semantic Embedding of multimedia data. Though a tag may have multiple senses with completely different visual imagery, current semantic embedding methods represent the tag by a single vector regardless of its senses. We challenge this convention, arguing the importance of adding semantic …