An Investigation of Cross-Language Information Retrieval for User-Generated Internet Video

  • Ahmad Khwileh, Debasis Ganguly, G. Jones
  • Conference and Labs of the Evaluation Forum
Increasing amounts of user-generated video content are being uploaded to online repositories. This content is often very uneven in quality and topical coverage in different languages. The lack of material in individual languages means that cross-language information retrieval (CLIR) within these collections is required to satisfy the user's information need. Search over this content is dependent on available metadata, which includes user-generated annotations and often noisy transcripts of spoken…

Investigating segment-based query expansion for user-generated spoken content retrieval

  • Ahmad Khwileh, G. Jones
  • Computer Science
  • 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)
  • 2016
This work introduces three speech segment-based methods for query expansion (QE), based on semantic segmentation, discourse segmentation, and window-based segmentation, using a version of the MediaEval 2012 Search task newly extended as an ad hoc search task.

Towards effective cross-lingual search of user-generated internet speech

This thesis proposes novel methods to estimate the quality of translation for cross-lingual UGS search and presents a novel framework specifically designed for predicting the effectiveness of QE.

Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search

The potential for improving CLIR effectiveness is examined by predicting translation effectiveness using Query Performance Prediction (QPP) techniques, and a novel QPP method is proposed to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task.

What Happened in CLEF... For a While?

A summary of the motivations which led to the establishment of CLEF is provided, along with a description of how it has evolved over the years, its major achievements, and the next challenges.

What Happened in CLEF... For a While?

2019 marks the 20th birthday for CLEF, an evaluation campaign activity which has applied the Cranfield evaluation paradigm to the testing of multilingual and multimodal information access systems in



Exploring speech retrieval from meetings using the AMI corpus

Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text

Experimental results show that this approach can significantly outperform state-of-the-art results reported for monolingual and cross-lingual environments, and indicate that inter-language PRF is particularly helpful for queries with poor translation quality.

Overview of VideoCLEF 2009: New Perspectives on Speech-based Multimedia Content Enrichment

VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment, involving automatic tagging of videos with subject theme labels and linking video to material on the same subject in a different language.

University of Glasgow at WebCLEF 2005: Experiments in per-field Normalisation and Language Specific Stemming

A language specific technique for applying the correct stemming approach, as well as for removing the correct stopwords from the queries, is developed for retrieving relevant documents from a multilingual corpus of Web documents from Web sites of European governments.

Overview of the CLEF-2005 Cross-Language Speech Retrieval Track

The task for the CLEF-2005 cross-language speech retrieval track was to identify topically coherent segments of English interviews in a known-boundary condition, and results indicate that monolingual search technology is sufficiently accurate to be useful for some purposes.

Probabilistic models of information retrieval based on measuring the divergence from randomness

A framework is introduced for deriving probabilistic models of information retrieval, in which term-weighting models are obtained by measuring the divergence of the actual term distribution from that obtained under a random process.

CLEF 2004 Cross-Language Spoken Document Retrieval Track

Results from the participants show that, as expected, cross-language results are reduced relative to a monolingual baseline, although the extent to which they are degraded varies for different topic languages.

TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics

The TREC Video Retrieval Evaluation (TRECVID) 2011 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital

Search and Hyperlinking Task at MediaEval 2012

The Search and Hyperlinking Task was one of the Brave New Tasks at MediaEval 2012. The Task consisted of two subtasks which focused on search and linking in retrieval from a collection of

Arabic machine translation: a survey

This paper summarizes the major techniques used in machine translation from Arabic into English, and discusses their strengths and weaknesses.