An Investigation of Cross-Language Information Retrieval for User-Generated Internet Video

Increasing amounts of user-generated video content are being uploaded to online repositories. This content is often very uneven in quality and topical coverage in different languages. The lack of material in individual languages means that cross-language information retrieval (CLIR) within these collections is required to satisfy the user's information need. Search over this content is dependent on available metadata, which includes user-generated annotations and often noisy transcripts of spoken… 
Investigating segment-based query expansion for user-generated spoken content retrieval
This work introduces three speech segment-based methods for query expansion (QE), based on semantic segmentation, discourse segmentation, and window-based segmentation, using a version of the MediaEval 2012 Search task newly extended as an ad hoc search task.
Towards effective cross-lingual search of user-generated internet speech
This thesis proposes novel methods to estimate the quality of translation for cross-lingual UGS search and presents a novel framework specifically designed for predicting the effectiveness of QE.
Identifying Effective Translations for Cross-lingual Arabic-to-English User-generated Speech Search
The potential for improving CLIR effectiveness by predicting translation effectiveness using Query Performance Prediction (QPP) techniques is examined, and a novel QPP method to estimate the quality of translation for an Arabic-English Cross-lingual User-generated Speech Search (CLUGS) task is proposed.
What Happened in CLEF... For a While?
A summary of the motivations that led to the establishment of CLEF is provided, together with a description of how it has evolved over the years, its major achievements, and the next challenges.
2019 marks the 20th birthday of CLEF, an evaluation campaign activity which has applied the Cranfield evaluation paradigm to the testing of multilingual and multimodal information access systems in


Exploring speech retrieval from meetings using the AMI corpus
Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text
Experimental results show that this approach can significantly outperform state-of-the-art results reported for monolingual and cross-lingual environments and indicate that inter-language PRF is particularly helpful for queries with poor translation quality.
Overview of VideoCLEF 2009: New Perspectives on Speech-based Multimedia Content Enrichment
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment, involving automatic tagging of videos with subject theme labels and linking video to material on the same subject in a different language.
Overview of VideoCLEF 2008: Automatic Generation of Topic-based Feeds for Dual Language Audio-Visual Content
The VideoCLEF track, introduced in 2008, aims to develop and evaluate tasks related to analysis of and access to multilingual multimedia content, and will aim to expand the corpus and the class label list, as well as to extend the track to additional tasks.
Blip10000: a social video dataset containing SPUG content for tagging and retrieval
This work presents a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'.
CLIR for Informal Content in Arabic Forum Posts
Experiments show that dialect classification can help to recognize informal content, thus improving precision, and indicate that neither dialect-tuned morphological analysis nor a lightweight CLIR approach that minimizes propagation of translation errors yet yields a reliable improvement in recall for informal content when compared to a straightforward document translation architecture.
University of Glasgow at WebCLEF 2005: Experiments in per-field Normalisation and Language Specific Stemming
A language specific technique for applying the correct stemming approach, as well as for removing the correct stopwords from the queries, is developed for retrieving relevant documents from a multilingual corpus of Web documents from Web sites of European governments.
Overview of the CLEF-2005 Cross-Language Speech Retrieval Track
The task for the CLEF-2005 cross-language speech retrieval track was to identify topically coherent segments of English interviews in a known-boundary condition, and results indicate that monolingual search technology is sufficiently accurate to be useful for some purposes.
Probabilistic models of information retrieval based on measuring the divergence from randomness
A framework for deriving probabilistic models of Information Retrieval using term-weighting models obtained in the language model approach by measuring the divergence of the actual term distribution from that obtained under a random process is introduced.
On setting the hyper-parameters of term frequency normalization for information retrieval
This study investigates three term frequency normalization methods, namely normalization 2, BM25's normalization and the Dirichlet Priors normalization, and tackles the query dependence problem by modifying the query term weight using a Divergence From Randomness term weighting model and measuring the correlation of the normalized term frequency with the document length.