Monolingual and Crosslingual SMS-based FAQ Retrieval

Abstract

This paper presents results for DCU's second participation in the SMS-based FAQ Retrieval task at FIRE. For FIRE 2012, we submitted runs for the monolingual English, monolingual Hindi, and crosslingual English-to-Hindi subtasks. Compared to our experiments for FIRE 2011, we simplified our system by using a single retrieval engine (instead of three) and a single approach for detecting out-of-domain queries (instead of three). In our approach, the SMS queries are transformed into a normalized, corrected form and submitted to a retrieval engine to obtain a ranked list of FAQ results. A classifier trained on features extracted from the training data then determines which queries are out of domain. For our crosslingual English-to-Hindi experiments, we trained a statistical machine translation system for Hindi-to-English translation and used it to translate the full Hindi FAQ documents into English. Retrieval then operates on the corrected English input and returns results from the translated Hindi FAQ documents. Our best experiments achieved an MRR of 0.949 for the monolingual English subtask, 0.880 for the monolingual Hindi subtask, and 0.450 for the crosslingual subtask.
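The scores above are reported as mean reciprocal rank (MRR). As a reading aid, the following is a minimal sketch of the standard MRR definition: for each query, take the reciprocal of the rank of the first relevant FAQ in the returned list (0 if none is returned), then average over all queries. The function name `mean_reciprocal_rank` and the toy FAQ identifiers are hypothetical, and the sketch does not reproduce the FIRE evaluation protocol, in particular its treatment of out-of-domain queries.

```python
from typing import Hashable, Sequence, Set


def mean_reciprocal_rank(
    ranked_results: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
) -> float:
    """Standard MRR over a set of queries.

    ranked_results[i] is the ranked list of FAQ ids returned for query i;
    relevant[i] is the set of FAQ ids judged correct for query i.
    (Both structures are assumptions made for this illustration.)
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("one relevance set is required per query")
    if not ranked_results:
        return 0.0

    total = 0.0
    for results, gold in zip(ranked_results, relevant):
        # Find the first relevant result; ranks start at 1.
        for rank, faq_id in enumerate(results, start=1):
            if faq_id in gold:
                total += 1.0 / rank
                break
        # A query with no relevant result contributes 0.
    return total / len(ranked_results)


# Toy example with made-up FAQ ids: first relevant hits at rank 1, rank 2, and none.
runs = [["faq_1", "faq_7"], ["faq_3", "faq_4"], ["faq_9"]]
gold = [{"faq_1"}, {"faq_4"}, {"faq_2"}]
score = mean_reciprocal_rank(runs, gold)  # (1 + 1/2 + 0) / 3
```

Under this definition, an MRR close to 1 means the correct FAQ is ranked first for almost every query, while an MRR of 0.5 would be obtained, for instance, if the correct FAQ were always ranked second.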

DOI: 10.1145/2701336.2701634

Cite this paper

@inproceedings{Leveling2013MonolingualAC, title={Monolingual and Crosslingual SMS-based FAQ Retrieval}, author={Johannes Leveling}, booktitle={FIRE}, year={2013} }