Word embedding, especially with its recent developments, promises a quantification of the similarity between terms. However, it is not clear to what extent this similarity value is genuinely meaningful and useful for subsequent tasks. We explore how far the similarity score obtained from the models is indicative of term relatedness. We first observe…
This document describes the participation of Vienna University of Technology in the TREC Clinical Decision Support Track 2014. Four different search models are investigated, as well as different strategies to index the corpus and to extract the most relevant information from the topics. Our results indicate that BM25 and the Vector Space Model had similar…
This paper describes the efforts of Vienna University of Technology (TUW) in the MediaEval 2014 Retrieving Diverse Social Images challenge. Our approach consisted of 3 steps: (1) pre-filtering based on Machine Learning, (2) re-ranking based on Word2Vec, and (3) clustering based on an ensemble of clusters. Our best run reached an F@20 of 0.564.
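The re-ranking step (2) can be illustrated with a minimal sketch. It assumes each image is described by a list of text tokens and that candidates are ordered by cosine similarity between the averaged word vectors of the query and of the image text. The abstract does not state how the Word2Vec scores were actually computed, so the averaging scheme, the function names (`cosine`, `mean_vector`, `rerank`), and the toy vectors below are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(tokens, vectors):
    """Average the vectors of the known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def rerank(query_tokens, docs, vectors):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q = mean_vector(query_tokens, vectors)
    scored = []
    for doc_id, tokens in docs.items():
        d = mean_vector(tokens, vectors)
        # Documents without any known token get the lowest score (assumption).
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-made 3-dimensional vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "tower": [0.9, 0.1, 0.0],
    "eiffel": [0.8, 0.2, 0.1],
    "paris": [0.7, 0.3, 0.1],
    "dog": [0.0, 0.1, 0.9],
    "puppy": [0.1, 0.0, 0.8],
}

# Hypothetical image descriptions (for example tags or titles), already tokenized.
TOY_DOCS = {
    "img1": ["dog", "puppy"],
    "img2": ["paris", "tower"],
    "img3": ["unknownword"],
}

ranking = rerank(["eiffel", "tower"], TOY_DOCS, TOY_VECTORS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In practice the toy dictionary would be replaced by vectors from a trained embedding model, and the similarity score would likely be combined with the output of the pre-filtering step rather than used alone.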
We revisit text-based image retrieval for social media, exploring the opportunities offered by statistical semantics. We assess the performance and limitations of several complementary corpus-based semantic text similarity methods in combination with word representations. We compare results with state-of-the-art text search engines. Our deep learning-based…
A recurring question in information retrieval is whether term associations can be properly integrated into traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we revisit a wide spectrum of existing models (Pivoted Document Normalization, BM25, BM25 Verboseness Aware, Multi-Aspect TF, and Language…
This paper describes the contributions of Vienna University of Technology (TUW) to the MediaEval 2015 Retrieving Diverse Social Images challenge. Our approach consists of 3 phases: (1) a precision-oriented phase, in which we focus only on the relevance of the documents; (2) a recall-oriented phase, in which we focus only on the diversity aspect; (3) Merging…
Recent developments in word embedding provide a novel source of information for term-to-term similarity. A recurring question now is whether the provided term associations can be properly integrated into traditional information retrieval models while preserving their robustness and effectiveness. In this paper, we propose addressing the question of…