Many applications in Multilingual and Multi-modal Information Access involve searching large databases of high dimensional data objects with multiple (conditionally independent) views. In this work we consider the problem of learning hash functions for similarity search across the views for such applications. We propose a principled method for learning a …
It is well known that the use of a good Machine Transliteration system improves the retrieval performance of Cross-Language Information Retrieval (CLIR) systems when the query and document languages have different orthography and phonetic alphabets. However, the effectiveness of a Machine Transliteration system in CLIR is limited by its ability to produce …
In this paper, we address the problem of mining transliterations of Named Entities (NEs) from large comparable corpora. We leverage the empirical fact that multilingual news articles with similar news content are rich in Named Entity Transliteration Equivalents (NETEs). Our mining algorithm, MINT, uses a cross-language document similarity model to align …
In this paper we study a set of problems that are of considerable importance to Statistical Machine Translation (SMT) but which have not been addressed satisfactorily by the SMT research community. Over the last decade, a variety of SMT algorithms have been built and empirically tested, whereas little is known about the computational complexity of some of …
Although Wikipedia has emerged as a powerful collaborative encyclopedia on the Web, it is only partially multilingual, as most of the content is in English and a small number of other languages. In real-life scenarios, non-English users in general, and ESL/EFL users in particular, need to search for relevant English Wikipedia articles as no relevant …
Pseudo-Relevance Feedback (PRF) assumes that the n top-ranking documents of the initial retrieval are relevant and extracts expansion terms from them. In this work, we introduce the notion of pseudo-irrelevant documents, i.e. high-scoring documents outside the top n that are highly unlikely to be relevant. We show how pseudo-irrelevant documents can be used …
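The basic PRF step described in the abstract above (treat the top n documents as relevant, then extract expansion terms from them) can be illustrated with a minimal sketch. It is not the authors' method: the function name `prf_expansion_terms`, the toy documents, and the choice to rank candidate terms by raw frequency are all assumptions made for illustration, and the use of pseudo-irrelevant documents is not modelled because the abstract is truncated before explaining it.

```python
from collections import Counter

def prf_expansion_terms(query_terms, ranked_docs, n=2, k=3):
    """Return up to k expansion terms from the top-n documents.

    ranked_docs is a list of token lists, already sorted by the
    initial retrieval score (best first). Hypothetical helper.
    """
    query = set(query_terms)
    counts = Counter()
    # PRF assumption: the top-n documents are treated as relevant.
    for doc in ranked_docs[:n]:
        counts.update(t for t in doc if t not in query)
    # Assumed scoring: raw term frequency, ties broken alphabetically
    # so that the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy ranking: the third document falls outside the top n = 2.
ranked_docs = [
    ["machine", "transliteration", "names", "query"],
    ["transliteration", "names", "corpora"],
    ["football", "scores"],
]
expanded = prf_expansion_terms(["query"], ranked_docs, n=2, k=2)
print(expanded)
```

Real systems typically weight candidate terms with a retrieval model rather than raw counts; frequency is used here only to keep the sketch short.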
While users often revisit pages on the Web, tool support for such re-visitation is still lacking. Current tools (such as browser histories) only provide users with basic information such as the date of the last visit and title of the page visited. In this paper, we describe a system that provides users with descriptive topic-phrases that aid re-finding. …