Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web

Abstract

We propose a content-based approach to mine parallel resources from the entire web using cross-lingual information retrieval (CLIR) with search query relevance score (SQRS). Our method improves mining recall by going beyond URL matching to find parallel documents from non-parallel sites. We introduce SQRS to improve the precision of mining. Our method makes…

10 Figures and Tables
