Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web


We propose a content-based approach to mining parallel resources from the entire web using cross-lingual information retrieval (CLIR) with a search query relevance score (SQRS). Our method improves mining recall by going beyond URL matching to find parallel documents on non-parallel sites. We introduce SQRS to improve mining precision. Our method makes…


10 Figures and Tables

