Hybrid approach for aligning parallel sentences for languages without a written form using standard Malay and Malay dialects
Chinese-English parallel corpora are key resources for Chinese-English cross-language information processing, Chinese-English bilingual lexicography, Chinese-English language research and teaching. But so far large-scale Chinese-English corpus is still unavailable yet, given the difficulties and the intensive labours required. In this paper, our work towards building a large-scale Chinese-English parallel corpus is presented. We elaborate on the collection, annotation and mark-up of the parallel Chinese-English texts and the workflow that we used to construct the corpus. In addition, we also present our work toward building tools for constructing and using the corpus easily for different purposes. Among these tools, a parallel concordance tool developed by us is examined in detail. Several applications of the corpus being conducted are also introduced briefly in the paper.