Corpus-Driven Splitting of Compound Words


A method is presented for splitting compound words into their constituents based on cognate words in the other language of a parallel corpus. A minor extension to the method using a bilingual lexicon (which may be statistically derived from the corpus) allows the decompounding of words that do not have cognates in the other language. Further, the algorithm…
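
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general cognate idea, not the paper's algorithm. It assumes a compound and the words of its aligned sentence in the other language are already available, and it tries every split point, keeping the one whose two halves best resemble two different words on the other side. The function names, the `min_len` and `threshold` parameters, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def cognate_score(a, b):
    """Surface similarity in [0, 1]; a stand-in for a real cognate measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(part, candidates):
    """Return (score, word) for the candidate most similar to `part`."""
    return max((cognate_score(part, c), c) for c in candidates)


def split_compound(word, aligned_words, min_len=3, threshold=0.7):
    """Try each two-way split of `word`; keep the split whose weaker half
    still has the strongest look-alike among `aligned_words`.

    Returns (left, right), or None if no split clears `threshold`.
    Hypothetical helper: thresholds and scoring are assumptions.
    """
    best = None
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        left_score, left_word = best_match(left, aligned_words)
        right_score, right_word = best_match(right, aligned_words)
        score = min(left_score, right_score)
        # Require the halves to match two different words on the other side.
        if score >= threshold and left_word != right_word:
            if best is None or score > best[0]:
                best = (score, left, right)
    return (best[1], best[2]) if best else None


# German "Filmfestival" aligned with an English sentence.
print(split_compound("filmfestival", ["the", "film", "festival", "opens"]))
```

When the aligned sentence contains no look-alike words, the sketch returns `None`; that is the case the abstract's bilingual-lexicon extension is meant to cover.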


7 Figures and Tables