Learn More
On the Internet, information is largely in text form, which often includes such errors as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise free. Preprocessing is necessary to deal with these errors and meet the growing need for automatic text(More)
This paper presents a hybrid approach to English-Korean name transliteration. The base system is built on MOSES with enabled factored translation features. We expand the base system by combining with various transliteration methods including a Web-based n-best re-ranking, a dictionary-based method, and a rule-based method. Our standard run and best(More)
Noun extraction is very important for many NLP applications such as information retrieval, automatic text classification, and information extraction. Most of the previous Korean noun extraction systems use a morphological analyzer or a Part-of-Speech (POS) tagger. Therefore, they require much of the linguistic knowledge such as morpheme dictionaries and(More)
This paper proposes new probabilistic models for analyzing Korean morphology. In order to take advantage of the characteristics of Korean morphology, the proposed models are based on three linguistic units: eojeol (a Korean spacing unit), morpheme, and syllable. Unlike previous approaches that are based on rules and dictionaries, the probabilistic approach(More)