Learn More
On the Internet, information is largely in text form, which often includes such errors as spelling mistakes. These errors complicate natural language processing because most NLP applications aren't robust and assume that the input data is noise free. Preprocessing is necessary to deal with these errors and meet the growing need for automatic text(More)
This paper presents a hybrid approach to English-Korean name transliteration. The base system is built on MOSES with enabled factored translation features. We expand the base system by combining with various transliteration methods including a Web-based n-best re-ranking, a dictionary-based method, and a rule-based method. Our standard run and best(More)
This paper proposes a word spacing model using a hidden Markov model (HMM) for re ning Korean raw text corpora. Previous statistical approaches for automatic word spacing have used models that make use of inaccurate probabilities because they do not consider the previous spacing state. We consider word spacing problem as a classi cation problem such as(More)
This paper proposes new probabilistic models for analyzing Korean morphology. In order to take advantage of the characteristics of Korean morphology, the proposed models are based on three linguistic units: eojeol (a Korean spacing unit), morpheme, and syllable. Unlike previous approaches that are based on rules and dictionaries, the probabilistic approach(More)
Semantic transformation of a natural language question into its corresponding logical form is crucial for knowledge-based question answering systems. Most previous methods have tried to achieve this goal by using syntax-based grammar formalisms and rule-based logical inference. However, these approaches are usually limited in terms of the coverage of the(More)
Noun extraction is very important for many NLP applications such as information retrieval, automatic text classification, and information extraction. Most of the previous Korean noun extraction systems use a morphological analyzer or a Partof-Speech (POS) tagger. Therefore, they require much of the linguistic knowledge such as morpheme dictionaries and(More)