Thai Part-of-speech Tagged Corpus: ORCHID

This paper presents a procedure for building a Thai part-of-speech (POS) tagged corpus, called the ORCHID corpus. It is a collaborative project between the Communications Research Laboratory (CRL) of Japan and the National Electronics and Computer Technology Center (NECTEC) of Thailand, supported by the Electrotechnical Laboratory (ETL) of Japan. We propose a new tagset based on previous research on Thai parts of speech for use in a multilingual machine translation project. We mark the corpus in three…

Building a Thai part-of-speech tagged corpus (ORCHID)

A new tagset is proposed, based on the results of a prior multilingual machine translation project, for a Thai part-of-speech (POS) tagged corpus, which is a preliminary stage in the construction of a Thai speech corpus.

