Krit Kosawat

This work attempts to construct a perceptual representation of Thai consonants from the perceptual identification results of 28 Thai participants for 21 phonemes presented in noise. The experiment is designed to compare the 21 word-initial phonemes pairwise with equal coverage, which results in 210 real-word stimulus pairs. Percent correct responses and...
We methodically design and develop a subjective intelligibility test for Thai speech based on the diagnostic rhyme test (DRT). The Thai DRT (TDRT) consists of two test sets, one for initial consonants and the other for final consonants. The test for initials is designed to compare 21 phonemes pairwise with equal coverage, which results in 210 stimulus pairs. The TDRT for finals...
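The figure of 210 stimulus pairs follows from choosing 2 items out of 21 without regard to order: 21 × 20 / 2 = 210. A minimal sketch of that count is below; the labels `C1` to `C21` are hypothetical placeholders, since the actual phoneme inventory is not listed here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical placeholder labels standing in for the 21 word-initial phonemes.
phonemes = [f"C{i}" for i in range(1, 22)]

# Every unordered pair of distinct phonemes is one stimulus pair.
pairs = list(combinations(phonemes, 2))

# Count how often each phoneme takes part in a pair.
appearances = Counter(p for pair in pairs for p in pair)

print(len(pairs))                # 210
print(set(appearances.values())) # {20}
```

Each phoneme appears in exactly 20 pairs, which is what comparing the phonemes "equally" amounts to under this reading.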
This is a non-technical paper describing how and why we organized BEST 2009, the first contest in the “Benchmark for Enhancing the Standard of Thai language processing” series, which is expected to help accelerate the progress of natural language processing technology in Thailand by assembling three essential components: common standards,...
This document describes the development process of the BEST 2009 word-segmented corpus. It is the first corpus to benchmark Thai word segmentation software. The corpus is composed of four genres: news, novels, encyclopedia, and academic articles. It contains 509 files with a total size of 64.1 MB. There are 5,036,229 tokens with 83,027...
This paper presents an online Thai-English MT system, called PARSIT, which is an extension of the English-Thai PARSIT system. We aim to help foreigners and Thais exchange information more easily. The system takes a rule-based, interlingua approach. To improve the system, we concentrate on the pre-processing and rule analysis phases, which are considered...
This work provides detailed frequencies and distributions of Thai phonemes, biphones, and syllable types drawn from three large-scale Thai corpora (InterBEST, LOTUS-BN, and LOTUS-Cell 2.0). Comparisons are carried out to examine the extent to which linguistic variation associated with different corpus types (written vs. spoken) affects frequency statistics...
This paper presents the steps in accessing Thai phoneme distribution from large-scale written Thai corpora. The data came from 12 text genres in InterBEST [1], considered the largest Thai corpora. Each word was transliterated using the grapheme-to-phoneme software [2]. Then the frequency of words, the frequency of 81 Thai phonemes in each genre, and the 95% CIs of...
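As a rough illustration of the counting step, the sketch below tallies phonemes from already-transcribed words and attaches a 95% interval to each relative frequency using the normal approximation for a proportion. This is an assumption made for illustration only: the abstract is truncated before it says what the CIs were computed over or how. The function name `phoneme_frequencies` and the toy input are hypothetical and are not output of the grapheme-to-phoneme software [2].

```python
from collections import Counter
from math import sqrt


def phoneme_frequencies(phoneme_sequences, z=1.96):
    """Return {phoneme: (proportion, ci_low, ci_high)}.

    Assumes a normal-approximation interval for a proportion,
    which may differ from the method used in the paper.
    """
    counts = Counter(p for seq in phoneme_sequences for p in seq)
    total = sum(counts.values())
    result = {}
    for phoneme, count in counts.items():
        p = count / total
        half_width = z * sqrt(p * (1 - p) / total)
        # Clamp the interval to the valid range for a proportion.
        result[phoneme] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return result


# Toy, made-up phoneme sequences, one list per word.
sample = [["k", "a", "n"], ["m", "a"], ["k", "i", "n"]]
freqs = phoneme_frequencies(sample)

for phoneme, (p, low, high) in sorted(freqs.items()):
    print(f"{phoneme}: {p:.3f} [{low:.3f}, {high:.3f}]")
```

With real corpus counts the intervals would be far narrower than in this eight-token toy sample.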
Since the Thai writing system has no explicit word or sentence boundaries, the meaning of Thai text depends on how it is segmented. Disambiguation by grammar rules cannot handle all problems because the language has many exceptions. Machine learning techniques are therefore introduced to cope with the ambiguity problems. These techniques, however, need good corpora to...