Markpong Jongtaveesataporn

Large speech and text corpora are crucial to the development of a state-of-the-art speech recognition system. This paper reports on the construction and evaluation of the first Thai broadcast news speech and text corpora. Specifications and conventions used in the transcription process are described in the paper. The speech corpus contains about 17 hours of …
Traditional language models rely on lexical units that are defined as entities separated from each other by word boundary markers. Since there are no such boundaries in Thai, alternative definitions of lexical units have to be pursued. The problem is to find the optimal set of lexical units that constitutes the vocabulary of the language model and yields the best …