This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentence as well as selection restrictions placed on the verb arguments. A novel …
We present "Transcriber", a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long-duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the …
A common practice in operational Machine Translation (MT) and Natural Language Processing (NLP) systems is to assume that a verb has a fixed number of senses and rely on a precompiled lexicon to achieve large coverage. This paper demonstrates that this assumption is too weak to cope with the similar problems of lexical divergences between languages and …
Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats used for this task, and many related features were hard-coded. In this paper we present a generalization of the tool based on the annotation graph formalism, and …
This paper describes the first version of "Transcriber", a tool for segmenting, labeling and transcribing speech. It is developed under Unix in the Tcl/Tk script language with extensions in C, and is available as free software. The environment offers the basic functions necessary for segmenting, labeling and transcribing long-duration signals. The signal …
ACKNOWLEDGEMENTS: I wish to express my deep gratitude to my supervisor, Professor Hsu Loke Soo, for his stimulating insights, constant encouragement, and all the guidance he has given me throughout the entire process of my graduate studies. Many thanks also go to Dr. Martha Palmer and Dr. Tan Chew Lim for being an inspiration throughout my years of …
The Linguistic Data Consortium (LDC), an open consortium of universities, companies and government research laboratories, creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The LDC has published more than 200 CD-ROMs for use by speech recognition engineers, natural language …
Annotation graphs (AGs) provide an efficient and expressive data model for linguistic annotations of time-series data [Bird and Liberman, 2001]. Recently, the LDC has been developing a complete software infrastructure supporting the rapid development of tools for transcribing and annotating time-series data, in cooperation with the developers of other …