Corpus ID: 17019366

A Prosodic Corpus of Non-Native Speech

Jan-Torsten Milde and Ulrike Gut
The paper describes the design and implementation of an XML-based corpus environment for prosodically annotated data. The TASX-environment (TASX: Time Aligned Signal data eXchange format) constitutes the technical basis for a corpus designed to explore the acquisition of prosody by second language learners. It supports all aspects of the corpus setup procedure: XML-based annotation of the speech data, transformation of non-XML annotations, and the web-based analysis and dissemination of the…

AixOx, a multi-layered learners' corpus: automatic annotation
AixOx, with its multi-layered annotation, is a very rich oral database for all kinds of studies on L1 productions, L2 productions, and language contact, at both the segmental and supra-segmental levels, since it offers phonemic segmentation and alignment as well as prosodic labelling.
Learner corpora and prosody: From the COREIL corpus to principles on data collection and corpus design
The aim of the present contribution is to present the COREIL corpus, an electronic oral learner corpus that has been designed to study the acquisition of phrasal phonology and intonation in French and English as a foreign language and to explain the principles used to collect and annotate the data.
Framework For Consistent Speech Databases
The introduced speech processing framework creates phonetically and prosodically annotated speech databases. It provides structured data files in eXtensible Markup Language format. Those files…
An online system for entering and annotating non-native Mandarin Chinese speech for language teaching
The design and implementation of an intuitive online system for the annotation of non-native Mandarin Chinese speech by native Chinese speakers is described. The system will allow speech recognition researchers to easily generate a corpus of labeled non-native speech for use in future research.
Spoken English Learner Corpora
A survey of some of the most significant spoken English learner corpora created to date, which include various types of English speech produced by learners with Arabic, Chinese, French, German, Greek, Japanese, Korean, Norwegian, Polish, and Spanish, among others, as their first language (L1).
The TASX-environment: An XML-based Toolset for the Creation of Multimodal Corpora
  • J. Milde
  • Computer Science
  • 2002
The TASX-environment constitutes a technical basis for all aspects of the corpus setup procedure: XML-based annotation of the multimodal data, transformation of non-XML annotations, and the web-based analysis and dissemination of the data.
A Glossary of Corpus Linguistics
This is the first comprehensive glossary of the many specialist terms in corpus linguistics and will be useful for corpus linguists and non-corpus linguists alike.
Effective automatic speech recognition data collection for under-resourced languages
The development of Woefzela, a tool for effectively collecting ASR data for under-resourced languages, is documented, from the determination of the requirements necessary for effective data collection in this context to the verification and validation of its functionality.
Querying Annotated Speech Corpora
Two solutions for creating, editing, annotating, storing and querying annotated speech corpora are presented here: the XML-based data format TASX for corpus creation and data format exchange and the NXT search tool for querying corpora.
CBFC: a parallel L2 speech corpus for Korean and French learners
A bilingual corpus of French learners of Korean and Korean learners of French is presented, providing the scientific community with a translated and annotated corpus that can be used for a wide range of purposes in both theoretical and applied linguistics.


The TASX-environment : an XML-based corpus database for time aligned language data
The paper describes the design and implementation of an XML-based corpus environment for time aligned language/signal data. The TASX-environment constitutes the technical basis for a phonetic corpus…
Correlates of linguistic rhythm in the speech signal
Spoken languages have been classified by linguists according to their rhythmic properties, and psycholinguists have relied on this classification to account for infants' capacity to discriminate…
A linguistic approach to pitch range modelling
This thesis shows that pitch range can and should be treated as the same entity across various research disciplines (extralinguistic, paralinguistic and linguistic), in contrast to the current situation, in which pitch range has multiple definitions depending on the particular interest of the respective research discipline.
The Prosody of Nigerian English
Nigerian English is a variety of English which has often been suggested to differ significantly from other varieties of English, especially in the area of prosody. This paper analyses the prosody of…
Perspectives on Stress and Intonation in Language Learning.
Renewed interest in the teaching of stress and intonation has arisen through the functional approach to language teaching, with its wider use of conversational texts. Stress and intonation…
Gesprächstranskription auf dem Computer- das System EXMARaLDA
Although the use of computers for transcribing natural conversations is widespread in practice, the rapid development of computer technology has meant that…
PAX - an annotation based concordancing toolkit
  • IRCS Workshop on Linguistic Databases
  • 2001
A Linguistic Approach to Pitch Range Modelling
  • Ph.D. thesis
  • 2000
Second Language Phonology