• Corpus ID: 18845822

WSJCAM0: A BRITISH ENGLISH SPEECH CORPUS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION

@inproceedings{Robinson1995WSJCAM0AB,
  title={WSJCAM0: A BRITISH ENGLISH SPEECH CORPUS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION},
  author={Tony Robinson and Jeroen Fransen and D. Pye and Jonathan Foote},
  year={1995}
}
A significant new speech corpus of British English has been recorded at Cambridge University. Derived from the Wall Street Journal text corpus, WSJCAM0 constitutes one of the largest corpora of spoken British English currently in existence. It has been specifically designed for the construction and evaluation of speaker-independent speech recognition systems. The database consists of 140 speakers each speaking about 110 utterances. This paper describes the motivation for the corpus, the…
The design of a large vocabulary speech corpus for Portuguese
TLDR
In the development of this new Portuguese database, the aim was to create a corpus equivalent in size to WSJ0, which was selected from a large engineering school, assuring a large variability of speakers.
Construction of Large Scale Isolated Word Speech Corpus in Bangla
TLDR
A new speech corpus of isolated words in the Bangla language has been recorded, including high-frequency words from the text corpus BdNC01; it is the largest corpus of its type, size and language domain.
Development of a large vocabulary speech database for Cantonese
TLDR
This paper describes work on developing a large vocabulary speech database for Cantonese, which contains a large number of speech utterances which include isolated syllables, polysyllabic words and phonetically rich sentences.
Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus
TLDR
This result shows that CD phoneme modeling and word trigram language models can be used effectively in Japanese LVCSR.
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
  • C. Veaux, J. Yamagishi, S. King
  • Computer Science
    2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)
  • 2013
TLDR
The motivation and the processes involved in the design and recording of the Voice Bank corpus, specifically designed for the creation of personalised synthetic voices for individuals with speech disorders, are described.
CorAIt - A Non-native Speech Database for Italian
TLDR
The necessity for this type of database is emphasized, the steps involved in its construction are described, and the features of CorAIt are presented.
DECODER TECHNOLOGY FOR CONNECTIONIST LARGE VOCABULARY SPEECH RECOGNITION
TLDR
An efficient search procedure and its software embodiment in a decoder, NOWAY, are presented. NOWAY has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system, and results indicate that phone deactivation pruning increased the search speed by an order of magnitude while incurring 2% or less relative search error.
Progress in Speech Recognition for Romanian Language
In this chapter we will present the progress made in automatic speech recognition for Romanian language based on the ASRS_RL (Automatic Speech Recognition System for Romanian Language) research
The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): specification and initial experiments
TLDR
The collection of an audio-visual corpus of read speech from a number of instrumented meeting rooms, suitable for use in continuous speech recognition experiments, is described; the corpus is captured using a variety of microphones, including arrays, as well as close-up and wider-angle cameras.
Acoustic model and language model adaptation for a mobile dictation service
TLDR
In this work, performance of the TKK speech recognition system has been evaluated on law-related speech recorded on a mobile phone with the Mobi-Dic client application and language model adaptation was not able to significantly improve performance.

References

The Design for the Wall Street Journal-based CSR Corpus
TLDR
This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus, a corpus containing significant quantities of both speech data and text data.
Recent improvements to the ABBOT large vocabulary CSR system
TLDR
Substantial performance improvements are presented gained from new approaches to connectionist model combination and phone-duration modeling for large-vocabulary continuous speech recognition in ABBOT.
The use of recurrent networks in continuous speech recognition
  • Advanced Topics in Automatic Speech and Speaker Recognition,
  • 1996
WSJCAM0 corpus and recording description