Acquiring labeled speech for low-resource languages is a difficult task in the absence of native speakers of the language. One solution to this problem involves collecting speech transcriptions from crowd workers who are foreign or non-native speakers of a given target language. From these mismatched transcriptions, one can derive probabilistic phone …
In this paper, we evaluate a set of linguistic rules for pronunciation variations in Singapore English. We collect and annotate a speech corpus for Singapore English and label it with IPA narrow transcriptions. Data-driven pronunciation rules are derived using American English (Buckeye corpus) as a reference. We compare the data-driven rules with linguistic …
It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, in which the conventional resources required for model training (lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language) are unavailable. Here, language-mismatched transcripts, in …
Complex Independent Component Analysis (CICA), which extends Independent Component Analysis (ICA) to complex signals, has found applications in various fields. ICA with Reference (ICA-R) has recently gained popularity in semi-blind separation of signals when a priori information about the desired sources is available in the form of reference signals. …