A Corpus-Based Analysis of Mixed Code in Hong Kong Speech

  • Author: John Sie Yuen Lee
  • Published: 13 November 2012
  • Venue: 2012 International Conference on Asian Language Processing
  • Field: Computer Science
We present a corpus-based analysis of the use of mixed code in Hong Kong speech. From transcriptions of Cantonese television programs, we identify English words embedded within Cantonese utterances, and investigate the motivations for such code-switching. Among the many motivations observed in previous research, we found that four alone account for more than 95% of the use of English words in our speech data across genres, genders, and age groups. We performed analyses over more than 60 hours… 
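The identification step described in the abstract, finding English words embedded in Cantonese utterances, can be illustrated with a minimal sketch. This is not the author's method, which the abstract does not specify. It assumes the transcripts are Unicode strings with Cantonese written in Chinese characters and English words in Latin letters. The names `ENGLISH_TOKEN`, `find_english_words`, and `count_english_words`, and the sample utterances, are hypothetical.

```python
import re
from collections import Counter

# Assumption: embedded English words appear in Latin letters, optionally
# joined by an apostrophe or hyphen (e.g. "e-mail", "don't").
ENGLISH_TOKEN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def find_english_words(utterance: str) -> list[str]:
    """Return the Latin-script tokens found in one utterance, lowercased."""
    return [m.group(0).lower() for m in ENGLISH_TOKEN.finditer(utterance)]


def count_english_words(utterances: list[str]) -> Counter:
    """Count embedded English tokens over a list of transcribed utterances."""
    counts = Counter()
    for utterance in utterances:
        counts.update(find_english_words(utterance))
    return counts


# Hypothetical sample utterances, for illustration only.
sample = ["我今日要去 book 位", "個 project 聽日 present", "你食咗飯未"]
counts = count_english_words(sample)
```

A script-based filter like this only locates candidate English words. Assigning each one a motivation for code-switching, as the paper does, would need annotation that this sketch does not attempt.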


Corpus-based learning of Cantonese for Mandarin speakers
The first study on using a parallel corpus to teach Cantonese, the variety of Chinese spoken in Hong Kong, to Mandarin-speaking undergraduate students at the beginner level. The results suggest that parallel corpora could be applied even at the beginner level for other L1-L2 pairs of closely related languages.
LOTUS-BI: A Thai-English Code-mixing Speech Corpus
Describes the design and construction of the LOTUS-BI corpus, a Thai-English code-mixing speech corpus intended as a core speech database for training acoustic and language models to improve speech recognition accuracy.
“I Want to be More Hong Kong Than a Hongkonger”
The years leading up to the political handover of Hong Kong to Mainland China surfaced issues regarding national identification and intergroup relations. These issues manifested in Hong Kong films


Development of a Cantonese-English code-mixing speech recognition system
The proposed data-driven approach, based on K-L divergence and a phonetic confusion matrix, is shown to outperform the IPA-based approach that uses phonetic knowledge alone. Language model perplexity and recognition performance are significantly improved with the proposed semantics-based language models.
Cantonese‐English code‐switching research in Hong Kong: a Y2K review
This paper is a review of the major works in code-switching in Hong Kong to date. Four context-specific motivations commonly found in the Hong Kong Chinese press - euphemism, specificity, bilingual
The Sociolinguistic Significance of Conversational Code-Switching
By conversational code-switching, I refer to the juxtaposition of passages of speech belonging to two different grammatical systems or subsystems, within the same exchange. Most frequently the
Language Contact and Bilingualism
This book draws together this diverse research, looking at examples from many different situations, to present the topic in an easily accessible form, offering a much needed overview of this lively area of language study.
Linguistic Convergence: Impact of English on Hong Kong Cantonese
This article presents two types of evidence obtained from the Hong Kong Chinese press before and after the handover – lexicosyntactic transference of English words and specific functions assigned t...
The Social Distinctiveness of Two Code-mixing Styles in Hong Kong
One of the major foci of sociolinguistics is the study of language practices and their social meanings. This can be seen from classic study of language and identities in Martha’s Vineyard (Labov
"How does Cantonese-English code-mixing work?", in Language in Hong Kong at Century's End, 1998.
Muysken, Language Contact and Bilingualism. London: Arnold, 1987.
"Code-mixing and koineizing in the speech of students at the University of Hong Kong", Anthropological Linguistics, 1979.