What do you learn from context? Probing for sentence structure in contextualized word representations
TLDR
We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena.
  • 233
  • 19
Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval
TLDR
We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR).
  • 91
  • 8
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
TLDR
We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling.
  • 38
  • 5
MATBN: A Mandarin Chinese Broadcast News Corpus
TLDR
The MATBN Mandarin Chinese broadcast news corpus contains 198 hours of broadcast news from the Public Television Service Foundation (Taiwan) with corresponding transcripts.
  • 122
  • 4
Word Topic Models for Spoken Document Retrieval and Transcription
  • B. Chen
  • Computer Science
  • TALIP
  • 1 March 2009
TLDR
We propose a word topic model (WTM) to explore the co-occurrence relationship between words, as well as the long-span latent topical information, for language modeling in spoken document retrieval and transcription.
  • 42
  • 4
Latent topic modeling of word vicinity information for speech recognition
TLDR
A new topic language model, named the word vicinity model (WVM), is proposed to explore the co-occurrence relationship between words, as well as the long-span latent topical information, for language model adaptation.
  • 17
  • 4
A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents
TLDR
In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval.
  • 32
  • 3
Improved spoken document retrieval by exploring extra acoustic and linguistic cues
TLDR
In this paper, we explored the use of various kinds of extra information to improve the performance of spoken document retrieval (SDR).
  • 25
  • 3
Extractive spoken document summarization for information retrieval
  • B. Chen, Y. Chen
  • Computer Science, Mathematics
  • Pattern Recognit. Lett.
  • 1 March 2008
TLDR
The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document and then sequence them to form a concise summary.
  • 14
  • 3
Mandarin-English Information (MEI): investigating translingual speech retrieval
TLDR
This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English–Chinese CL-SDR systems.
  • 64
  • 2