An information-theoretic approach to automatic query expansion
TLDR
We present a computationally simple and theoretically justified method for assigning scores to candidate expansion terms.
  • 369
  • 37
Using Kullback-Leibler Distance for Text Categorization
  • B. Bigi
  • Computer Science
  • ECIR
  • 14 April 2003
TLDR
A system that performs text categorization aims to assign appropriate categories from a predefined classification scheme to incoming documents.
  • 149
  • 15
SPeech Phonetization Alignment and Syllabification (SPPAS): a tool for the automatic analysis of speech prosody
TLDR
SPPAS, SPeech Phonetization Alignment and Syllabification, is a tool to automatically produce annotations which include utterance, word, syllable and phoneme segmentations from a recorded speech sound and its transcription.
  • 72
  • 6
SPPAS - Multi-lingual Approaches to the Automatic Annotation of Speech
TLDR
The first step of most acoustic analyses unavoidably involves the alignment of recorded speech sounds with their phonetic annotation.
  • 48
  • 4
Detecting topic shifts using a cache memory
TLDR
The use of cache memories and symmetric Kullback-Leibler distances is proposed for topic classification and topic-shift detection.
  • 16
  • 3
Which units for acoustic and language modeling for Khmer automatic speech recognition?
TLDR
We investigate how different views of the text data (word and sub-word units) can be exploited for Khmer language modeling.
  • 32
  • 2
Mining a Comparable Text Corpus for a Vietnamese-French Statistical Machine Translation System
TLDR
This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system, where the use of different units for Vietnamese (syllables, words, or their combinations) is discussed.
  • 20
  • 2
A fuzzy decision strategy for topic identification and dynamic selection of language models
TLDR
This paper introduces a new, effective model for topic recognition based on fuzzy relations, in which fuzzy variables express the degree of reliability of expert decisions.
  • 27
  • 2
SPPAS: a tool for the phonetic segmentation of speech
  • B. Bigi
  • Computer Science
  • LREC
  • 1 May 2012
TLDR
SPPAS is a tool to produce automatic annotations which include utterance, word, syllabic and phonemic segmentations from a recorded speech sound and its transcription.
  • 60
  • 1
A Comparative Study of Topic Identification on Newspaper and E-mail
TLDR
This paper presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails.
  • 23
  • 1