Publications
Enriching speech recognition with automatic detection of sentence boundaries and disfluencies
TLDR
A metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier is described, and discriminative models are found to generally outperform generative models.
Automatic dialog act segmentation and classification in multiparty meetings
TLDR
For the two related tasks of dialog act (DA) segmentation and DA classification on speech from the ICSI Meeting Corpus, a very simple prosodic model is found to improve performance over lexical information alone, especially for segmentation.
Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts
TLDR
The results show that the simple unsupervised TFIDF approach performs reasonably well and that additional information from POS tags and sentence scores helps keyword extraction; however, the graph-based method is less effective in this domain.
Insertion, Deletion, or Substitution? Normalizing Text Messages without Pre-categorization nor Supervision
TLDR
This paper proposes a unified letter transformation approach that requires neither pre-categorization nor human supervision and that significantly outperforms both the state-of-the-art deletion-based abbreviation system and the Jazzy spell checker.
Panmicrobial Oligonucleotide Array for Diagnosis of Infectious Diseases
TLDR
GreeneChipPm, a panmicrobial microarray comprising 29,455 sixty-mer oligonucleotide probes for vertebrate viruses, bacteria, fungi, and parasites, confirmed the presence of viruses and bacteria identified by other methods and implicated Plasmodium falciparum in an unexplained fatal case of hemorrhagic fever-like disease during the 2004–2005 Marburg hemorrhagic fever outbreak in Angola.
Angiosperm phylogeny inferred from sequences of four mitochondrial genes
TLDR
A mitochondrial gene-based angiosperm phylogeny is reconstructed through a maximum likelihood analysis of sequences of four mitochondrial genes (atp1, matR, nad5, and rps3) from 380 species representing 376 genera and 296 families of seed plants, with the aim of recovering the underlying organismal phylogeny.
Part-of-Speech Tagging for English-Spanish Code-Switched Text
TLDR
Results on part-of-speech tagging of Spanish-English code-switched discourse are presented, and different approaches to exploiting existing resources for both languages are explored, ranging from simple heuristics to language identification to machine learning.
A study in machine learning from imbalanced data for sentence boundary detection in speech
TLDR
A hidden Markov model (HMM) system that uses both prosodic and textual information to detect sentence boundaries is constructed, and bagging is found to significantly improve system performance for each of the sampling methods.
Learning to Predict Code-Switching Points
TLDR
Exploratory results on learning to predict potential code-switching points in Spanish-English discourse are presented; a transcription of code-switched discourse is used to evaluate the performance of the classifiers.
Using Supervised Bigram-based ILP for Extractive Summarization
TLDR
A bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework is proposed; it consistently outperforms the previous ILP method on different TAC data sets and performs competitively with the best results in the TAC evaluations.