Call Classification with Hundreds of Classes and Hundred Thousands of Training Utterances ... and No Target Domain Data

Abstract

This paper reports on an effort to build a large-scale call router that can reliably distinguish among 250 call reasons. Because no training data from the specific application (target) domain was available, the statistical classifier was built from more than 300,000 transcribed and annotated utterances drawn from related but different domains. Several tuning cycles, including three re-annotation rounds, in-lab data recording, bag-of-words-based consistency cleaning, and recognition parameter optimization, raised classifier accuracy from 32% to clearly above 70%.
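The abstract does not say how the bag-of-words-based consistency cleaning works. One plausible reading, sketched below purely as an assumption and not as the authors' method, is that utterances with the same bag of words (same words and counts, in any order) are treated as equivalent. Any such group carrying conflicting annotations is then relabeled to its majority call reason. The tokenization, the majority-vote rule, and the call-reason labels (`Billing_Payment` and so on) are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A bag of words as a hashable, order-independent (word, count) tuple.
Bag = Tuple[Tuple[str, int], ...]


def bag_of_words(utterance: str) -> Bag:
    """Assumed normalization: lowercase and split on whitespace."""
    return tuple(sorted(Counter(utterance.lower().split()).items()))


def consistency_clean(data: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, str]], int]:
    """Relabel each utterance to the majority label of its bag-of-words group.

    Hypothetical cleaning rule; ties go to the label seen first in `data`.
    Returns the cleaned (utterance, label) pairs and the number of changed labels.
    """
    labels_by_bag: Dict[Bag, Counter] = defaultdict(Counter)
    for text, label in data:
        labels_by_bag[bag_of_words(text)][label] += 1

    cleaned: List[Tuple[str, str]] = []
    changed = 0
    for text, label in data:
        majority, _ = labels_by_bag[bag_of_words(text)].most_common(1)[0]
        if majority != label:
            changed += 1
        cleaned.append((text, majority))
    return cleaned, changed


# Toy, made-up annotations: the first three utterances share one bag of words.
example = [
    ("I want to pay my bill", "Billing_Payment"),
    ("my bill I want to pay", "Billing_Payment"),
    ("i want to pay my bill", "Billing_Inquiry"),
    ("internet is down", "Tech_Internet"),
]
cleaned, changed = consistency_clean(example)
```

Here the third utterance is relabeled to `Billing_Payment`, so `changed` is 1. A real pipeline might instead send inconsistent groups back to annotators for another re-annotation round rather than overwriting labels automatically.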

DOI: 10.1007/978-3-540-69369-7_10

4 Figures and Tables

Cite this paper

@inproceedings{SuendermannOeft2008CallCW, title={Call Classification with Hundreds of Classes and Hundred Thousands of Training Utterances ... and No Target Domain Data}, author={David Suendermann-Oeft and Phillip Hunter and Roberto Pieraccini}, booktitle={PIT}, year={2008} }