Matthias Sperber

Learn More
BACKGROUND The study of microbial diversity and community structures heavily relies on the analyses of sequence data, predominantly taxonomic marker genes like the small subunit of the ribosomal RNA (SSU rRNA) amplified from environmental samples. Until recently, the "gold standard" for this strategy was the cloning and Sanger sequencing of amplified target(More)
This paper describes our English Speech-to-Text (STT) systems for the 2012 IWSLT TED ASR track evaluation. The systems consist of 10 subsystems that are combinations of different front-ends, e.g. MVDR based and MFCC based ones, and two different phone sets. The outputs of the subsystems are combined via confusion network combination. Decoding is done in two(More)
In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible. We introduce a method for automatically segmenting a corpus into chunks such that many uncertain labels are grouped into the same chunk, while human supervision can be omitted altogether for other segments. A tradeoff(More)
We propose a method for efficient off-line speech transcription through respeaking. Speech is segmented into smaller utterances using an initial automatic transcript. Respeaking is performed segment by segment, while confidence filtering helps save supervision effort. We conduct detailed experiments comparing speaking vs. typing, sequential vs.(More)
This paper describes the KIT-NAIST (Contrastive) English speech recognition system for the IWSLT 2012 Evaluation Campaign. In particular, we participated in the ASR track of the IWSLT TED task. The system was developed by Karl-sruhe Institute of Technology (KIT) and Nara Institute of Science and Technology (NAIST) teams in collaboration within the interACT(More)
In this paper, we present the KIT translation systems as well as the KIT-LIMSI systems for the ACL 2016 First Conference on Machine Translation. We participated in the shared task Machine Translation of News and submitted translation systems for three different directions: English→German, German→English and English→Romanian. We used a phrase-based machine(More)
Foreign students at German universities often have difficulties following lectures as they are often held in German. Since human interpreters are too expensive for universities we are addressing this problem via speech translation technology deployed in KIT's lecture halls. Our simultaneous lecture translation system automatically translates lectures from(More)
We propose an on-the-fly updating framework for cost-sensitive manual correction of automatically recognized speech transcripts. This framework trains cost-models during the transcription process, and does not require the transcriber enrollment necessary in previous work. We use a baseline method that optimizes a segmentation into segments to supervise or(More)
Evaluating the quality of output from language processing systems such as machine translation or speech recognition is an essential step in ensuring that they are sufficient for practical use. However, depending on the practical requirements, evaluation approaches can differ strongly. Often, reference-based evaluation measures (such as BLEU or WER) are(More)