Corpus ID: 44571309

Benchmarking Speech Understanding in Service Robotics

Andrea Vanzo, Luca Iocchi, Daniele Nardi, Raphael Memmesheimer, Dietrich Paulus, Iryna Ivanovska, Gerhard K. Kraetzschmar
Speech understanding is a fundamental feature for many applications focused on human-robot interaction. Although many techniques and several services for speech recognition and natural language understanding have been developed in recent years, specific implementation and validation on domestic service robots have not been performed. In this paper, we describe the implementation and the results of a functional benchmark for speech understanding in service robotics that has been developed and…

Tell Your Robot What to Do: Evaluation of Natural Language Models for Robot Command Processing
This work presents a comparative analysis and benchmarking of four natural language understanding models (Mbot, Rasa, LU4R, and ECG) for understanding domestic service robot commands.


A Robust Speech Recognition System for Service-Robotics Applications
This work proposes and evaluates an architecture for a robust, speaker-independent speech recognition system that uses off-the-shelf technology and simple additional methods, reducing false-positive recognitions while achieving high accuracy.
Towards Robust Speech Recognition for Human-Robot Interaction
An investigation comparing different forms of spoken human-robot interaction, including a ceiling boundary microphone and the microphones of the humanoid robot NAO, with a headset is presented, and an ASR system using a multipass decoder is described and evaluated.
A Discriminative Approach to Grounded Spoken Language Understanding in Interactive Robotics
A standard linguistic pipeline for semantic parsing is extended toward a form of perceptually informed natural language processing that combines discriminative learning and distributional semantics.
A reranking approach for recognition and classification of speech input in conversational dialogue systems
We address the challenge of interpreting spoken input in a conversational dialogue system with an approach that aims to exploit the close relationship between the tasks of speech recognition and
HuRIC: a Human Robot Interaction Corpus
The Human Robot Interaction Corpus (HuRIC) is made of audio files paired with their transcriptions referring to commands for a robot, e.g. in a home environment, to adopt a simple but expressive representation of commands that can be easily translated into the internal representation of the robot.
A spoken language interface with a mobile robot
A spoken dialogue interface with a mobile robot is described, which a human can direct to specific locations, ask for information about its status, and supply information about the robot's environment.
Handling Complex Commands as Service Robot Task Requests
A novel approach to understand, dialogue, plan, and execute complex sentences to command a mobile service robot by introducing a flexible template-based algorithm to extract structure from the parse tree of the sentence.
Garbage modeling with decoys for a sequential recognition scenario
This paper suggests an algorithm that augments the grammar of the first recognizer with those valid paths through the language model of the second recognizer that are confusable with the phrases from this grammar.
Learning to interpret natural language navigation instructions from observations
A system that learns to transform natural-language navigation instructions into executable formal plans by using a learned lexicon to refine inferred plans and a supervised learner to induce a semantic parser.
Compilation of Unification Grammars with Compositional Semantics to Speech Recognition Packages
The resulting compiler creates a context-free backbone of the unification grammar, eliminates left-recursive productions, and removes redundant grammar rules; it shows no significant computational overhead in speech recognition performance for speech recognition grammars with compositional semantics compared to grammars without.