Translating Natural Language Instructions for Behavioral Robot Navigation with a Multi-Head Attention Mechanism

Patricio Cerda-Mardini, Vladimir Araujo, Alvaro Soto
We propose a multi-head attention mechanism as a blending layer in a neural network model that translates natural language to a high-level behavioral language for indoor robot navigation. We follow the framework established by Zang et al. (2018a), which proposes the use of a navigation graph as a knowledge base for the task. Our results show significant performance gains when translating instructions for previously unseen environments, thereby improving the generalization capabilities of the…
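To illustrate the core idea of a multi-head attention blending layer, the following is a minimal numpy sketch, not the authors' implementation: the function name is hypothetical, and the learned projection matrices that a real layer would apply to queries, keys, and values are omitted; each head simply attends over its own slice of the feature dimension and the per-head results are concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, keys, values, num_heads):
    """Hypothetical sketch: split features into heads, run scaled
    dot-product attention per head, and concatenate the results.

    query:  (d_model,)     -- e.g. a decoder state
    keys:   (T, d_model)   -- e.g. encoder states to blend
    values: (T, d_model)
    """
    d_model = query.shape[-1]
    assert d_model % num_heads == 0, "d_model must divide evenly into heads"
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q, k, v = query[sl], keys[:, sl], values[:, sl]
        scores = k @ q / np.sqrt(d_head)   # (T,) similarity per timestep
        weights = softmax(scores)          # (T,) attention distribution
        outputs.append(weights @ v)        # (d_head,) blended values
    return np.concatenate(outputs)         # (d_model,)
```

With `num_heads=1` this reduces to ordinary scaled dot-product attention; multiple heads let different feature subspaces attend to different parts of the input.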
1 Citation
BDCN: Semantic Embedding Self-explanatory Breast Diagnostic Capsules Network
  • Dehua Chen, Keting Zhong, Jianrong He
  • Computer Science
  • CCL
  • 2021
This model is the first to combine a capsule network with semantic embedding for AI diagnosis of breast tumors, using capsules to simulate semantics; it improves model performance and offers good interpretability, making it more suitable for clinical settings.

References
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. We use attention models to connect…
Behavioral Indoor Navigation With Natural Language Directions
This work describes a behavioral navigation approach that leverages the rich semantic structure of human environments to enable robots to navigate without an explicit geometric representation of the world, and presents efforts to allow robots to follow navigation instructions in natural language.
A Deep Learning Based Behavioral Approach to Indoor Autonomous Navigation
The results show that, using a simple set of perceptual and navigational behaviors, the proposed approach can successfully guide the robot as it completes navigation missions such as going to a specific office.
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by its successful application to English constituency parsing with both large and limited training data.
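The central operation of the Transformer, scaled dot-product attention, can be written as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension; the $\sqrt{d_k}$ scaling keeps the dot products from saturating the softmax.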
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
The LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework, a large-scale Transformer model consisting of three encoders, achieves state-of-the-art results on two visual question answering datasets and demonstrates the generalizability of the pre-trained cross-modality model.
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Advanced recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks; the GRU is found to be comparable to the LSTM.
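For reference, the GRU combines an update gate $z_t$ and a reset gate $r_t$ to interpolate between the previous hidden state and a candidate state (bias terms omitted for brevity):

```latex
z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad
r_t = \sigma(W_r x_t + U_r h_{t-1}), \\
\tilde{h}_t = \tanh\bigl(W x_t + U (r_t \odot h_{t-1})\bigr), \qquad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```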
GloVe: Global Vectors for Word Representation
A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
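The GloVe model fits word vectors to co-occurrence counts via a weighted least-squares objective:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```

where $X_{ij}$ counts how often word $j$ occurs in the context of word $i$, and the weighting function $f$ discounts rare and very frequent co-occurrences.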
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
A theme (keyphrase) recommender system is proposed that lets ad creative strategists automatically infer ad themes from such multimodal sources of information in past ad campaigns; cross-modal representations are shown to yield significantly better classification accuracy and ranking precision-recall metrics.