Zero-Shot Cross-lingual Semantic Parsing

Tom Sherborne and Mirella Lapata
Recent work in cross-lingual semantic parsing has successfully applied machine translation to localize parsers to new languages. However, these advances assume access to high-quality machine translation systems and word alignment tools. We remove these assumptions and study cross-lingual semantic parsing as a zero-shot problem, without parallel data (i.e., utterance-logical form pairs) for new languages. We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional… 

Translate, then Parse! A Strong Baseline for Cross-Lingual AMR Parsing

This paper revisits the simple two-step translate-then-parse (T+P) baseline and enhances it with a strong NMT system and a strong AMR parser, showing that T+P outperforms a recent state-of-the-art system across all tested languages.

Multilingual Compositional Wikidata Questions

This work proposes a method for creating a multilingual, parallel dataset of question-query pairs grounded in Wikidata, and introduces such a dataset, Compositional Wikidata Questions (CWQ). The data is used to train and evaluate semantic parsers for Hebrew, Kannada, Chinese and English, to better understand the current strengths and weaknesses of multilingual semantic parsing.

Meta-Learning a Cross-lingual Manifold for Semantic Parsing

A first-order meta-learning algorithm is introduced to train a semantic parser with maximal sample efficiency during cross-lingual transfer and yields accurate semantic parsers sampling ≤10% of source training data in each new language.

Extrinsic Evaluation of Machine Translation Metrics

This research investigates how useful MT metrics are at detecting the success of a machine translation component when placed in a larger platform with a downstream task and suggests that future MT metrics be designed to produce error labels rather than scores to facilitate extrinsic evaluation.

XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing

This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query to construct prompts and effectively leverages large pre-trained language models to outperform existing baselines.

DAMP: Doubly Aligned Multilingual Parser for Task-Oriented Dialogue

This work shows that pretraining alignment objectives improve multilingual transfer while also reducing negative transfer to English, and introduces a constrained optimization method to improve alignment using domain adversarial training.

Zero-shot Cross-lingual Conversational Semantic Role Labeling

The usefulness of CSRL to non-Chinese conversational tasks such as the question-in-context rewriting task in English and the multi-turn dialogue response generation tasks in English, German and Japanese is improved by incorporating the CSRL information into the downstream conversation-based models.

MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages

A multilingual dataset, MCoNaLa, is proposed to benchmark code generation from natural language commands extending beyond English, and a quantitative evaluation of performance on the MCoNaLa dataset is presented by testing with state-of-the-art code generation systems.

Compositional Generalization in Multilingual Semantic Parsing over Wikidata

A method is proposed for creating a multilingual, parallel dataset of question-query pairs, grounded in Wikidata, and it is used to analyze the compositional generalization of semantic parsers in Hebrew, Kannada, Chinese, and English.



Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

An architecture to learn joint multilingual sentence representations for 93 languages, belonging to more than 30 different families and written in 28 different scripts using a single BiLSTM encoder with a shared byte-pair encoding vocabulary for all languages, coupled with an auxiliary decoder and trained on publicly available parallel corpora.

Multilingual Semantic Parsing And Code-Switching

This paper describes a transfer learning method using crosslingual word embeddings in a sequence-to-sequence model that achieves state-of-the-art accuracy on the NLmaps corpus and observes a consistent improvement for German compared with several baseline domain adaptation techniques.

Treebank Translation for Cross-Lingual Parser Induction

This approach draws on annotation projection but avoids the use of noisy source-side annotation of an unrelated parallel corpus and instead relies on manual treebank annotation in combination with statistical machine translation, which makes it possible to train fully lexicalized parsers.

Bootstrapping a Crosslingual Semantic Parser

Experimental results indicate that MT can approximate training data in a new language for accurate parsing when augmented with paraphrasing through multiple MT engines. For cases where MT is inadequate, the approach achieves parsing accuracy within 2% of complete translation using only 50% of training data.

Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent Prediction and Slot Filling

This work proposes a novel method which augments monolingual source data using multilingual code-switching via random translations, to enhance generalizability of large multilingual language models when fine-tuning them for downstream tasks.

Explicit Alignment Objectives for Multilingual Bidirectional Encoders

A new method for learning multilingual encoders, AMBER (Aligned Multilingual Bidirectional EncodeR), trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities is presented.

End-to-End Slot Alignment and Recognition for Cross-Lingual NLU

This work proposes a novel end-to-end model that learns to align and predict slots in a multilingual NLU system and uses the corpus to explore various cross-lingual transfer methods focusing on the zero-shot setting and leveraging MT for language expansion.

MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark

A new multilingual dataset, called MTOP, comprising 100k annotated utterances in 6 languages across 11 domains, is presented. The work demonstrates strong zero-shot performance using pre-trained models combined with automatic translation and alignment, along with a proposed distant supervision method to reduce the noise in slot label projection.

Don’t Parse, Insert: Multilingual Semantic Parsing with Insertion Based Decoding

A non-autoregressive parser based on the insertion transformer is presented; it speeds up decoding by 3x while outperforming the autoregressive model, and significantly improves cross-lingual transfer in the low-resource setting, by 37% compared to the autoregressive baseline.

Unsupervised Machine Translation Using Monolingual Corpora Only

This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space and effectively learns to translate without using any labeled data.