Compositional Semantic Parsing with Large Language Models

  title={Compositional Semantic Parsing with Large Language Models},
  author={Andrew Drozdov and Nathanael Schärli and Ekin Akyürek and Nathan Scales and Xinying Song and Xinyun Chen and Olivier Bousquet and Denny Zhou},
Humans can reason compositionally when presented with new tasks. Previous research shows that appropriate prompting techniques enable large language models (LLMs) to solve artificial compositional generalization tasks such as SCAN. In this work, we identify additional challenges in more realistic semantic parsing tasks with larger vocabulary and refine these prompting techniques to address them. Our best method is based on least-to-most prompting: it decomposes the problem using prompting-based… 
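The decompose-then-solve loop behind least-to-most prompting can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `call_llm` is a hypothetical stand-in for a real model API, and the prompt wording is invented.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call; replace with a real model API."""
    raise NotImplementedError


def least_to_most(question: str, llm=call_llm) -> str:
    # Stage 1: prompting-based decomposition into simpler subquestions.
    decomposition = llm(
        f"Decompose this question into simpler subquestions, one per line:\n{question}"
    )
    subquestions = [s.strip() for s in decomposition.splitlines() if s.strip()]

    # Stage 2: answer the subquestions in order, feeding each answer
    # into the context for the next (easiest to hardest).
    context = ""
    answer = ""
    for sub in subquestions:
        answer = llm(f"{context}Q: {sub}\nA:").strip()
        context += f"Q: {sub}\nA: {answer}\n"
    return answer  # the answer to the final, hardest subquestion
```

The key property is that later subquestions are answered with the earlier question-answer pairs in context, which is what lets the model generalize to problems harder than any single prompt exemplar.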


ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models

This work proposes ZEROTOP, a zero-shot task-oriented parsing method that decomposes a semantic parsing problem into a set of abstractive and extractive question-answering (QA) problems, enabling us to leverage the ability of LLMs to zero-shot answer reading comprehension questions.

Large Language Models are few(1)-shot Table Reasoners

This paper evaluated LLMs on popular table QA and fact verification datasets such as WikiTableQuestions, FetaQA, TabFact, and FEVEROUS and found that LLMs are competent at complex reasoning over table structures, even though these models are not pre-trained on any table corpus.

Uncontrolled Lexical Exposure Leads to Overestimation of Compositional Generalization in Pretrained Models

It is argued that exposure to pre-training data may break the distributional control between training and test splits used to gauge compositional generalization, in which certain lexical items are meant to occur only in limited contexts during training.

Towards Reasoning in Large Language Models: A Survey

A comprehensive overview of the current state of knowledge on reasoning in large language models is provided, including techniques for improving and eliciting reasoning in these models, methods and benchmarks for evaluating reasoning abilities, and suggestions for future directions.

Can large language models reason about medical questions?

It is speculated that scaling model and data, enhancing prompt alignment and allowing for better contextualization of the completions will be sufficient for LLMs to reach human-level performance on this type of task.

Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes

Language models (LMs) can perform complex reasoning either end-to-end, with hidden latent state, or compositionally, with transparent intermediate state. Composition offers benefits for…

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Under both few-shot and zero-shot settings, PoT shows an average performance gain over CoT of around 12% across all the evaluated datasets, and by combining PoT with self-consistency decoding, achieves SoTA performance on all math problem datasets and near-SoTA performance on financial datasets.
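The core PoT move, having the model emit a short program and delegating the computation to an interpreter, can be sketched as below. This is a hedged illustration only: `call_llm` is a hypothetical model call, and the convention that the generated program stores its result in a variable named `ans` is an assumption of this sketch.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call; replace with a real model API."""
    raise NotImplementedError


def program_of_thoughts(question: str, llm=call_llm):
    # Ask the model to express its reasoning as executable Python
    # rather than a natural-language chain of thought.
    program = llm(
        "Write Python code that computes the answer and assigns it to a "
        f"variable named `ans`.\nQuestion: {question}"
    )
    # The interpreter, not the model, performs the computation,
    # disentangling calculation from reasoning.
    scope = {}
    exec(program, scope)
    return scope["ans"]
```

In any real deployment, model-generated code should of course be sandboxed before being passed to `exec`.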

Transcending Scaling Laws with 0.1% Extra Compute

U-PaLM outperforms PaLM on many few-shot setups, including English NLP tasks, reasoning tasks with chain-of-thought, multilingual tasks, MMLU, and challenging BIG-Bench tasks, and substantially improves the scaling properties of large language models on downstream metrics.

Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them

It is found that applying chain-of-thought (CoT) prompting to BBH tasks enables PaLM to surpass the average human-rater performance on 10 of the 23 tasks, and Codex to surpass it on 17 of the 23 tasks.

Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures

It is shown that masked language model (MLM) pre-training rivals SCAN-inspired architectures on primitive holdout splits and establishes a new state of the art on the CFQ compositional generalization benchmark using MLM pre-training together with an intermediate representation.

Constrained Language Models Yield Few-Shot Semantic Parsers

The results demonstrate that with only a small amount of data and very little code to convert into English-like representations, the blueprint for rapidly bootstrapping semantic parsers leads to surprisingly effective performance on multiple community tasks, greatly exceeding baseline methods also trained on the same limited data.

Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?

NQG-T5 is proposed, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model that outperforms existing approaches across several compositional generalization challenges on non-synthetic data, while also being competitive with the state of the art on standard evaluations.

Few-Shot Semantic Parsing with Language Models Trained on Code

This paper evaluates OpenAI Codex on Overnight and SMCalFlow and finds that unlike GPT-3, Codex performs similarly when targeting meaning representations directly, perhaps because meaning representations are structured similar to code in these datasets.

Compositional Generalization for Primitive Substitutions

This paper conducts fundamental research for encoding compositionality in neural networks with two representations, one generating attention maps, and the other mapping attended input words to output symbols to improve generalization.

Lexicon Learning for Few Shot Sequence Modeling

This work augments neural decoders with a lexical translation mechanism that generalizes existing copy mechanisms to incorporate learned, decontextualized, token-level translation rules, and shows that it improves systematic generalization on a diverse set of sequence modeling tasks drawn from cognitive science, formal semantics, and machine translation.

Span-based Semantic Parsing for Compositional Generalization

This work proposes SpanBasedSP, a parser that predicts a span tree over an input utterance, explicitly encoding how partial programs compose over spans in the input, which performs similarly to strong seq2seq baselines on random splits, but dramatically improves performance on splits that require compositional generalization.

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Experiments on symbolic manipulation, compositional generalization and numerical reasoning demonstrate that least-to-most prompting can generalize to examples that are harder than those seen in the prompt context, outperforming other prompting-based approaches by a large margin.

Improving Compositional Generalization with Latent Structure and Data Augmentation

This work presents a more powerful data recombination method using a model called Compositional Structure Learner (CSL), a generative model with a quasi-synchronous context-free grammar backbone, which results in a model even stronger than a T5-CSL ensemble on two real world compositional generalization tasks.

Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing

Limits of current techniques for effectively leveraging model scale for compositional generalization in semantic parsing evaluations are highlighted, while the analysis also suggests promising directions for future work.