Language Models of Code are Few-Shot Commonsense Learners

Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig
Conference on Empirical Methods in Natural Language Processing

We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph. To employ large language models (LMs) for this task, existing approaches "serialize" the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show…
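The contrast between the two representations can be illustrated with a minimal sketch (the specific format below is an illustration, not the paper's exact prompt format): the same tiny script graph expressed once as a flat node/edge serialization and once as Python code, the latter being far closer to what code LMs saw during pre-training.

```python
# 1) Flat serialization: the graph as a list-like string of edges.
edges = [("wake up", "brush teeth"), ("brush teeth", "eat breakfast")]
serialized = "; ".join(f"{a} -> {b}" for a, b in edges)

# 2) Code-style representation: the same graph expressed as Python
# objects, mirroring structures common in code pre-training corpora.
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

wake = Node("wake up")
brush = Node("brush teeth")
eat = Node("eat breakfast")
wake.children.append(brush)
brush.children.append(eat)

print(serialized)
print([c.name for c in wake.children])
```

Both encode identical structure; the argument is that the second form sits in-distribution for models of code, so few-shot generation degrades less.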

Code4Struct: Code Generation for Few-Shot Structured Prediction from Natural Language

This work proposes Code4Struct to leverage the text-to-structure translation capability of large language models of code to tackle structured prediction tasks in NLP; it exploits the analogy between PL and NLP problems and uses it to tackle the event argument extraction (EAE) task via code generation.

When Neural Model Meets NL2Code: A Survey

This survey focuses on how neural networks (NNs) solve NL2Code, proposes a comprehensive framework that covers all studies of this task, and parses the existing studies into this framework in depth.

Explanation Selection Using Unlabeled Data for In-Context Learning

Across four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference, results show that the proxy metrics correlate with ground truth accuracy and the overall method can effectively improve prompts over crowdworker annotations and naive search strategies.

CORRPUS: Detecting Story Inconsistencies via Codex-Bootstrapped Neurosymbolic Reasoning

It is shown that the CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on pre-existing story understanding tasks (bAbI task 2 and Re3) with minimal hand engineering.

Generating Natural Language Proofs with Verifier-Guided Search

A novel stepwise method, NLProofS (Natural Language Proof Search), learns to generate relevant steps conditioned on the hypothesis and improves the correctness of predicted proofs from 27.7% to 33.3% in the distractor setting of EntailmentBank, demonstrating the effectiveness of NLProofS in generating challenging human-authored proofs.

Causal Reasoning of Entities and Events in Procedural Texts

This work proposes CREPE, the first benchmark on causal reasoning of event plausibility and entity states, and boosts model performance to 0.59 F1 by creatively representing events as programming languages while prompting language models pretrained on code.

Large Language Models are Reasoners with Self-Verification

A new method called self-verification uses the conclusion of the chain of thought (CoT) as a condition to build a new sample and asks the LLM to re-predict the original conditions, which have been masked; this can improve accuracy on multiple arithmetic and logical reasoning datasets under few-shot learning.
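The scoring loop behind this idea can be sketched schematically (the function names and the `ask_lm` interface below are hypothetical stand-ins, not the paper's API): each candidate answer is scored by masking an original condition, asserting the candidate as true, and checking whether the model recovers the masked value.

```python
def self_verify(question_conditions, candidates, ask_lm):
    """Pick the candidate answer whose implied conditions are best recovered.

    question_conditions: dict mapping a maskable condition name to its
    true value; ask_lm: callable that re-predicts a masked condition
    given an assumed answer (an LLM call in practice, stubbed in tests).
    """
    scores = {}
    for answer in candidates:
        score = 0
        for masked_key, true_value in question_conditions.items():
            # Hide `masked_key`, assert `answer`, and re-predict the
            # hidden condition; a match supports this candidate.
            predicted = ask_lm(masked=masked_key, assumed_answer=answer)
            score += int(predicted == true_value)
        scores[answer] = score
    return max(candidates, key=lambda a: scores[a])

# Toy usage with a deterministic stand-in for the LM: the problem
# "x + 3 = ?" has the hidden condition x = 5, so the verifier should
# prefer the answer whose implied x matches.
conditions = {"x": 5}
def fake_lm(masked, assumed_answer):
    return assumed_answer - 3  # re-derive x from the asserted answer
best = self_verify(conditions, [7, 8], fake_lm)
```

The backward check is what distinguishes this from simple answer sampling: inconsistent candidates fail to reproduce the conditions they were derived from.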

Complex QA and language models hybrid architectures, Survey

This paper identifies key elements for augmenting LLMs to solve complex questions or problems, such as: hybrid LLM architectures, active human reinforcement learning supervised with AI, prompting adaptation, neuro-symbolic and structured knowledge grounding, program synthesis, iterated decomposition, and others.

Complementary Explanations for Effective In-Context Learning

This work proposes a maximal-marginal-relevance-based exemplar selection approach for constructing exemplar sets that are both relevant and complementary, which successfully improves in-context learning performance across three real-world tasks on multiple LLMs.
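A minimal sketch of the underlying selection rule (this is generic maximal marginal relevance, not necessarily the paper's exact formulation or similarity function): greedily pick exemplars that are similar to the test input while penalizing redundancy with exemplars already chosen.

```python
def mmr_select(query_sim, pairwise_sim, k, lam=0.5):
    """Greedy MMR selection of k exemplar indices.

    query_sim[i]: similarity of exemplar i to the test input.
    pairwise_sim[i][j]: similarity between exemplars i and j.
    lam trades off relevance (1.0) against diversity (0.0).
    """
    selected = []
    remaining = set(range(len(query_sim)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy = closest already-selected exemplar.
            redundancy = max((pairwise_sim[i][j] for j in selected),
                             default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Exemplars 0 and 1 are near-duplicates (similarity 0.95); MMR picks
# the relevant exemplar 0, then the complementary exemplar 2 rather
# than the redundant exemplar 1.
picked = mmr_select([0.9, 0.85, 0.2],
                    [[1.0, 0.95, 0.1],
                     [0.95, 1.0, 0.1],
                     [0.1, 0.1, 1.0]], k=2)
```

With lam = 1.0 this reduces to pure top-k relevance ranking; lowering lam is what forces the exemplar set to cover distinct reasoning patterns.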

Multimodal Subtask Graph Generation from Instructional Videos

This work presents Multimodal Subtask Graph Generation (MSG2), an approach that constructs a subtask graph defining the dependencies between the subtasks relevant to a task from noisy web videos, producing graphs closer to human-annotated ones than prior approaches.

Neural Language Modeling for Contextualized Temporal Graph Generation

This paper uses existing IE/NLP tools to automatically generate a large quantity of system-produced document-graph pairs, and proposes a novel formulation of the contextualized graph generation problem as a sequence-to-sequence mapping task that outperforms the closest existing method by a large margin.

Synchromesh: Reliable code generation from pre-trained language models

A framework for substantially improving the reliability of pre-trained models for code generation is proposed, with substantial complementary gains observed from Constrained Semantic Decoding (CSD) and Target Similarity Tuning (TST) in prediction accuracy and in effectively preventing run-time errors.

A systematic evaluation of large language models of code

This work finds that existing open-source models achieve competitive results in some programming languages, although targeted mainly at natural language modeling, and identifies an important missing piece in the form of a large open-source model trained exclusively on a multi-lingual corpus of code.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.

PaLM: Scaling Language Modeling with Pathways

A 540-billion-parameter, densely activated Transformer language model called PaLM achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.

InCoder: A Generative Model for Code Infilling and Synthesis

InCoder is introduced, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling); the ability to condition on bidirectional context substantially improves performance on challenging tasks such as type inference, comment generation, and variable renaming.

proScript: Partially Ordered Scripts Generation

This work demonstrates for the first time that pre-trained neural language models can be finetuned to generate high-quality scripts, at varying levels of granularity, for a wide range of everyday scenarios (e.g., bake a cake).

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL.

A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories

A new framework for evaluating story understanding and script learning: the 'Story Cloze Test', which requires a system to choose the correct ending to a four-sentence story, together with a new corpus of ~50k five-sentence commonsense stories, ROCStories, to enable this evaluation.