Corpus ID: 245329531

WebGPT: Browser-assisted question-answering with human feedback

@article{Nakano2021WebGPTBQ,
  title={WebGPT: Browser-assisted question-answering with human feedback},
  author={Reiichiro Nakano and Jacob Hilton and Suchir Balaji and Jeff Wu and Long Ouyang and Christina Kim and Christopher Hesse and Shantanu Jain and Vineet Kosaraju and William Saunders and Xu Jiang and Karl Cobbe and Tyna Eloundou and Gretchen Krueger and Kevin Button and Matthew Knight and Benjamin Chess and John Schulman},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.09332},
  url={https://api.semanticscholar.org/CorpusID:245329531}
}
GPT-3 is fine-tuned to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web; the best model's answers are preferred by humans 56% of the time to those of the authors' human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
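As a rough illustration of the text-based browsing loop the summary describes, here is a minimal sketch in Python. The command vocabulary, the model interface, and search_api are assumptions for illustration, not WebGPT's actual environment:

# Hypothetical text-based browsing loop: the model emits text commands,
# the environment returns text observations, and collected quotes become
# the references for the final long-form answer.
def browse_and_answer(model, search_api, question, max_steps=20):
    observation = f"Question: {question}"
    quotes = []  # passages the model chooses to keep as citations
    for _ in range(max_steps):
        command = model.next_command(observation, quotes)  # assumed interface
        if command.startswith("search:"):
            results = search_api(command[len("search:"):].strip())
            observation = "\n".join(
                f"[{i}] {r.title}: {r.snippet}" for i, r in enumerate(results))
        elif command.startswith("quote:"):
            quotes.append(command[len("quote:"):].strip())
            observation = f"Stored quote #{len(quotes)}."
        elif command.startswith("answer:"):
            return command[len("answer:"):].strip(), quotes
    return model.compose_answer(question, quotes), quotes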

From natural language to simulations: applying AI to automate simulation modelling of logistics systems

It is demonstrated that a framework built upon the fine-tuned GPT-3 Codex is capable of generating functionally valid simulations of queuing and inventory-management systems when provided with a verbal explanation.

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

This work shows that personalized alignment can be achieved by decomposing preferences into multiple dimensions, each declared as desirable by the user; these dimensions can be trained independently and efficiently in a distributed manner, then combined effectively post hoc through parameter merging.
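The post-hoc merging step can be pictured as a weighted average of per-preference model parameters. The sketch below assumes PyTorch-style state dicts and a user-chosen weight per preference dimension; it illustrates the general idea of parameter merging, not the paper's exact procedure:

# Merge independently fine-tuned per-preference models by a weighted
# average of their parameters (illustrative; the weights are the user's
# declared preference mix and should sum to 1).
def merge_state_dicts(state_dicts, weights):
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# e.g. 50% "concise" expert, 30% "friendly", 20% "expert-level detail":
# merged = merge_state_dicts([sd_concise, sd_friendly, sd_expert],
#                            [0.5, 0.3, 0.2])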

Tool Learning with Foundation Models

A systematic investigation of tool learning is presented, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models, and a general tool learning framework is formulated.

Augmented Language Models: a Survey

The missing-token prediction objective allows ALMs (augmented language models) to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks.

Improving alignment of dialogue agents via targeted human judgements

Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless than prompted language model baselines, is presented, and it is demonstrated that although the model learns to follow the authors' rules it can still exhibit distributional biases.

A Survey on Retrieval-Augmented Text Generation for Large Language Models

This study aims to consolidate existing research on RAG, clarify its technological underpinnings, highlight its potential to broaden the adaptability and applications of LLMs, and introduce evaluation methods for RAG.

Training Language Models to Generate Text with Citations via Fine-grained Rewards

This work proposes an effective training framework that uses fine-grained rewards to teach LLMs to generate highly supportive and relevant citations while ensuring the correctness of their responses, and it conducts a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating their advantage over conventional practices.
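One way to read "fine-grained rewards" here is as a per-sentence score combining citation support and citation relevance. In this sketch, supports and is_relevant stand in for learned or NLI-based checkers; the functions and weighting are assumptions for illustration, not the paper's definitions:

# Hypothetical fine-grained reward: score each generated sentence on
# (a) whether some cited passage supports it and (b) how relevant its
# citations are, then average over sentences.
def fine_grained_reward(sentences, citations, supports, is_relevant,
                        w_support=1.0, w_relevance=0.5):
    total = 0.0
    for sent, cited in zip(sentences, citations):
        support = max((supports(p, sent) for p in cited), default=0.0)
        relevance = sum(is_relevant(p, sent) for p in cited) / max(len(cited), 1)
        total += w_support * support + w_relevance * relevance
    return total / max(len(sentences), 1)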

AI capabilities can be significantly improved without expensive retraining

This work reviews recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation, and translates improvements from different enhancements into a common currency, the compute-equivalent gain.
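As a plain statement of the compute-equivalent gain idea (a paraphrase of the concept, not the paper's formal definition): if an enhancement lifts a model to the performance that scaling alone would reach with training compute C_equiv, and the model's actual training compute is C_train, then

    CEG = C_equiv / C_train

so a CEG of 5 means the enhancement is worth roughly a 5x increase in training compute.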

A Survey of Large Language Models Attribution

The aim of this survey is to provide valuable insights for researchers, aiding in the refinement of attribution methodologies to enhance the reliability and veracity of responses generated by open-domain generative systems.

WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

This work identifies and addresses the limitations of WebGPT (OpenAI), giving WebGLM advantages in accuracy, efficiency, and cost-effectiveness, and proposes systematic criteria for evaluating web-enhanced QA systems.
...

TruthfulQA: Measuring How Models Mimic Human Falsehoods

It is suggested that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.

REALM: Retrieval-Augmented Language Model Pre-Training

The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.
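The clipped surrogate objective at the core of PPO can be computed directly from log-probabilities and advantage estimates. The NumPy sketch below follows the paper's L_CLIP definition; the array shapes are illustrative:

import numpy as np

# PPO clipped surrogate objective (Schulman et al., 2017):
#   L_CLIP = E[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ]
# where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t) is the probability ratio
# and A_t the advantage estimate.
def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    ratio = np.exp(logp_new - logp_old)              # r_t
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)   # clipped ratio
    return np.minimum(ratio * advantages, clipped * advantages).mean()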

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.

Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets

A detailed study of the test sets of three popular open-domain benchmark datasets finds that 30% of test-set questions have a near-duplicate paraphrase in their corresponding train sets, and that simple nearest-neighbor models outperform a BART closed-book QA model.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

A general-purpose fine-tuning recipe is presented for retrieval-augmented generation (RAG), models that combine pre-trained parametric and non-parametric memory for language generation, and RAG models are found to generate more specific, diverse, and factual language than a state-of-the-art parametric-only seq2seq baseline.
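A minimal retrieve-then-generate loop in the spirit of RAG is sketched below. The retriever and generator interfaces are placeholders, and weighting candidate answers by normalized retrieval score is a simplified stand-in for RAG's marginalization over retrieved documents:

# Schematic retrieval-augmented generation: retrieve top-k passages
# (non-parametric memory), condition a seq2seq generator (parametric
# memory) on each passage, and combine the candidate answers.
def rag_answer(question, retriever, generator, k=5):
    passages = retriever.top_k(question, k)          # (passage, score) pairs
    norm = sum(score for _, score in passages) or 1.0
    candidates = {}
    for passage, score in passages:
        answer = generator.generate(f"{passage}\n\nQ: {question}\nA:")
        candidates[answer] = candidates.get(answer, 0.0) + score / norm
    return max(candidates, key=candidates.get)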

Truthful AI: Developing and governing AI that does not lie

Differences between AI and humans present an opportunity to set more precise standards of truthfulness for AI and to have these standards rise over time; this could provide significant benefits to public epistemics and the economy, and mitigate risks of worst-case AI futures.

Boosting Search Engines with Interactive Agents

This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks, and develops a novel way of generating synthetic search sessions that leverages the power of transformer-based language models through (self-)supervised learning.

Retrieval Augmentation Reduces Hallucination in Conversation

The use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA - is explored for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.

Hurdles to Progress in Long-form Question Answering

The task formulation raises fundamental challenges regarding evaluation and dataset creation that currently preclude meaningful modeling progress, and a new system is designed that relies on sparse attention and contrastive retriever learning to achieve state-of-the-art performance on the ELI5 LFQA dataset.