WebGPT: Browser-assisted question-answering with human feedback
@article{Nakano2021WebGPTBQ,
title={WebGPT: Browser-assisted question-answering with human feedback},
author={Reiichiro Nakano and Jacob Hilton and Suchir Balaji and Jeff Wu and Long Ouyang and Christina Kim and Christopher Hesse and Shantanu Jain and Vineet Kosaraju and William Saunders and Xu Jiang and Karl Cobbe and Tyna Eloundou and Gretchen Krueger and Kevin Button and Matthew Knight and Benjamin Chess and John Schulman},
journal={ArXiv},
year={2021},
volume={abs/2112.09332},
url={https://api.semanticscholar.org/CorpusID:245329531}
}
GPT-3 is fine-tuned to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. The best resulting model's answers are preferred by humans 56% of the time to those of the authors' human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
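The human-feedback pipeline behind this result trains a reward model on pairwise human comparisons of answers, then optimizes the answering policy against that reward model. As a minimal, hypothetical sketch (illustrative names, not the authors' code), the standard Bradley-Terry preference loss for such a reward model looks like this:

```python
# Hypothetical sketch of the pairwise preference loss commonly used to
# train reward models from human comparisons in WebGPT-style pipelines.
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: P(chosen > rejected) = sigmoid(r_c - r_r),
    so we minimize the negative log-likelihood of the human choice."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scalar rewards would come from a reward model scoring (question, answer).
r_chosen = torch.tensor([1.3, 0.2])
r_rejected = torch.tensor([0.4, -0.1])
loss = preference_loss(r_chosen, r_rejected)
```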
Topics
WebGPT, ELI5, Web-browsing Environment, Reward Model, Human Feedback, Human Preferences, Long-form Questions, ELI5 Dataset, Explain Like I'm Five, Long Form Question Answering
910 Citations
From natural language to simulations: applying AI to automate simulation modelling of logistics systems
- 2024
Computer Science
It is demonstrated that a framework constructed upon the refined GPT-3 Codex is capable of generating functionally valid simulations for queuing and inventory management systems when provided with a verbal explanation.
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
- 2023
Computer Science
This work shows that personalized alignment can be achieved by decomposing preferences into multiple dimensions based on personalizations declared desirable by the user; these dimensions can be trained independently and efficiently in a distributed manner, then combined effectively post hoc through parameter merging.
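Read this way, the merging step is a weighted combination of independently fine-tuned model weights. A minimal sketch of that operation follows; the convex-combination scheme is an assumption, and the paper's actual procedure may differ:

```python
# Minimal sketch of post-hoc parameter merging: combine models fine-tuned
# independently on different preference dimensions by averaging weights.
import torch

def merge_state_dicts(state_dicts: list, weights: list) -> dict:
    assert abs(sum(weights) - 1.0) < 1e-6, "weights must form a convex combination"
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# e.g. blend a "concise" expert with a "friendly" expert per user preference:
# merged = merge_state_dicts([sd_concise, sd_friendly], [0.7, 0.3])
```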
Tool Learning with Foundation Models
- 2023
Computer Science, Education
A systematic investigation of tool learning is presented, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models, and a general tool learning framework is formulated.
Augmented Language Models: a Survey
- 2023
Computer Science, Linguistics
The missing token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks.
Improving alignment of dialogue agents via targeted human judgements
- 2022
Computer Science
Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless than prompted language model baselines, is presented; it is also demonstrated that although the model learns to follow the authors' rules, it can still exhibit distributional biases.
A Survey on Retrieval-Augmented Text Generation for Large Language Models
- 2024
Computer Science
This study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs, as well as introducing evaluation methods for RAG.
Training Language Models to Generate Text with Citations via Fine-grained Rewards
- 2024
Computer Science
This work proposes an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses, and conducts a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating its advantage over conventional practices.
AI capabilities can be significantly improved without expensive retraining
- 2023
Computer Science
This work reviews recent post-training enhancements, categorizing them into five types (tool use, prompting methods, scaffolding, solution selection, and data generation), and translates improvements from different enhancements into a common currency, the compute-equivalent gain.
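A compute-equivalent gain is, roughly, the factor of extra training compute a baseline model would need to match the enhanced model's benchmark score. A hypothetical sketch of that conversion, assuming scores can be interpolated against log-compute (an assumption for illustration, not the paper's exact procedure):

```python
# Illustrative compute-equivalent-gain calculation: given scores measured
# at several training-compute budgets, find the compute multiplier that
# would match an enhancement's score. The interpolation scheme is assumed.
import numpy as np

def compute_equivalent_gain(compute, scores, base_compute, enhanced_score):
    log_c = np.log(np.asarray(compute, dtype=float))
    needed_log_c = np.interp(enhanced_score, scores, log_c)  # invert the curve
    return float(np.exp(needed_log_c) / base_compute)

# A model trained with 1e21 FLOP scores 60; a prompting trick lifts it to 68.
# Matching 68 by scale alone would need ~6.3e21 FLOP, so the gain is ~6.3x.
gain = compute_equivalent_gain(compute=[1e20, 1e21, 1e22],
                               scores=[50.0, 60.0, 70.0],
                               base_compute=1e21, enhanced_score=68.0)
```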
A Survey of Large Language Models Attribution
- 2023
Computer Science, Linguistics
The aim of this survey is to provide valuable insights for researchers, aiding in the refinement of attribution methodologies to enhance the reliability and veracity of responses generated by open-domain generative systems.
WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
- 2023
Computer Science
This work identifies and addresses limitations of WebGPT (OpenAI), giving WebGLM advantages in accuracy, efficiency, and cost-effectiveness, and proposes systematic criteria for evaluating web-enhanced QA systems.
44 References
TruthfulQA: Measuring How Models Mimic Human Falsehoods
- 2022
Computer Science, Linguistics
It is suggested that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.
REALM: Retrieval-Augmented Language Model Pre-Training
- 2020
Computer Science
The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.
Proximal Policy Optimization Algorithms
- 2017
Computer Science
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
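The "surrogate" objective referred to here is PPO's clipped objective, L^CLIP(θ) = E_t[min(r_t(θ)Â_t, clip(r_t(θ), 1−ε, 1+ε)Â_t)], where r_t(θ) is the probability ratio between the new and old policies. A minimal PyTorch rendering with illustrative names:

```python
# Sketch of PPO's clipped surrogate objective (illustrative, self-contained).
import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)          # r_t(theta)
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)  # clip(r_t, 1-eps, 1+eps)
    # Negated because optimizers minimize; the paper maximizes the objective.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```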
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
- 2017
Computer Science, Linguistics
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross sentence reasoning to find answers.
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets
- 2021
Computer Science
A detailed study of the test sets of three popular open-domain benchmark datasets finds that 30% of test-set questions have a near-duplicate paraphrase in their corresponding train sets, and that simple nearest-neighbor models outperform a BART closed-book QA model.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- 2020
Computer Science
A general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation, and finds that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
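The combination of parametric and non-parametric memory reduces, at inference time, to retrieve-then-generate. A bare-bones sketch with stand-in retriever and generator callables (not the paper's DPR + BART components):

```python
# Minimal retrieve-then-generate loop in the spirit of RAG.
from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 5) -> str:
    passages = retrieve(question, k)                  # non-parametric memory
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                           # parametric memory
```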
Truthful AI: Developing and governing AI that does not lie
- 2021
Philosophy, Computer Science
Differences between AI and humans present an opportunity to set more precise standards of truthfulness for AI and to have these standards rise over time; this could provide significant benefits to public epistemics and the economy, and mitigate risks of worst-case AI futures.
Boosting Search Engines with Interactive Agents
- 2022
Computer Science
This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks, and develops a novel way of generating synthetic search sessions that leverages the power of transformer-based language models through (self-)supervised learning.
Retrieval Augmentation Reduces Hallucination in Conversation
- 2021
Computer Science
The use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA - is explored for knowledge-grounded dialogue, a task that is arguably more challenging as it requires querying based on complex multi-turn dialogue context and generating conversationally coherent responses.
Hurdles to Progress in Long-form Question Answering
- 2021
Computer Science
The task formulation raises fundamental challenges regarding evaluation and dataset creation that currently preclude meaningful modeling progress, and a new system that relies on sparse attention and contrastive retriever learning to achieve state-of-the-art performance on the ELI5 LFQA dataset is designed.