Grounding ‘Grounding’ in NLP

  Khyathi Raghavi Chandu, Yonatan Bisk, Alan W. Black
The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to refer to any linking of text to data or to a non-textual modality. In contrast, Cognitive Science more formally defines “grounding” as the process of establishing what mutual information is required for successful communication between two interlocutors – a definition which might implicitly capture the NLP…
Multilingual Event Linking to Wikidata
A qualitative analysis is presented highlighting various aspects captured by the proposed dataset, including the need for temporal reasoning over context and for handling diverse event descriptions across languages.
Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference
LITE is presented, a new approach that formulates entity typing as a natural language inference (NLI) problem, making use of indirect supervision from NLI to infer type information, meaningfully represented as textual hypotheses, and to alleviate the data scarcity issue.
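The core recipe, verbalizing each candidate type as a hypothesis and ranking types by an NLI model's entailment score, can be sketched as follows. The hypothesis template and the toy word-overlap scorer are illustrative stand-ins; LITE uses a pretrained NLI model.

```python
import re

def make_hypothesis(mention: str, type_label: str) -> str:
    """Verbalize a candidate type as a textual hypothesis."""
    return f"{mention} is a {type_label}."

def rank_types(premise, mention, candidate_types, entail_score):
    """Score each (premise, hypothesis) pair with an entailment
    scorer and return types sorted by score, best first."""
    scored = [(t, entail_score(premise, make_hypothesis(mention, t)))
              for t in candidate_types]
    return sorted(scored, key=lambda x: x[1], reverse=True)

def toy_scorer(premise: str, hypothesis: str) -> float:
    """Word-overlap stand-in for a real NLI entailment score."""
    tok = lambda s: set(re.findall(r"\w+", s.lower()))
    p, h = tok(premise), tok(hypothesis)
    return len(p & h) / max(len(h), 1)

premise = "Messi is a world famous athlete who scored twice."
ranking = rank_types(premise, "Messi", ["athlete", "politician"], toy_scorer)
```

Because the type labels are free text, the same ranker covers ultra-fine type inventories without per-type training data.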
Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations
This work introduces RExC, a self-rationalizing framework that grounds its predictions and two complementary types of explanations (NLEs and extractive rationales) in background knowledge, improving over previous methods by reaching SOTA task performance while also providing explanations.
Summarizing a virtual robot's past actions in natural language
It is shown how a popular existing dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work.
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
A framework is explored in which a model performs a task by manipulating a GUI implemented as web pages over multiple steps, suggesting that BERTs can be fine-tuned for multi-step tasks through GUIs, though there is room for improvement in their generalizability.
Toward robots that learn to summarize their actions in natural language: a set of tasks
An initial framework for robot action summarization is presented as a set of tasks which can serve as a target for research and a measure of progress.
Dialogue Collection for Recording the Process of Building Common Ground in a Collaborative Task
To develop a dialogue system that can build common ground with users, the process of building common ground through dialogue needs to be clarified. However, studies on the process of building…
Conversational Grounding as Natural Language Supervision – the need for divergent agent data
It is argued that a key factor holding back research in this area is the lack of appropriate data on tasks with divergent agents who can resolve disagreements and errors, and requirements and methods are proposed for new data collections enabling such work.
Unaware of Reality: Inconsistent Grounding in Conversational AI (2022)
Two ways of interpreting the (in)consistency of conversational agents’ responses are discussed, which are called horizontal and vertical consistency.
Commonsense Reasoning for Natural Language Processing
This tutorial provides researchers with the critical foundations and recent advances in commonsense representation and reasoning, in the hope of casting a brighter light on this promising area of future research.
Unsupervised Commonsense Question Answering with Self-Talk
An unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks, inspired by inquiry-based discovery learning, which improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.
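The self-talk loop, prompting the model with templated clarification questions about a concept and appending its own answers as extra context before answering, can be sketched as below. The question prefixes and the dictionary-backed generator are illustrative stand-ins; the paper uses a pretrained language model for generation.

```python
# Template prefixes for clarification questions (illustrative):
QUESTION_PREFIXES = ["What is the definition of", "What is the purpose of"]

def self_talk(question: str, concept: str, generate):
    """Build clarification questions about the concept, collect
    generated answers, and return the original question enriched
    with that self-generated background knowledge."""
    background = []
    for prefix in QUESTION_PREFIXES:
        clarification_q = f"{prefix} {concept}?"
        background.append(generate(clarification_q))
    return question + " " + " ".join(background)

# Stub generator standing in for a pretrained LM:
facts = {
    "What is the definition of umbrella?":
        "An umbrella is a device that shields you from rain.",
    "What is the purpose of umbrella?":
        "The purpose of an umbrella is to keep you dry.",
}
enriched = self_talk("Why carry an umbrella?", "umbrella", facts.get)
```

A multiple-choice answer can then be scored against the enriched context instead of the bare question, with no external knowledge base involved.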
Grounded Textual Entailment
This paper argues for a visually-grounded version of the Textual Entailment task, and asks whether models can perform better if, in addition to P and H, there is also an image (corresponding to the relevant “world” or “situation”).
Words Aren’t Enough, Their Order Matters: On the Robustness of Grounding Visual Referring Expressions
This work critically examines RefCOCOg, a standard benchmark for this task, using a human study and shows that 83.7% of test instances do not require reasoning about linguistic structure, i.e., the words alone are enough to identify the target object and word order does not matter.
Concept Grounding to Multiple Knowledge Bases via Indirect Supervision
This work develops an algorithmic approach that generates an indirect supervision signal it uses to train a ranking model that accurately chooses knowledge base entries for a given mention, and shows that considering multiple knowledge bases together has an advantage over grounding concepts to each knowledge base individually.
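The joint-grounding idea, one shared ranker scoring a mention against entries pooled from several knowledge bases rather than grounding to each KB in isolation, can be sketched as follows. The KB names, entries, and overlap scorer are illustrative stand-ins for the trained ranking model.

```python
def pool_entries(knowledge_bases):
    """Flatten {kb_name: [entry, ...]} into (kb_name, entry) pairs."""
    return [(kb, e) for kb, entries in knowledge_bases.items()
            for e in entries]

def ground(mention, knowledge_bases, score):
    """Return the best (kb_name, entry) for a mention under one
    shared ranker applied across all knowledge bases at once."""
    return max(pool_entries(knowledge_bases),
               key=lambda pair: score(mention, pair[1]))

def overlap_score(mention: str, entry: str) -> float:
    """Jaccard word overlap, standing in for a learned scorer."""
    m, e = set(mention.lower().split()), set(entry.lower().split())
    return len(m & e) / max(len(m | e), 1)

# Toy knowledge bases (contents illustrative):
kbs = {"umls": ["myocardial infarction", "migraine"],
       "mesh": ["heart attack", "headache"]}

best = ground("heart attack", kbs, overlap_score)
```

Pooling the candidates lets the ranker pick whichever KB phrases the concept closest to the mention, which is the advantage the paper reports over per-KB grounding.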
Grounded Compositional Outputs for Adaptive Language Modeling
This work proposes a fully compositional output embedding layer for language models, which is further grounded in information from a structured lexicon (WordNet), namely semantically related words and free-text definitions, and is the first word-level language model with a size that does not depend on the training vocabulary.
ClarQ: A large-scale and diverse dataset for Clarification Question Generation
A novel bootstrapping framework (based on self-supervision) is presented that assists in the creation of a diverse, large-scale dataset of clarification questions from post-comment tuples extracted from StackExchange, using a neural-network-based architecture for classifying clarification questions.
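One bootstrapping pass of this kind, extracting question-like comments from post-comment pairs and keeping only those a classifier confidently labels as clarification questions, can be sketched as below. The heuristic filter, threshold, and keyword-based stub classifier are illustrative assumptions; the paper trains a neural classifier.

```python
def extract_candidates(post_comments):
    """Keep (post, comment) tuples whose comment looks like a question."""
    return [(p, c) for p, c in post_comments if c.strip().endswith("?")]

def bootstrap(post_comments, classify, threshold=0.5):
    """One self-supervised bootstrapping pass: filter question-like
    comments with a clarification-question classifier and keep only
    the confident candidates for the growing dataset."""
    return [(p, c) for p, c in extract_candidates(post_comments)
            if classify(p, c) >= threshold]

pairs = [
    ("My script crashes on start.", "Which Python version are you using?"),
    ("My script crashes on start.", "Thanks, that fixed it."),
    ("Build fails on Windows.", "Great question!"),
]

# Stub classifier standing in for the neural model:
kept = bootstrap(pairs, lambda post, c: 0.9 if "version" in c else 0.1)
```

Repeating this pass with a classifier retrained on the kept pairs is what lets the dataset grow while staying diverse.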
Grounded Semantic Role Labeling
This paper extends traditional SRL to grounded SRL where arguments of verbs are grounded to participants of actions in the physical world and grounds implicit roles that are not explicitly mentioned in language descriptions.
Visually Grounded Compound PCFGs
This work studies visually grounded grammar induction, learning a constituency parser from both unlabeled text and its visual groundings, and shows that, with an extension of a probabilistic context-free grammar model, it can perform fully-differentiable end-to-end visually grounded learning.
Multi-Resolution Language Grounding with Weak Supervision
An approach to multi-resolution language grounding in the extremely challenging domain of professional soccer commentaries is introduced and a factored objective function is defined that allows us to leverage discourse structure and the compositional nature of both language and game events.