• Corpus ID: 211010521

Citation Text Generation

Kelvin Luu, Rik Koncel-Kedziorski, Kyle Lo, Isabel Cachola, Noah A. Smith
We introduce the task of citation text generation: given a pair of scientific documents, explain their relationship in natural language text in the manner of a citation from one text to the other. This task encourages systems to learn rich relationships between scientific texts and to express them concretely in natural language. Models for citation text generation will require robust document understanding including the capacity to quickly adapt to new vocabulary and to reason about document… 

Towards Generating Citation Sentences for Multiple References with Intent Control

This work builds a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs, incorporates the predicted citation intents into training for intent control, and releases a newly collected dataset named CiteMI to drive future research.

Evaluation of Text Generation: A Survey

This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.

Task Definition and Integration For Scientific-Document Writing Support

This paper defines a series of tasks related to scientific-document writing that can be pipelined and evaluates the tasks of citation worthiness and citation recommendation as well as both of these tasks integrated, showing that the proposed approach is promising.

Can We Automate Scientific Reviewing?

The conclusion is that the technology is not yet ready for use in high-stakes review settings, and the generated texts are less constructive and less factual than human-written reviews for all aspects except the explanation of the core ideas of the papers, which are largely factually correct.

TLDR: Extreme Summarization of Scientific Documents

This work introduces SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers, and proposes CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal.

Extracting Summary Knowledge Graphs from Long Documents

A new text-to-graph task of predicting summarized knowledge graphs from long documents is introduced, using a dataset of 200k document/graph pairs built with automatic and human annotations, and strong baselines are developed.

KID-Review: Knowledge-Guided Scientific Review Generation with Oracle Pre-training

KID-Review is an end-to-end knowledge-guided review generation framework for scientific papers, grounded in cognitive psychology research indicating that a better understanding of text requires different types of knowledge; an oracle pre-training strategy makes KID-Review better educated and lets the generated reviews cover more aspects.

ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis

A novel ReviewRobot is built to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison, and can serve as an assistant for paper reviewers, program chairs and authors.

Automated Lay Language Summarization of Biomedical Scientific Reviews

Analysis of the various challenges in performing the automated generation of lay language summaries of biomedical scientific reviews indicates that automatically generated summaries produced using contemporary neural architectures can achieve promising quality and readability as compared with references developed for the lay public by experts.

Building Dataset Textomics Dataset Tasks and Applications Vec 2 Text Text 2

  • Computer Science
  • 2021
Inspired by the successful applications of k nearest neighbors in modeling genomics data, a kNN-Vec2Text model is proposed to address two novel tasks: generating a textual summary from a genomics data matrix and vice versa. Substantial improvement on this dataset is observed.



Citances: Citation Sentences for Semantic Analysis of Bioscience Text

This work hypothesizes several different uses of citation sentences, including the creation of training and testing data for semantic analysis, synonym set creation, database curation, document summarization, and information retrieval generally.

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

The first large-scale manually-annotated corpus for scientific papers is developed and released by enabling faster annotation, and summarization methods that integrate the authors’ original highlights and the article’s actual impacts on the community are proposed to create comprehensive, hybrid summaries.

GORC: A large contextual citation graph of academic papers

We introduce the Semantic Scholar Graph of References in Context (GORC), a large contextual citation graph of 81.1M academic publications, including parsed full text for 8.1M open access papers,

Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

An unsupervised model that uses distributed representations of words as well as domain knowledge to extract the appropriate context from the reference paper is proposed, and it is demonstrated that an effective contextualization method improves citation-based summarization of scientific articles.

Hierarchical Neural Story Generation

This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum that enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text.

Identifying Meaningful Citations

This work introduces the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort, and proposes a supervised classification approach that addresses this task with a battery of features.

Structural Scaffolds for Citation Intent Classification in Scientific Publications

This work proposes structural scaffolds, a multitask model to incorporate structural information of scientific papers into citations for effective classification of citation intents, which achieves a new state-of-the-art on an existing ACL anthology dataset with a 13.3% absolute increase in F1 score.

Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure

It is shown that the proposed summarization approach for scientific articles which takes advantage of citation-context and the document discourse model effectively improves over existing summarization approaches (greater than 30% improvement over the best performing baseline) in terms of ROUGE scores on TAC2014 scientific summarization dataset.

Citation Classification for Behavioral Analysis of a Scientific Field

It is demonstrated that authors are sensitive to discourse structure and publication venue when citing, that online readers follow temporal links to previous and future work rather than methodological links, and that how a paper cites related work is predictive of its citation count.

Content-Based Citation Recommendation

It is shown empirically that, although adding metadata improves the performance on standard metrics, it favors self-citations, which are less useful in a citation recommendation setup; an online portal for citation recommendation based on this method is also released.