• Corpus ID: 239009533

DirectQuote: A Dataset for Direct Quotation Extraction and Attribution in News Articles

  • Yuanchi Zhang, Yang Liu
  • Published 15 October 2021
  • Computer Science
  • ArXiv
Quotation extraction and attribution are challenging tasks that aim to identify the spans containing quotations and to attribute each quotation to its original speaker. Applying these tasks to news data is closely related to fact-checking, media monitoring and news tracking. Direct quotations are more traceable and informative than other types of quotations, and are therefore of particular significance. This paper introduces DirectQuote, a corpus containing 19,760 paragraphs and 10,279… 

PARC 3.0: A Corpus of Attribution Relations
The annotation scheme was tested with an inter-annotator agreement study showing satisfactory results for the identification of ARs and high agreement on the selection of the text spans corresponding to its constitutive elements: source, cue and content.
An Attribution Relations Corpus for Political News
The Political News Attribution Relations Corpus 2016 (PolNeAR) is introduced as the largest, most complete attribution relations corpus to date. The work also contributes revised guidelines aimed at improving clarity and consistency in the annotation task, and an annotation interface specially adapted to the task.
Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia
Purpose: Extracting information from unstructured data is a challenging task for computational linguistics. Public figure’s statement attributed by journalists in a story is one type of
Quotation Extraction for Portuguese
This work presents a Quotation Extraction system for Portuguese that is based on Entropy Guided Transformation Learning, a Machine Learning approach, and is the first system that uses a Machine Learning approach for Portuguese.
RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text
RiQuA (RIch QUotation Annotations), a corpus that provides quotations, including their interpersonal structure (speakers and addressees) for English literary text, is introduced and publicly available for use, modification, and experimentation.
Opinion Mining on Newspaper Quotations
A comparative study on the methods and resources that can be employed for mining opinions from quotations in newspaper articles concludes that a generic opinion mining system requires both the use of large lexicons, as well as specialised training and testing data.
Whose story is it anyway? Automatic extraction of accounts from news articles
It is argued that a narrative may contain multiple accounts given by different actors, and a pipeline for automatically extracting accounts is presented, consisting of NLP methods for named entity recognition, event extraction, and attribution extraction.
Automatic Attribution of Quoted Speech in Literary Narrative
A method for identifying the speakers of quoted speech in natural-language textual stories by dividing the quotes into syntactic classes in order to leverage common discourse patterns, which enable rapid attribution for many quotes.
Sentiment Analysis in the News
This work distinguishes three different possible views on newspaper articles (author, reader and text), which have to be addressed differently when analysing sentiment, and presents work on mining opinions about entities in English language news.
MPQA 3.0: An Entity/Event-Level Sentiment Corpus
This paper presents and describes an annotation scheme for adding entity and event target annotations to the MPQA corpus, a rich span-annotated opinion corpus, and reports the results of an agreement study.