Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling
- Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
- Computer Science, ArXiv
- 23 September 2020
This work proposes a pragmatic evaluation methodology that assumes access to only raw text (rather than gold mentions), disregards singleton prediction, and addresses typical targeted settings in CD coreference resolution.
WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia
- Alon Eirew, Arie Cattan, Ido Dagan
- Computer Science, North American Chapter of the Association for…
- 11 April 2021
This work presents Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia, where coreference links are not restricted within predefined topics.
Cross-Document Language Modeling
- Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
- Computer Science, ArXiv
- 2021
The cross-document language model (CD-LM) improves masked language modeling for multi-document NLP tasks with two key ideas, including pretraining with multiple related documents in a single input, via cross-document masking, which encourages the model to learn cross-document and long-range relationships.
CDLM: Cross-Document Language Modeling
- Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
- Computer Science, Conference on Empirical Methods in Natural…
- 2 January 2021
This work introduces a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective: pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships.
Cross-document Coreference Resolution over Predicted Mentions
- Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
- Computer Science, Findings
- 2 June 2021
This work introduces the first end-to-end model for CD coreference resolution from raw text, which extends the prominent model for within-document coreference to the CD setting and achieves competitive results for event and entity coreference resolution on gold mentions.
Realistic Evaluation Principles for Cross-document Coreference Resolution
- Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, Ido Dagan
- Computer Science, STARSEM
- 8 June 2021
It is argued that models should not exploit the synthetic topic structure of the standard ECB+ dataset, forcing models to confront the lexical ambiguity challenge, as intended by the dataset creators.
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
- Arie Cattan, Sophie Johnson, Tom Hope
- Computer Science, Conference on Automated Knowledge Base…
- 18 April 2021
This work presents a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and the hierarchy between them, and creates SCICO, an expert-annotated dataset for this task.
CoRefi: A Crowd Sourcing Suite for Coreference Annotation
- A. Bornstein, Arie Cattan, Ido Dagan
- Computer Science, Conference on Empirical Methods in Natural…
- 1 October 2020
CoRefi is a web-based coreference annotation suite, oriented for crowdsourcing, that provides guided onboarding for the task as well as a novel algorithm for a reviewing phase.
CD^2CR: Co-reference resolution across documents and domains
- James Ravenscroft, Arie Cattan, A. Clare, Ido Dagan, Maria Liakata
- Computer Science, Conference of the European Chapter of the…
- 29 January 2021
It is shown that in this cross-domain, cross-document setting, existing CDCR models do not perform well, and a baseline model is provided that outperforms current state-of-the-art CDCR models on CD^2CR.
How “Multi” is Multi-Document Summarization?
- Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan
- Computer Science, Conference on Empirical Methods in Natural…
- 23 October 2022
This paper proposes an automated measure for evaluating the degree to which a summary is “disperse”, in the sense of the number of source documents needed to cover its content, and applies this measure to empirically analyze several popular MDS datasets.