SciREX: A Challenge Dataset for Document-Level Information Extraction

@inproceedings{Jain2020SciREXAC,
  title={SciREX: A Challenge Dataset for Document-Level Information Extraction},
  author={Sarthak Jain and Madeleine van Zuylen and Hannaneh Hajishirzi and Iz Beltagy},
  booktitle={ACL},
  year={2020}
}
Extracting information from full documents is an important problem in many domains, but most previous work focuses on identifying relationships within a sentence or a paragraph. It is challenging to create a large-scale information extraction (IE) dataset at the document level, since it requires an understanding of the whole document to annotate entities and their document-level relationships, which usually span beyond sentences or even sections. In this paper, we introduce SciREX, a document-level…
Document-level Entity-based Extraction as Template Generation
A generative framework is proposed for two document-level EE tasks: role-filler entity extraction (REE) and relation extraction (RE), allowing models to efficiently capture cross-entity dependencies, exploit label semantics, and avoid the exponential computational complexity of identifying N-ary relations.
Joint Detection and Coreference Resolution of Entities and Events with Document-level Context Aggregation
This paper proposes a new jointly trained model that can be used for various information extraction tasks at the document level, evaluates the system on documents from the ACE05-E dataset, and finds significant improvement over the sentence-level state of the art on entity extraction and event detection.
ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument Aggregation
An extractive algorithm with multiple sieves that adopts active learning strategies to work efficiently in low-resource settings; this is the first work to establish baseline results for this task in English.
Efficient End-to-end Learning of Cross-event Dependencies for Document-level Event Extraction
This paper proposes an end-to-end model leveraging Deep Value Networks (DVN), a structured prediction algorithm, to efficiently capture cross-event dependencies for document-level event extraction, achieving comparable performance to a CRF-based model on ACE05 while enjoying significantly higher efficiency.
Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering
This work extends the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage- and word-level graded relevance annotations for all relevant documents, and presents FiRA: a novel dataset of Fine-Grained Relevance Annotations.
Document-level Event Extraction with Efficient End-to-end Learning of Cross-event Dependencies
This paper proposes an end-to-end model leveraging Deep Value Networks (DVN), a structured prediction algorithm, to efficiently capture cross-event dependencies for document-level event extraction that achieves comparable performance to CRF-based models on ACE05 while enjoying significantly higher computational efficiency.
TDMSci: A Specialized Corpus for Scientific Literature Entity Tagging of Tasks Datasets and Metrics
A new corpus containing domain-expert annotations for Task (T), Dataset (D), and Metric (M) entities in 2,000 sentences extracted from NLP papers is presented and made publicly available to the community.
A Feature Combination-Based Graph Convolutional Neural Network Model for Relation Extraction
A feature combination-based graph convolutional neural network model (FC-GCN) is proposed that has the advantages of encoding the structural information of a sentence, considering prior knowledge, and avoiding errors caused by parsing.
Deep Neural Approaches to Relation Triplets Extraction: A Comprehensive Survey
This survey focuses on relation extraction using deep neural networks, which have achieved state-of-the-art performance on publicly available datasets, and covers sentence-level to document-level relation extraction, pipeline-based to joint extraction approaches, and annotated to distantly supervised datasets.
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
This work presents a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and the hierarchy between them, and creates SciCo, an expert-annotated dataset for this task.

References

Showing 1–10 of 26 references
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
Empirical results show that DocRED is challenging for existing RE methods, which indicates that document-level RE remains an open problem and requires further efforts.
Document-Level N-ary Relation Extraction with Multiscale Representation Learning
This paper proposes a novel multiscale neural architecture for document-level n-ary relation extraction that combines representations learned over various text spans throughout the document and across the subrelation hierarchy.
Entity, Relation, and Event Extraction with Contextualized Span Representations
This work examines the capabilities of a unified, multi-task framework (DyGIE++) for three information extraction tasks: named entity recognition, relation extraction, and event extraction, and achieves state-of-the-art results across all tasks.
Modeling Relations and Their Mentions without Labeled Text
A novel approach to distant supervision that can alleviate the problem of noisy patterns that hurt precision, by using a factor graph and applying constraint-driven semi-supervision to train the model without any knowledge about which sentences express the relations in the authors' training KB.
Supervised Open Information Extraction
A novel formulation of Open IE as a sequence tagging problem, addressing challenges such as encoding multiple extractions for a predicate, and a supervised model that outperforms the existing state-of-the-art Open IE systems on benchmark datasets.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks, and demonstrates statistically significant improvements over BERT.
Dependency-Guided LSTM-CRF for Named Entity Recognition
This work proposes a simple yet effective dependency-guided LSTM-CRF model to encode complete dependency trees and capture their properties for the task of named entity recognition (NER).
Position-aware Attention and Supervised Data Improve Slot Filling
An effective new model is proposed that combines an LSTM sequence model with a form of entity position-aware attention better suited to relation extraction; the work also builds TACRED, a large supervised relation extraction dataset obtained via crowdsourcing and targeted towards TAC KBP relations.
Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction
The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links, and supports construction of a scientific knowledge graph, which is used to analyze information in scientific literature.
End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures
A novel end-to-end neural model to extract entities and the relations between them, which compares favorably to the state-of-the-art CNN-based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8).