Corpus ID: 230437704

Cross-Document Language Modeling

Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan
We introduce a new pretraining approach for language models that are geared to support multi-document NLP tasks. Our cross-document language model (CD-LM) improves masked language modeling for these tasks with two key ideas. First, we pretrain with multiple related documents in a single input, via cross-document masking, which encourages the model to learn cross-document and long-range relationships. Second, extending the recent Longformer model, we pretrain with long contexts of several… 
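The cross-document masking idea described above can be sketched as follows. This is an illustrative toy, not the authors' code: the separator tokens `DOC_SEP`/`DOC_END`, the whitespace tokenizer, and the function name are all assumptions made for the example; the real model operates on subword IDs with a Longformer-style encoder.

```python
import random

# Hypothetical special tokens marking document boundaries in the packed input.
DOC_SEP = "<doc-s>"
DOC_END = "</doc-s>"
MASK = "[MASK]"

def build_cross_document_input(documents, mask_prob=0.15, seed=0):
    """Concatenate related documents into one long sequence and mask tokens
    across all of them, so that predicting a masked token in one document
    can draw on context from the other documents in the same input."""
    rng = random.Random(seed)
    tokens = []
    for doc in documents:
        tokens.append(DOC_SEP)
        tokens.extend(doc.split())   # toy whitespace tokenization
        tokens.append(DOC_END)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        # Never mask the boundary tokens; mask content tokens at random.
        if tok not in (DOC_SEP, DOC_END) and rng.random() < mask_prob:
            targets[i] = tok         # ground-truth token the model must predict
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked, targets
```

Because masked positions in one document can only be recovered by attending across the whole packed sequence, the training signal rewards cross-document and long-range attention patterns.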


Sequential Cross-Document Coreference Resolution
A new model is proposed that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings and achieves competitive results for both entity and event coreference while providing strong evidence of the efficacy of both sequential models and higher-order inference in cross-document settings.
PRIMER: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
A pre-trained model for multi-document representation with focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data and outperforms current state-of-the-art models on most of these settings with large margins.
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
This work presents a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and hierarchy between them and creates SCICO, an expert-annotated dataset for this task.
Cross-document Coreference Resolution over Predicted Mentions
This work introduces the first end-to-end model for CD coreference resolution from raw text, which extends a prominent within-document coreference model to the CD setting and achieves competitive results for event and entity coreference resolution on gold mentions.
Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference
This work models the entities/events in a reader’s focus as a neighborhood within a learned latent embedding space that minimizes the distance between mentions and the centroids of their gold coreference clusters, yielding a robust coreference resolution model that is feasible to apply to downstream tasks.
XCoref: Cross-document Coreference Resolution in the Wild
Outperforming an established CDCR model shows that new CDCR models need to be evaluated on semantically complex mentions with looser coreference relations to demonstrate their applicability to resolving mentions in the “wild” of political news articles.
Representation Learning via Variational Bayesian Networks
We present Variational Bayesian Network (VBN) - a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for
Citation Recommendation for Research Papers via Knowledge Graphs
The experimental results demonstrate that the combination of information from research KGs with existing state-of-the-art approaches is beneficial and outperforms the state of the art with a mean average precision of 20.6% (+0.8) for the top-50 retrieved results.
Rethinking Search: Making Experts out of Dilettantes
This paper examines how ideas from classical information retrieval and large pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of expert advice.


Multilevel Text Alignment with Cross-Document Attention
This work proposes a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component, enabling structural comparisons across different levels (document-to-document and sentence-to-document).
Pre-training via Paraphrasing
It is shown that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.
Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution
This work jointly models entity and event coreference and proposes a neural architecture for cross-document coreference resolution that represents each mention using its lexical span, surrounding context, and relation to entity (event) mentions via predicate-argument structures.
Semantic Text Matching for Long-Form Documents
This paper proposes a novel Siamese multi-depth attention-based hierarchical recurrent neural network (SMASH RNN) that learns the long-form semantics, and enables long-form document based semantic text matching.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Hierarchical Document Encoder for Parallel Corpus Mining
The results show document embeddings derived from sentence-level averaging are surprisingly effective for clean datasets, but suggest models trained hierarchically at the document-level are more effective on noisy data.
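The sentence-level averaging baseline mentioned above is simple enough to sketch directly. This is an illustrative reimplementation of the idea, not the paper's code; the function names and the use of cosine similarity for scoring candidate document pairs are assumptions made for the example.

```python
import math

def doc_embedding_by_averaging(sentence_embeddings):
    """Document embedding as the component-wise mean of its sentence
    embeddings: the simple baseline found surprisingly effective on clean data."""
    n, dim = len(sentence_embeddings), len(sentence_embeddings[0])
    return [sum(emb[j] for emb in sentence_embeddings) / n for j in range(dim)]

def cosine(u, v):
    """Cosine similarity, as might be used to rank candidate parallel documents."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

On noisy corpora, averaging dilutes the signal from the few aligned sentences, which is why the hierarchically trained document-level models fare better there.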
Hierarchical Transformers for Multi-Document Summarization
A neural summarization model is developed that can effectively process multiple input documents, extending the Transformer architecture with the ability to encode documents in a hierarchical manner.
"Bag of Events" Approach to Event Coreference Resolution. Supervised Classification of Event Templates
We propose a new robust two-step approach to cross-textual event coreference resolution on news articles. The approach makes explicit use of event and discourse structure thereby compensating for
Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
This work introduces Multi-News, the first large-scale MDS news dataset, and proposes an end-to-end model that combines a traditional extractive summarization model with a standard single-document summarization model, achieving competitive results on MDS datasets.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.