
Automatic Domain Adaptation for Parsing

David McClosky, Eugene Charniak, Mark Johnson
Current statistical parsers tend to perform well only on their training domain and nearby genres. While strong performance on a few related domains is sufficient for many situations, it is advantageous for parsers to be able to generalize to a wide variety of domains. When parsing document collections involving heterogeneous domains (e.g. the web), the optimal parsing model for each document is typically not obvious. We study this problem as a new task --- multiple source parser adaptation. Our… 


Any domain parsing: automatic domain adaptation for natural language parsing
A technique is presented, Any Domain Parsing, which automatically detects useful source domains and mixes them together to produce a customized parsing model which performs almost as well as the best seen parsing models (oracle) for each target domain.
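The model-mixing idea described above can be sketched in miniature. This is a toy illustration only, not the paper's actual models or weighting scheme: each source-domain model assigns a probability to a candidate parse, and the scores are combined with mixture weights that favor sources similar to the target domain. All names and numbers are made up.

```python
# Hypothetical sketch of mixing several source-domain parsing models via a
# weighted linear combination of their probabilities. The model names and
# the weight values are illustrative, not taken from the paper.

def mix_model_scores(scores_per_model, weights):
    """Combine per-model probabilities with mixture weights.

    scores_per_model: dict model_name -> probability assigned to a parse
    weights: dict model_name -> non-negative mixture weight (sums to 1)
    """
    return sum(weights[m] * p for m, p in scores_per_model.items())

# Suppose two source models score a candidate parse of a target sentence:
scores = {"wsj": 0.02, "brown": 0.08}
# and the target domain looks more Brown-like, so Brown gets more weight:
weights = {"wsj": 0.25, "brown": 0.75}

mixed = mix_model_scores(scores, weights)
print(round(mixed, 4))  # 0.25*0.02 + 0.75*0.08 = 0.065
```

The interesting part of such an approach is choosing the weights automatically, so that the mixture approaches the best single source model (the oracle) for each target domain.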
Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples
It is shown that recent advances in word representations greatly diminish the need for domain adaptation when the target domain is syntactically similar to the source domain, and a simple way to adapt a parser using only dozens of partial annotations is provided.
A word clustering approach to domain adaptation: Robust parsing of source and target domains
A technique to improve out-of-domain statistical parsing by reducing lexical data sparseness in a PCFG-LA architecture is presented, and an interesting result is that the proposed techniques also improve parsing performance on the source domain, contrary to techniques such as self-training, thus leading to a more robust parser overall.
Effective Measures of Domain Similarity for Parsing
An unsupervised technique based on topic models is effective: it outperforms random data selection on both languages examined, English and Dutch, and works better than manually assigned labels gathered from meta-data that is available for English.
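To make the data-selection idea concrete, here is a crude stand-in (not the cited paper's method): rank candidate source corpora by Jensen-Shannon divergence between their word distributions and the target's, using raw word frequencies in place of inferred topic distributions. The corpora here are toy placeholders.

```python
# Toy sketch of similarity-based source selection. A real system would
# compare topic distributions from a topic model; here we compare plain
# word distributions as a simplified proxy.
import math
from collections import Counter

def word_dist(text):
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in vocab}
    def kl(a):
        return sum(a.get(w, 0) * math.log2(a.get(w, 0) / m[w])
                   for w in vocab if a.get(w, 0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Illustrative corpora (one snippet each, purely for demonstration):
sources = {
    "news": "the minister announced the budget policy today",
    "biomed": "the protein expression was measured in mutant cells",
}
target = "parliament will debate the budget proposal next week"

t = word_dist(target)
ranking = sorted(sources, key=lambda s: js_divergence(word_dist(sources[s]), t))
print(ranking)  # most similar source first
```

With these snippets the "news" source shares more vocabulary mass with the target, so it ranks first; training data would then be drawn preferentially from the closest sources.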
Challenges to Open-Domain Constituency Parsing
This work analyzes challenges to open-domain constituency parsing using a set of linguistic features on various strong constituency parsers and finds that BERT significantly increases parsers’ cross-domain performance by reducing their sensitivity to the domain-variant features.
Semi-supervised Domain Adaptation for Dependency Parsing
A simple domain embedding approach is proposed to merge the source- and target-domain training data, which is shown to be more effective than both direct corpus concatenation and multi-task learning, and a simple fine-tuning procedure is shown to further boost cross-domain parsing accuracy by a large margin.
Data point selection for genre-aware parsing
This paper investigates how differences between articles in a newspaper corpus relate to the concepts of genre and domain, and how they influence the parsing performance of a transition-based dependency parser, by applying various similarity measures for data point selection and testing their adequacy for creating genre-aware parsing models.
Cross-Domain Effects on Parse Selection for Precision Grammars
It is found that it is possible to considerably improve parse selection accuracy through the construction of even small-scale in-domain treebanks and the learning of parse selection models over in-domain and out-of-domain data; more sophisticated strategies for combining data from these sources to train models are also investigated.
Minimally Supervised Domain-Adaptive Parse Reranking for Relation Extraction
The paper demonstrates how the generic parser of a minimally supervised information extraction framework can be adapted to a given task and domain for relation extraction (RE); the acquired reranking model improves the performance of RE in both training and test phases with the new first parses.
Learning Domain Invariant Word Representations for Parsing Domain Adaptation
It is shown that strong domain adaptation results for dependency parsing can be achieved using a conceptually simple method that learns domain-invariant word representations by fine-tuning pretrained word representations adversarially.


The Domain Dependence of Parsing
A comparison of structure distributions across domains, examples of domain-specific structures, and a parsing experiment using domain-dependent grammars demonstrate the domain dependence and idiosyncrasy of syntactic structure.
Reranking and Self-Training for Parser Adaptation
The reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%, and use of the self-training techniques described in McClosky et al. (2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data.
Learning Reliability of Parses for Domain Adaptation of Dependency Parsing
The goal was to improve the performance of a state-of-the-art dependency parser on the data set of the domain adaptation track of the CoNLL 2007 shared task, a formidable challenge.
Parser Evaluation and the BNC: Evaluating 4 constituency parsers with 3 metrics
This work evaluates discriminative parse reranking and parser self-training on a new English test set using four versions of the Charniak parser and a variety of parser evaluation metrics and finds that reranking leads to a performance improvement on the new test set (albeit a modest one).
Subdomain Sensitive Statistical Parsing using Raw Corpora
This paper presents a method that exploits raw subdomain corpora gathered from the web to introduce subdomain sensitivity into a given parser, and employs statistical techniques for creating an ensemble of domain sensitive parsers, and explores methods for amalgamating their predictions.
Corpus Variation and Parser Performance
This work examines how other types of text might affect parser performance, and how portable parsing models are across corpora by comparing results for the Brown and WSJ corpora, and considers which parts of the parser's probability model are particularly tuned to the corpus on which it was trained.
Automatic Prediction of Parser Accuracy
This paper proposes a technique that automatically takes into account certain characteristics of the domains of interest and accurately predicts parser performance on data from these new domains, yielding a cheap and effective recipe for measuring the performance of a statistical parser on any given domain.
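The prediction idea above can be illustrated with a minimal sketch, assuming (hypothetically) that a cheap unsupervised domain statistic such as the out-of-vocabulary rate correlates with parser accuracy: fit a line to known (feature, F1) pairs, then extrapolate to a new domain. All numbers below are invented for illustration and are not from the paper.

```python
# Hypothetical sketch: predict parser F1 on a new domain from a cheap,
# unsupervised domain statistic (here, the out-of-vocabulary rate against
# the training lexicon) via ordinary least squares on known domains.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Known domains: OOV rate vs. measured parser F1 (illustrative values only).
oov_rates = [0.01, 0.05, 0.10, 0.20]
f1_scores = [0.91, 0.88, 0.84, 0.77]

a, b = fit_line(oov_rates, f1_scores)
predicted = a * 0.08 + b   # a new domain with an 8% OOV rate
print(round(predicted, 3))
```

The negative slope encodes the (assumed) pattern that more unseen vocabulary means lower accuracy; a real predictor would combine several such domain features rather than one.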
An Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing
The effectiveness of the proposed methods is demonstrated in dependency parsing experiments using two widely used test collections: the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking
This paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser that generates 50-best lists of substantially higher quality than previously obtainable.
Head-Driven Statistical Models for Natural Language Parsing
  • M. Collins, Computational Linguistics, 2003
Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.