Corpus ID: 14190520

Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

@inproceedings{Shindo2012BayesianST,
  title={Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing},
  author={Hiroyuki Shindo and Yusuke Miyao and Akinori Fujino and Masaaki Nagata},
  booktitle={ACL},
  year={2012}
}
We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing… 
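The backoff scheme described in the abstract can be illustrated with a minimal Chinese-restaurant-process view of a Pitman-Yor process in Python. This is a toy sketch, not the paper's implementation: the string encoding of rules, the strip_refinement helper, and the hyperparameter values are illustrative assumptions, and the bookkeeping between levels of the hierarchy is simplified.

import random
import re
from collections import defaultdict


def strip_refinement(rule):
    """Map a symbol-refined rule such as 'NP-1 -> DT-0 NN-2' to its
    unrefined CFG form 'NP -> DT NN' (hypothetical string encoding)."""
    return re.sub(r"-\d+", "", rule)


class PYPRestaurant:
    """Chinese-restaurant-process view of a Pitman-Yor process.

    `base` gives the backoff probability of a dish (grammar rule); chaining
    restaurants through `base` yields hierarchical backoff, e.g. a refined
    SR-TSG rule backing off to its plain CFG form.
    """

    def __init__(self, discount, strength, base):
        self.d = discount                 # discount parameter, 0 <= d < 1
        self.theta = strength             # strength (concentration) parameter
        self.base = base                  # backoff distribution P0
        self.tables = defaultdict(list)   # dish -> sizes of tables serving it
        self.n_customers = 0
        self.n_tables = 0

    def prob(self, dish):
        """Predictive probability of `dish` given the current seating."""
        sizes = self.tables.get(dish, [])
        c, t = sum(sizes), len(sizes)
        new = (self.theta + self.d * self.n_tables) * self.base(dish)
        return ((c - self.d * t) + new) / (self.n_customers + self.theta)

    def add_customer(self, dish):
        """Seat one observed rule occurrence.  (In the full hierarchical
        model, opening a new table would also send a customer to the
        backoff restaurant; that bookkeeping is omitted for brevity.)"""
        sizes = self.tables[dish]
        weights = [s - self.d for s in sizes]
        weights.append((self.theta + self.d * self.n_tables) * self.base(dish))
        r = random.uniform(0, sum(weights))
        for i, w in enumerate(weights):
            if r < w:
                break
            r -= w
        if i < len(sizes):
            sizes[i] += 1        # join an existing table
        else:
            sizes.append(1)      # open a new table
            self.n_tables += 1
        self.n_customers += 1


# Two-level backoff (toy): refined rules -> unrefined CFG rules -> uniform base.
cfg = PYPRestaurant(discount=0.5, strength=1.0, base=lambda r: 1e-3)
refined = PYPRestaurant(discount=0.8, strength=1.0,
                        base=lambda r: cfg.prob(strip_refinement(r)))

for r in ["NP-1 -> DT-0 NN-2", "NP-1 -> DT-0 NN-2", "NP-0 -> NN-1"]:
    refined.add_customer(r)
print(refined.prob("NP-1 -> DT-0 NN-2"))   # smoothed toward the CFG backoff

Seating customers (observed rule occurrences) interpolates between observed counts and the backoff distribution; the discount parameter controls how much probability mass is reserved for unseen refined rules.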

Citations

Statistical Parsing with Probabilistic Symbol-Refined Tree Substitution Grammars
TLDR
The probabilistic model is based on the hierarchical Pitman-Yor Process to encode backoff smoothing from fine-grained SR-TSG rules to simpler CFG rules, so that all grammar rules can be learned from training data in a fully automatic and consistent fashion.
Bayesian Tree Substitution Grammars as a Usage-based Approach
TLDR
This work describes a model-based approach that learns a TSG using Gibbs sampling with a non-parametric prior to control fragment size, yielding grammars that contain mostly small fragments but that include larger ones as the data permits.
Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
TLDR
This work presents a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing.
Discontinuous Parsing with an Efficient and Accurate DOP Model
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to…
Data-Oriented Parsing with Discontinuous Constituents and Function Tags
TLDR
The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis.
Smoothing for Bracketing Induction
TLDR
This paper proposes to define a non-parametric Bayesian prior distribution, namely the Pitman-Yor Process (PYP) prior, over constituents for constituent smoothing, and finds that two kinds of HSS are effective, attaining or significantly improving the state-of-the-art performance of bracketing induction evaluated on standard treebanks of various languages.
Bayesian Constituent Context Model for Grammar Induction
TLDR
Experiments show that both the proposed Bayesian smoothing method and the modified CCM are effective, and combining them attains or significantly improves the state-of-the-art performance of grammar induction evaluated on standard treebanks of various languages.
Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations
TLDR
It is shown that a Gibbs sampling technique is capable of parsing sentences in a wide variety of languages, producing results that are on par with or surpass previous approaches.
In-Order Transition-based Constituent Parsing
TLDR
A novel parsing system is proposed based on in-order traversal over syntactic trees, designing a set of transition actions to find a compromise between bottom-up constituent information and top-down lookahead information.

References

Showing 1-10 of 32 references
Simple, Accurate Parsing with an All-Fragments Grammar
TLDR
A simple but accurate parser is presented that exploits both large tree fragments and symbol refinement, achieving over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers.
Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
TLDR
This work formalizes nonparametric Bayesian STSG with epsilon alignment in full generality, and provides a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression.
Head-Driven Statistical Models for Natural Language Parsing
M. Collins, Computational Linguistics, 2003
TLDR
Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Parsimonious Data-Oriented Parsing
TLDR
A parsimonious approach to Data-Oriented Parsing is formulated as an enrichment of the treebank Probabilistic Context-Free Grammar (PCFG), which allows for much easier comparison with alternative approaches to statistical parsing.
Inducing Tree-Substitution Grammars
TLDR
This work proposes a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures, and demonstrates the model's efficacy on supervised phrase-structure parsing and unsupervised dependency grammar induction.
Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
TLDR
This paper defines a hierarchical non-parametric Pitman-Yor Process prior which biases towards a small grammar with simple productions, and significantly improves the state of the art as measured by head attachment accuracy.
Learning Accurate, Compact, and Interpretable Tree Annotation
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar…
Probabilistic CFG with Latent Annotations
This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables.
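To make the idea of latent annotations concrete, the following toy Python sketch enumerates the refined variants of a single CFG rule. The k=2 setting and the 'NP-0'/'NP-1' naming are illustrative assumptions, and the EM step that estimates the probabilities of the refined rules is omitted.

from itertools import product


def refine_rule(lhs, rhs, k=2):
    """Enumerate the latent-annotated versions of one CFG rule (toy sketch).

    With k latent subcategories per nonterminal, a binary rule A -> B C
    expands into k**3 refined rules A-i -> B-j C-l; their probabilities are
    what PCFG-LA estimates with EM (or split-merge in later work).
    """
    symbols = [lhs] + list(rhs)
    refined = []
    for annotation in product(range(k), repeat=len(symbols)):
        annotated = [f"{sym}-{a}" for sym, a in zip(symbols, annotation)]
        refined.append((annotated[0], tuple(annotated[1:])))
    return refined


for rule in refine_rule("NP", ["DT", "NN"], k=2):
    print(rule)
# ('NP-0', ('DT-0', 'NN-0')), ('NP-0', ('DT-0', 'NN-1')), ... 8 rules in total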
K-Best Combination of Syntactic Parsers
TLDR
A linear model-based general framework to combine k-best parse outputs from multiple parsers by integrating them into a linear model able to fully utilize both the logarithm of the probability of each k-best parse tree from each individual parser and any additional useful features.
Bayesian Learning of a Tree Substitution Grammar
TLDR
This paper learns a TSG using Gibbs sampling with a nonparametric prior to control subtree size, and the learned grammars perform significantly better than heuristically extracted ones in parsing accuracy.
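The object such a Gibbs sampler manipulates can be illustrated with a toy sketch: given binary "cut" variables on tree nodes, a parse tree decomposes into elementary trees (fragments). The Node class, the cut flags, and the tuple encoding below are illustrative assumptions; the sampling step that decides where to place the cuts is omitted.

class Node:
    """A parse-tree node; cut=True marks it as a substitution site."""
    def __init__(self, label, children=(), cut=False):
        self.label = label
        self.children = list(children)
        self.cut = cut


def fragments(root):
    """Collect the elementary trees induced by the current cut variables.

    The tree root always starts a fragment; every node with cut=True is a
    frontier nonterminal of its parent fragment and the root of a new one.
    """
    roots = [root]
    out = []

    def build(node):
        if not node.children:            # terminal leaf
            return (node.label,)
        kids = []
        for c in node.children:
            if c.cut:
                roots.append(c)          # c starts its own fragment
                kids.append((c.label,))  # frontier nonterminal here
            else:
                kids.append(build(c))
        return (node.label, *kids)

    i = 0
    while i < len(roots):
        out.append(build(roots[i]))
        i += 1
    return out


# Toy tree: (S (NP (DT the) (NN cat)) (VP (VBD slept))), with a cut at NP.
dt = Node("DT", [Node("the")]); nn = Node("NN", [Node("cat")])
np = Node("NP", [dt, nn], cut=True)
vp = Node("VP", [Node("VBD", [Node("slept")])])
s = Node("S", [np, vp])
for f in fragments(s):
    print(f)
# ('S', ('NP',), ('VP', ('VBD', ('slept',))))   fragment rooted at S
# ('NP', ('DT', ('the',)), ('NN', ('cat',)))    fragment rooted at NP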