• Corpus ID: 11991205

Verb SCF extraction for Spanish with dependency parsing

@article{Padr2013VerbSE,
  title={Verb SCF extraction for Spanish with dependency parsing},
  author={Muntsa Padr{\'o} and N{\'u}ria Bel and Aina Gar{\'i} Soler},
  journal={Proces. del Leng. Natural},
  year={2013},
  volume={51},
  pages={93--100}
}
In this paper we present the results of our experiments in the automatic production of verb subcategorization frame (SCF) lexica for Spanish. The work was carried out within a project aimed at the automatic acquisition of lexical information with as little human intervention as possible. In our experiments, a chain of different tools was used: domain-focused web crawling, automatic cleaning, segmentation and tokenization, PoS tagging, dependency parsing, and finally SCF extraction. The obtained… 
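
The last step of the chain described above, SCF extraction from dependency parses, can be illustrated with a minimal sketch. It is not the authors' implementation: the token layout (dicts with `id`, `lemma`, `pos`, `head`, `deprel`), the label set `ARGUMENT_LABELS`, and the frame name `INTRANS` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical set of dependency labels treated as verb complements.
ARGUMENT_LABELS = {"DOBJ", "IOBJ", "PP", "COMP"}


def extract_scfs(sentences):
    """Count candidate subcategorization frames per verb lemma.

    Each sentence is a list of token dicts with the keys
    id, lemma, pos, head, deprel (an assumed CoNLL-like layout).
    """
    counts = defaultdict(Counter)
    for sent in sentences:
        for tok in sent:
            if tok["pos"] != "VERB":
                continue
            # Collect the labels of this verb's direct dependents
            # that count as complements.
            labels = sorted(
                dep["deprel"]
                for dep in sent
                if dep["head"] == tok["id"] and dep["deprel"] in ARGUMENT_LABELS
            )
            # A verb with no complements gets the placeholder frame INTRANS.
            frame = "+".join(labels) or "INTRANS"
            counts[tok["lemma"]][frame] += 1
    return counts
```

Sorting the labels makes the frame independent of word order, which is one possible design choice; a real system would also need to filter parser errors and adjuncts before the counts are usable as a lexicon.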

References

Automatic extraction of subcategorization frames for French

The results show that, contra Korhonen et al. (2000), binomial hypothesis testing can be robust for determining subcategorization frames given corpus data. The authors conclude that using a language resource with a currently unevaluated (and potentially high) error rate can yield robust results in conjunction with probabilistic filtering of the resource output.
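
The binomial hypothesis test mentioned in this summary can be sketched as follows. This is a generic version of the filter, not the procedure of the cited paper: the function names and the default values for `p_error` and `alpha` are assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(m, n, p):
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def accept_frame(m, n, p_error=0.05, alpha=0.05):
    """Decide whether a verb really takes a frame.

    m: occurrences of the verb with the candidate frame
    n: total occurrences of the verb
    p_error: assumed probability that the frame is assigned by mistake
    alpha: significance threshold

    The frame is accepted when seeing m or more co-occurrences would be
    unlikely if they were all due to extraction errors.
    """
    return binomial_tail(m, n, p_error) < alpha
```

For example, `accept_frame(15, 100)` accepts the frame, while `accept_frame(5, 100)` rejects it, since five hits in a hundred is what an error rate of 0.05 would produce on its own.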

Finding Dependency Parsing Limits over a Large Spanish Corpus

This work studies the performance of different parsers over a large Spanish treebank to select the most appropriate parser for Subcategorization Frame acquisition, and focuses on two aspects: the accuracy drop when parsing out-of-domain data, and the performance over specific labels relevant to the authors' task.

A Subcategorization Acquisition System for French Verbs

A system capable of automatically acquiring subcategorization frames (SCFs) for French verbs from the analysis of large corpora is presented; the results are comparable with those reported in recent related work.

Towards a Semantic Classification of Spanish Verbs Based on Subcategorisation Information

Well-known techniques developed for English are applied to Spanish, showing that empirical methods can be reused across languages without substantial changes in methodology.

Automatic extraction of subcategorization frames for Italian

The aim of this paper is to investigate the relationships between subcategorization frame extraction and the nature of data from which the frames have to be extracted, e.g. how much the task can be influenced by the richness/poorness of the annotation.

The IULA Treebank

This paper describes how the DELPH-IN processing framework has been crucial to the design principles and the bootstrapping strategy followed, especially regarding the use of stochastic modules to reduce parsing overgeneration.

Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank

A methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank and a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource is presented.

Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information

Using probability distributions over verb subcategorisation frames, an intuitively plausible clustering of 57 verbs into 14 classes is obtained; the automatic clustering is evaluated against independently motivated, hand-constructed semantic verb classes.

The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level

The primary aim of the project SENSEM (Sentence Semantics, BFF2003-06456) is the construction of a Lexical Data Base illustrating the syntactic and semantic behavior of each of the senses of the 250

A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

This work presents a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees that shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system.