Learn More
This paper shows that training a lexicalized parser on a lemmatized morphologically-rich treebank such as the French Treebank slightly improves parsing results. We also show that lemmatizing a similar in size subset of the En-glish Penn Treebank has almost no effect on parsing performance with gold lemmas and leads to a small drop of performance when(More)
This paper describes the DCU-UVT team's participation in the Language Identification in Code-Switched Data shared task in the Workshop on Computational Approaches to Code Switching. Word-level classification experiments were carried out using a simple dictionary-based method, linear kernel support vector machines (SVMs) with and without con-textual clues,(More)
The distinction between raising and subject-control verbs, although crucial for the construction of semantics, is not easy to make given access to only the local syntactic configuration of the sentence. In most contexts raising verbs and control verbs display identical superficial syntactic structure. Linguists apply grammaticality tests to distinguish(More)
In Question Answering a major challenge is the fact that similar meaning is very often expressed with different surface realizations in questions and in sentences containing the answer. In this paper we propose an enriched syntax-based representation which helps deal with this widespread variability and provides a degree of generalization. We encode(More)
Data-driven grammatical function tag assignment has been studied for English using the Penn-II Treebank data. In this paper we address the question of whether such methods can be applied successfully to other languages and treebank resources. In addition to tag assignment accuracy and f-scores we also present results of a task-based evaluation. We use three(More)
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizating word forms in running text. We treat lemmatization as a classification task for Machine Learning, and automatically induce class(More)
  • 1