• Publications
Universal Dependencies 2.1
TLDR
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
HamleDT: To Parse or Not to Parse?
TLDR
The proposed HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks, transformed so that they all conform to the same annotation style.
Morphological Processing for English-Tamil Statistical Machine Translation
TLDR
This work implements suffix-separation rules for both languages of the English-Tamil pair and evaluates the impact of this preprocessing on the translation quality of the phrase-based and hierarchical models, using BLEU scores and a small manual evaluation.
Universal Dependencies 1.4
TLDR
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
HamleDT: Harmonized multi-language dependency treebank
TLDR
The authors claim that transformation procedures can be designed to automatically identify most such phenomena and convert them to a unified annotation style, which benefits both comparative corpus linguistics and machine learning of syntactic parsing.
Prague Dependency Style Treebank for Tamil
TLDR
This work describes the syntactic annotation of a small Tamil corpus (600 sentences). The annotation is similar to that of the Prague Dependency Treebank (PDT) and consists of two levels, or layers: the morphological layer (m-layer) and the analytical layer (a-layer).
Using an SVM Ensemble System for Improved Tamil Dependency Parsing
TLDR
A new approach to parsing morphologically rich languages with little training data: an SVM classifier is trained using only the model agreements as features, and its output is used to form an ensemble parse tree.
Multilingual Dependency Parsing: Using Machine Translated Texts instead of Parallel Corpora
TLDR
This paper revisits the projection-based approach to the dependency grammar induction task by obtaining the source side of the text from a machine translation (MT) system and then applying transfer approaches to induce parsers for the target languages.
EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)
TLDR
EnTam is a sentence-aligned English-Tamil bilingual corpus, collected from publicly available websites for NLP research involving Tamil and suitable for various NLP tasks.
Tamil Dependency Parsing: Results Using Rule Based and Corpus Based Approaches
TLDR
This paper designs an annotation scheme partially based on the Prague Dependency Treebank, manually annotates Tamil data with dependency relations, and uses two well-known parsers, MaltParser and MSTParser, to build dependency structures for Tamil sentences.