Yet Another Format of Universal Dependencies for Korean

Yige Chen, Eunkyul Leah Jo, Yundong Yao, Kyungtae Lim, Miikka Silfverberg, Francis M. Tyers, Jungyeul Park

In this study, we propose a morpheme-based scheme for Korean dependency parsing and apply the proposed scheme to Universal Dependencies. We present the linguistic rationale that motivates the morpheme-based format and shows why it is needed, and develop scripts that automatically convert between the original format used by Universal Dependencies and the proposed morpheme-based format. The effectiveness of the proposed format for Korean dependency parsing is then tested by…

Towards Fully Lexicalized Dependency Parsing for Korean

This paper proposes a Korean dependency parsing system that learns the relationships between Korean words from the Treebank corpus and a large raw corpus, and that outperforms both the baseline systems and a state-of-the-art supervised dependency parser.

Learning from a Neighbor: Adapting a Japanese Parser for Korean Through Feature Transfer Learning

This paper presents a new dependency parsing method for Korean that applies cross-lingual transfer learning and domain adaptation techniques, using the Triplet/Quadruplet Model, a hybrid parsing algorithm for Japanese, together with delexicalized feature transfer for Korean.

Building Universal Dependency Treebanks in Korean

This paper presents three treebanks in Korean that consist of dependency trees derived from existing treebanks, the Google UD Treebank, the Penn Korean Treebank, and the KAIST Treebank, and

Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing

This paper builds a Korean dependency Treebank from an existing constituent Treebank, and shows how to extract useful features for dependency parsing from rich morphology in Korean, using both gold-standard and automatic morphological analysis.

KAIST Tree Bank Project for Korean: Present and Future Development

This paper introduces the ongoing project, undertaken by KAIST since 1992, for building a large annotated corpus of Korean written texts, which consists of over 5 million word units covering 13 subject fields.

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection

Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation

Universal Dependency Annotation for Multilingual Parsing

A new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean is presented, made freely available in order to facilitate research on multilingual dependency parsing.

Universal Dependencies v1: A Multilingual Treebank Collection

This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, as well as highlighting the needs for sound comparative evaluation and cross-lingual learning experiments.

A New Annotation Scheme for the Sejong Part-of-speech Tagged Corpus

Using a new annotation scheme, this paper produces Sejong-style morphological analysis and part-of-speech tagging results, a style that has been the de facto standard for Korean language processing.

Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition

We present two recently released open-source taggers: NameTag is free software for named entity recognition (NER) which achieves state-of-the-art performance on Czech; MorphoDiTa (Morphological