• Corpus ID: 14033892

Building a Japanese parsed corpus while improving the parsing system

@inproceedings{Kuroashi1998BuildingAJ,
  title={Building a Japanese parsed corpus while improving the parsing system},
  author={Sadao Kurohashi and Makoto Nagao},
  booktitle={LREC},
  year={1998}
}
In January 1996, we started a project to construct a Japanese parsed corpus and to simultaneously improve a morphological analyzer and a parser. In this project, human annotators not only correct the erroneous analyses produced by the parsing system but also improve the parsing system and its grammar: finding problematic fixed expressions, picking up phrases that have exceptional functions, classifying unseen types of clauses, and so on.

A Unified Single Scan Algorithm for Japanese Base Phrase Chunking and Dependency Parsing

TLDR
An algorithm for Japanese analysis is described that performs base phrase chunking and dependency parsing simultaneously, in linear time and with a single scan of the sentence, at reasonably good accuracy.

Using a Partially Annotated Corpus to Build a Dependency Parser for Japanese

TLDR
It is found that a parser trained with a corpus that does not have any grammatical tags for words can demonstrate an accuracy that is comparable to the current state-of-the-art accuracy on the Kyoto University Corpus.

Constructing a Practical Constituent Parser from a Japanese Treebank with Function Labels

TLDR
The evaluations show that the parser trained on the treebank has bracketing accuracy comparable to that of conventional bunsetsu-based parsers, and can output function labels such as the grammatical role of an argument and the type of an adnominal phrase.

Integration of a Lexical Type Database with a Linguistically Interpreted Corpus

TLDR
A large scale and detailed database of lexical types in Japanese from a treebank that includes detailed linguistic information helps treebank annotators and grammar developers to share precise knowledge about the grammatical status of words that constitute the treebank.

Evaluation of a Japanese CFG Derived from a Syntactically Annotated Corpus with Respect to Dependency Measures

TLDR
This paper shows the evaluation results of a CFG derived from a large-scale Japanese syntactically annotated corpus and compares it with results of some Japanese dependency analyzers.

Semi-automatic documentation of an implemented linguistic grammar augmented with a treebank

TLDR
The database helps treebank annotators and grammar developers to share precise knowledge about the grammatical status of words that constitute the treebank, allowing for consistent large-scale treebanking and grammar development.

Universal Dependencies Version 2 for Japanese

TLDR
The UD Japanese resources are built based on automatic conversion from several treebanks, and the word delimitation, POS, and syntactic relations of the existing treebanks are ported for the UD annotation scheme.

The Hinoki syntactic and semantic treebank of Japanese

TLDR
The Hinoki treebank is built from dictionary definitions, examples and news text, and uses an HPSG based Japanese grammar to encode both syntactic and semantic information.

The Hinoki Treebank: A Treebank for Text Understanding

TLDR
This paper describes the motivation for and construction of a new Japanese lexical resource: the Hinoki treebank, and shows how this treebank can be used to extract thesaurus information from definition sentences in a language-neutral way using minimal recursion semantics.

Robust Segmentation of Japanese Text into a Lattice for Parsing

TLDR
A segmentation component that utilizes minimal syntactic knowledge to produce a lattice of word candidates for a broad coverage Japanese NL parser that achieves a breaking accuracy of ~97% over a wide variety of corpora.

References

Acquiring Disambiguation Rules from Text

TLDR
An effective procedure for automatically acquiring a new set of disambiguation rules for an existing deterministic parser on the basis of tagged text is presented and suggests a path toward more robust and comprehensive syntactic analyzers.

Building a Large Annotated Corpus of English: The Penn Treebank

TLDR
As a result of this grant, the researchers have now published on CDROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.

HPSG-Style Underspecified Japanese Grammar with Wide Coverage

TLDR
A wide-coverage Japanese grammar based on HPSG that can generate parse trees for 87% of the 10000 sentences in the Japanese EDR corpus, with a dependency accuracy of 78% when the parser uses the heuristic that every bunsetsu is attached to the nearest possible one.

Inside-Outside Reestimation From Partially Bracketed Corpora

TLDR
The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information in a partially parsed corpus to achieve faster convergence and better modelling of hierarchical structure than the original one.

A Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures

This paper presents a syntactic analysis method that first detects conjunctive structures in a sentence by checking parallelism of two series of words and then analyzes the dependency structure of

Japanese case structure analysis by unsupervised construction of a case frame dictionary

TLDR
This paper proposes an unsupervised method of constructing a case frame dictionary from an enormous raw corpus by using a robust and accurate parser and provides a case structure analysis method based on the constructed dictionary.

Basic Japanese grammar = ハンディ日本語文法

Beyond Skeleton Parsing: Producing a Comprehensive Large-Scale General-English Treebank With Full Grammatical Analysis

TLDR
The ATR/Lancaster Treebank of American English is presented, a new resource for natural language processing research, which has been prepared by Lancaster University (UK)'s Unit for Computer Research on the English Language, according to specifications provided by ATR (Japan)'s Statistical Parsing Group.

A grammar of contemporary Japanese

  • 1993

EDR Electronic Dictionary Specifications Guide

  • 1993