OntoNotes: The 90% Solution

@inproceedings{Hovy2006OntoNotesT9,
  title={OntoNotes: The 90\% Solution},
  author={Eduard H. Hovy and Mitchell P. Marcus and Martha Palmer and Lance A. Ramshaw and Ralph M. Weischedel},
  booktitle={NAACL},
  year={2006}
}
We describe the OntoNotes methodology and its result, a large multilingual richly-annotated corpus constructed at 90% interannotator agreement. An initial portion (300K words of English newswire and 250K words of Chinese newswire) will be made available to the community during 2007. 

Constructing an Anaphorically Annotated Corpus with Non-Experts: Assessing the Quality of Collaborative Annotations
This paper reports on the ongoing work of Phrase Detectives, an attempt to create a very large anaphorically annotated text corpus and shows that this approach could be used to create large, high-quality natural language resources.
By all these lovely tokens... Merging conflicting tokenizations
A solution for integrating different tokenizations using a standoff XML format is described, and the consequences from a corpus-linguistic perspective are discussed.
HENRY-CORE: Domain Adaptation and Stacking for Text Similarity
This paper describes a system for automatically measuring the semantic similarity between two texts, which was the aim of the 2013 Semantic Textual Similarity (STS) task (Agirre et al., 2013). For ...
KPWr: Towards a Free Corpus of Polish
The corpus is being annotated with various types of linguistic entities: chunks and named entities, selected syntactic and semantic relations, word senses and anaphora.
Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
This paper repurposes WordNet's supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains, resulting in a high coverage corpus.
Word Sense Disambiguation with Multilingual Features
The role played by a multilingual feature representation for the task of word sense disambiguation is explored, and it is shown that by using a multilingual vector space the authors can obtain error rate reductions of up to 25%, as compared to a monolingual classifier.
Towards the Automatic Creation of a Wordnet from a Term-Based Lexical Network
The work described here aims to create a wordnet automatically from a semantic network based on terms. So, a clustering procedure is run over a synonymy network, in order to obtain synsets. Then, the ...
SemEval-2007 Task-17: English Lexical Sample, SRL and All Words
This paper describes our experience in preparing the data and evaluating the results for three subtasks of SemEval-2007 Task-17 - Lexical Sample, Semantic Role Labeling (SRL) and All-Words ...
Universal Dependency Annotation for Multilingual Parsing
A new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean is presented, made freely available in order to facilitate research on multilingual dependency parsing.
SUC-CORE : A Balanced Corpus Annotated with Noun Phrase Coreference
This paper describes SUC-CORE, a subset of the Stockholm Umeå Corpus and the Swedish Treebank annotated with noun phrase coreference. While most coreference annotated corpora consist of texts of si...

References

Building a Large Annotated Corpus of English: The Penn Treebank
As a result of this grant, the researchers have now published on CDROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Different Sense Granularities for Different Applications
A hierarchical approach to WordNet sense distinctions is described that provides different types of automatic Word Sense Disambiguation (WSD) systems, which perform at varying levels of accuracy.
Proposition Bank II: Delving Deeper
An overview of the second phase of PropBank Annotation, PropBank II, which is being applied to English and Chinese, and includes (Neodavidsonian) eventuality variables, nominal references, sense tagging, and connections to the Penn Discourse Treebank (PDTB), a project for annotating discourse connectives and their arguments.
The NomBank Project: An Interim Report
The NomBank project is described, a project that will provide argument structure for instances of common nouns in the Penn Treebank II corpus, and its specifications and the process involved in creating the resource are described.
Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features
This paper shows that our WSD system using rich linguistic features achieved high accuracy in the classification of English SENSEVAL2 verbs for both fine-grained (64.6%) and coarse-grained (73.7%) ...
The Prague Dependency Treebank : Annotation Structure and Support
The contents of the Prague Dependency Treebank (recently released by the Linguistic Data Consortium in its version 1.0) are described, from morphology to surface syntax to the deep (underlying) syntax ...
The Proposition Bank: An Annotated Corpus of Semantic Roles
An automatic system for semantic role tagging trained on the corpus is described and the effect on its performance of various types of information is discussed, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty trace categories of the treebank.
Interlingual annotation for MT development
This paper describes the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways.
The Berkeley FrameNet Project
This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.
Fully Parsing the Penn Treebank
We present a two stage parser that recovers Penn Treebank style syntactic analyses of new sentences including skeletal syntactic structure, and, for the first time, both function tags and empty ...