Part-of-Speech Annotation of Biology Research Abstracts

Yuka Tateisi and Jun'ichi Tsujii
A part-of-speech (POS) tagged corpus was built from research abstracts in the biomedical domain using the Penn Treebank scheme. Because consistent annotation was difficult without domain-specific knowledge, we made use of the existing term annotation of the GENIA corpus. A list of frequent terms annotated in the GENIA corpus was compiled, and the POS of each constituent of those terms was determined with assistance from domain specialists. The POS of the terms in the list are pre-assigned, then a tagger…
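The pre-assignment step described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the lexicon entries, the tags attached to them, the function name `preassign_term_pos`, and the longest-match lookup strategy are hypothetical and are not taken from the paper or from the GENIA corpus. The sketch only fills in tags for tokens that belong to a listed term and leaves the rest as `None` for whatever tagging step follows.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical term lexicon: each multi-token term maps to the Penn Treebank
# POS tag of each constituent. Entries and tags are illustrative only.
TERM_POS: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("NF-kappa", "B"): ("NN", "NN"),
    ("T", "cells"): ("NN", "NNS"),
}


def preassign_term_pos(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], Tuple[str, ...]],
) -> List[Optional[str]]:
    """Pre-assign POS tags to tokens that form a listed term.

    Tokens outside any listed term keep the tag None, so a later
    tagging step can fill them in.
    """
    tags: List[Optional[str]] = [None] * len(tokens)
    max_len = max((len(term) for term in lexicon), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first (assumed matching strategy).
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                tags[i:i + n] = lexicon[key]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


sentence = ["NF-kappa", "B", "activates", "T", "cells"]
pre_tags = preassign_term_pos(sentence, TERM_POS)
```

In this sketch `pre_tags` holds a tag for every token covered by a lexicon term and `None` for "activates", which marks it as still to be tagged.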



