Document Parsing: Towards Realistic Syntactic Analysis

Rebecca Dridan and Stephan Oepen

In this work we treat syntactic analysis as the processing of ‘raw’, running text rather than idealised, pre-segmented inputs, a task we dub document parsing. We examine the state of the art in sentence boundary detection and tokenisation, and their effects on syntactic parsing (for English), observing that common evaluation metrics are ill-suited for the…