In this work, we revisit Shared Task 1 from the 2012 *SEM Conference: the automated analysis of negation. Unlike the vast majority of participating systems in 2012, our approach works over explicit and formal representations of propositional semantics, i.e. it derives the notion of negation scope assumed in this task from the structure of logical-form meaning …
We investigate the effects of adding semantic annotations, including word sense hypernyms, to the source text for use as an extra source of information in HPSG parse ranking for the English Resource Grammar. The semantic annotations are coarse semantic categories or entries from a distributional thesaurus, assigned either heuristically or by a pre-trained …
We review the state of the art in automated sentence boundary detection (SBD) for English and call for a renewed research interest in this foundational first step in natural language processing. We observe severe limitations in comparability and reproducibility of earlier work and a general lack of knowledge about genre- and domain-specific variations. To …
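As a point of reference for the task surveyed above, a minimal rule-based sentence splitter can be sketched as follows. This is purely illustrative and is not the system or evaluation from the paper; the punctuation pattern and the small abbreviation list are assumptions chosen for the example.

```python
import re

# Hypothetical abbreviation list; real SBD systems use much larger
# lexicons or learned models to handle such exceptions.
ABBREVIATIONS = {"Dr.", "Mr.", "Ms.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split text on ., !, ? followed by whitespace and a capital letter,
    unless the preceding token is a known abbreviation."""
    boundaries = []
    for m in re.finditer(r"[.!?]\s+(?=[A-Z])", text):
        # The token ending at the punctuation mark decides the case.
        prev = text[:m.start() + 1].rsplit(None, 1)[-1]
        if prev not in ABBREVIATIONS:
            boundaries.append(m.end())
    sentences, start = [], 0
    for b in boundaries:
        sentences.append(text[start:b].strip())
        start = b
    sentences.append(text[start:].strip())
    return sentences
```

Even this toy example shows why SBD is non-trivial: the same character (".") marks both abbreviations and sentence ends, and genre- and domain-specific conventions change which reading is correct.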
We examine some of the frequently disregarded subtleties of tokenization in Penn Treebank style, and present a new rule-based pre-processing toolkit that not only reproduces the Treebank tokenization with unmatched accuracy, but also maintains exact stand-off pointers to the original text and allows flexible configuration to diverse use cases (e.g. to …
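The stand-off property mentioned above can be illustrated with a small sketch: each token carries character offsets into the original string, so the raw surface text is always recoverable. This is not the toolkit described in the paper, and the regular expression is a deliberately simplified, PTB-ish approximation (it splits "Don't" as "Don" + "'t" rather than the Treebank's "Do" + "n't").

```python
import re

def tokenize_standoff(text):
    """Return (token, start, end) triples; text[start:end] == token,
    so the original string can be reconstructed exactly."""
    tokens = []
    # Simplified pattern: clitics like 't, word characters, or single
    # punctuation marks. A real PTB tokenizer has many more rules.
    for m in re.finditer(r"'\w+|\w+|[^\w\s]", text):
        tokens.append((m.group(), m.start(), m.end()))
    return tokens
```

Keeping offsets instead of rewriting the text is what makes it possible to map annotations on tokens back to the untouched original document.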
In this work, we examine and attempt to extend the coverage of a German HPSG grammar. We use the grammar to parse a corpus of newspaper text and evaluate the proportion of sentences which have a correct attested parse, and analyse the cause of errors in terms of lexical or constructional gaps which prevent parsing. Then, using a maximum entropy model, we …
We present the WeSearch Data Collection (WDC), a freely redistributable, partly annotated, comprehensive sample of User-Generated Content. The WDC contains data extracted from a range of genres of varying formality (user forums, product review sites, blogs and Wikipedia) and covers two different domains (NLP and Linux). In this article, we describe the data …
This paper describes how external resources can be used to improve parser performance for heavily lexicalised grammars, looking at both robustness and efficiency. In terms of robustness, we try using different types of external data to increase lexical coverage, and find that simple POS tags have the most effect, increasing coverage on unseen data by up to …