In this paper we present GATE, a framework and graphical development environment which enables users to develop and deploy language engineering components and resources in a robust fashion. The GATE architecture has enabled us not only to develop a number of successful applications for various language processing tasks (such as Information Extraction), but …
In this paper we present recent work on GATE, a widely-used framework and graphical development environment for creating and deploying Language Engineering components and resources in a robust fashion. The GATE architecture has facilitated the development of a number of successful applications for various language processing tasks (such as Information …
Twitter is the largest source of microblog text, responsible for gigabytes of human discourse every day. Processing microblog text is difficult: the genre is noisy, documents have little context, and utterances are very short. As a result, conventional NLP tools fail when faced with tweets and other microblog text. We present TwitIE, an open-source NLP pipeline …
Structured data, such as that encoded in ontologies and knowledge bases, can be accessed using either syntactically complex formal query languages like SPARQL or complicated form interfaces that require expensive customisation to each particular application domain. This paper presents the QuestIO system – a natural language interface for accessing …
Part-of-speech information is a prerequisite in many NLP algorithms. However, Twitter text is difficult to part-of-speech tag: it is noisy, with linguistic errors and idiosyncratic style. We present a detailed error analysis of existing taggers, motivating a series of tagger augmentations which are demonstrated to improve performance. We identify and …
In this article we focus on shallow methods for anaphora resolution and for building coreference chains, which we developed as modules of the ANNIE information extraction system. The "orthomatcher" module handles orthographic coreference between proper names, and the anaphora resolution module handles the …
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets …