• Corpus ID: 6588104

FlexTag: A Highly Flexible PoS Tagging Framework

Torsten Zesch and Tobias Horsmann
We present FlexTag, a highly flexible PoS tagging framework. In contrast to monolithic implementations that can only be retrained but not adapted otherwise, FlexTag enables users to modify the feature space and the classification algorithm. Thus, FlexTag makes it easy to quickly develop custom-made taggers exactly fitting the research problem. 
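The idea of exchanging the feature space and the classification algorithm independently can be illustrated with a minimal sketch. It is not FlexTag's API: every name below (`word_feature`, `suffix_feature`, `FeatureVoteClassifier`, `Tagger`) is hypothetical, and the toy voting classifier merely stands in for whatever learner a user would plug in.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# All names here are hypothetical illustrations, not part of FlexTag.
FeatureFn = Callable[[Sequence[str], int], Dict[str, str]]


def word_feature(tokens: Sequence[str], i: int) -> Dict[str, str]:
    """Feature: the lowercased token itself."""
    return {"word": tokens[i].lower()}


def suffix_feature(n: int) -> FeatureFn:
    """Build a feature function that emits the last n characters of the token."""
    def extract(tokens: Sequence[str], i: int) -> Dict[str, str]:
        return {f"suffix{n}": tokens[i][-n:].lower()}
    return extract


class FeatureVoteClassifier:
    """Toy classifier: each feature votes for the tags it was seen with in training."""

    def __init__(self) -> None:
        self.counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.default = None

    def fit(self, X: List[Dict[str, str]], y: List[str]) -> None:
        for feats, tag in zip(X, y):
            for item in feats.items():
                self.counts[item][tag] += 1
        # Fallback for tokens whose features were never seen in training.
        self.default = Counter(y).most_common(1)[0][0]

    def predict(self, feats: Dict[str, str]) -> str:
        votes: Counter = Counter()
        for item in feats.items():
            votes.update(self.counts.get(item, {}))
        return votes.most_common(1)[0][0] if votes else self.default


class Tagger:
    """Combines any list of feature functions with any fit/predict classifier."""

    def __init__(self, feature_fns: List[FeatureFn], classifier) -> None:
        self.feature_fns = feature_fns
        self.classifier = classifier

    def _extract(self, tokens: Sequence[str], i: int) -> Dict[str, str]:
        feats: Dict[str, str] = {}
        for fn in self.feature_fns:
            feats.update(fn(tokens, i))
        return feats

    def train(self, sentences: List[List[Tuple[str, str]]]) -> None:
        X, y = [], []
        for sent in sentences:
            tokens = [tok for tok, _ in sent]
            for i, (_, tag) in enumerate(sent):
                X.append(self._extract(tokens, i))
                y.append(tag)
        self.classifier.fit(X, y)

    def tag(self, tokens: Sequence[str]) -> List[str]:
        return [self.classifier.predict(self._extract(tokens, i))
                for i in range(len(tokens))]


# Tiny made-up training set, for illustration only.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

tagger = Tagger([word_feature, suffix_feature(1)], FeatureVoteClassifier())
tagger.train(TRAIN)
print(tagger.tag(["the", "cat", "walks"]))
```

Changing the tagger then amounts to passing a different list of feature functions or a different object with `fit` and `predict` methods, without touching the `Tagger` class itself.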
Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
A new approach to PoS tagging: in a first step, a coarse-grained tag corresponding to the main syntactic category is assigned; based on this high-precision decision, specially trained fine-grained models with heavily reduced decision complexity are then applied.
LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text
A learning curve experiment furthermore shows that more in-domain training data is very likely to further increase accuracy, and adding unsupervised knowledge beyond the available training data is the most important factor for reaching acceptable tagging accuracy.
Building a Social Media Adapted PoS Tagger Using FlexTag -- A Case Study on Italian Tweets
A model based on FlexTag is trained using only the provided training data and external resources such as word clusters and a PoS dictionary, which are built from publicly available Italian corpora; the approach is found to be highly effective for Italian.
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
Describes SoMeWeTa, a part-of-speech tagger based on the averaged structured perceptron that is capable of domain adaptation, can use various external resources, and substantially improves on the state of the art for both the web and the social media data sets.
Proceedings of the 10th Web as Corpus Workshop, WAC@ACL 2016, Berlin, August 12, 2016
Preliminary results from an ongoing experiment wherein two large unstructured text corpora are classified by topic domain (or subject area) are described, indicating that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
EVALITA. Evaluation of NLP and Speech Tools for Italian : Proceedings of the Final Workshop 7 December 2016, Naples
This paper describes the design and reports the results of two questionnaires. The first of these questionnaires was created to collect information about the interest of industrial companies in the


Fine-Grained POS Tagging of German Tweets
This paper presents the first work on POS tagging German Twitter data, showing that despite the noisy and often cryptic nature of the data a fine-grained analysis of POS tags on Twitter microtext is
Fast or Accurate? - A Comparative Evaluation of PoS Tagging Models
The expected trade-off between fast models with relatively low accuracy and slower models with higher accuracy is found; the choice of model does matter, and the model should always be chosen for the task at hand.
Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based POS Taggers
Adopting two assumptions that serve to exclude rule interactions during tagging and training, some variants of Brill's approach are arrived at that are instances of decision list models, giving tagging accuracy that is comparable to, or better than the Brill method.
Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
A dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres and shows comparable results against other state-of-the-art systems, and gives higher accuracies when evaluated on a mixture of the data.
SVMTool: A general POS Tagger Generator Based on Support Vector Machines
The SVMTool offers a fairly good balance among these properties which make it really practical for current NLP applications, and it is very easy to use and easily configurable so as to perfectly fit the needs of a number of different applications.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
A broad-coverage collection of portable NLP components for building shareable analysis pipelines
Due to the diversity of natural language processing (NLP) tools and resources, combining them into processing pipelines is an important issue, and sharing these pipelines with others remains a
DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data
We present DKPro TC, a framework for supervised learning experiments on textual data. The main goal of DKPro TC is to enable researchers to focus on the actual research task behind the learning
Building a Large Annotated Corpus of English: The Penn Treebank
As a result of this grant, the researchers have now published on CDROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Named Entity Recognition in Tweets: An Experimental Study
The novel T-ner system doubles F1 score compared with the Stanford NER system, and leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.