Corpus ID: 6588104

FlexTag: A Highly Flexible PoS Tagging Framework

@inproceedings{Zesch2016FlexTagAH,
  title={FlexTag: A Highly Flexible PoS Tagging Framework},
  author={Torsten Zesch and Tobias Horsmann},
  booktitle={LREC},
  year={2016}
}
We present FlexTag, a highly flexible PoS tagging framework. In contrast to monolithic implementations that can only be retrained but not otherwise adapted, FlexTag enables users to modify both the feature space and the classification algorithm. Thus, FlexTag makes it easy to quickly develop custom-made taggers that fit the research problem exactly.
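The abstract's central claim is that the feature space and the classification algorithm are interchangeable parts rather than fixed ones. The following is a minimal Python sketch of that design. It assumes binary string features and token-by-token classification without tag history. `PluggableTagger`, `Perceptron`, `Classifier`, and the three feature functions are hypothetical names and do not reproduce FlexTag's actual API.

```python
from collections import defaultdict
from typing import Callable, Protocol

# Hypothetical: a feature extractor maps (tokens, position) to a set of binary features.
FeatureExtractor = Callable[[list[str], int], set[str]]


def word_feature(tokens: list[str], i: int) -> set[str]:
    return {f"w={tokens[i].lower()}"}


def suffix_feature(tokens: list[str], i: int) -> set[str]:
    return {f"suf3={tokens[i][-3:].lower()}"}


def prev_word_feature(tokens: list[str], i: int) -> set[str]:
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return {f"prev={prev}"}


class Classifier(Protocol):
    """Any learner with fit/predict can be plugged in."""

    def fit(self, X: list[set[str]], y: list[str]) -> "Classifier": ...
    def predict(self, feats: set[str]) -> str: ...


class Perceptron:
    """Plain (non-averaged) multiclass perceptron over binary features."""

    def __init__(self, epochs: int = 50) -> None:
        self.epochs = epochs
        self.weights: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
        self.labels: list[str] = []

    def predict(self, feats: set[str]) -> str:
        scores = {label: 0.0 for label in self.labels}
        for f in feats:
            for label, w in self.weights.get(f, {}).items():
                scores[label] += w
        # Ties are broken deterministically by label name.
        return max(self.labels, key=lambda label: (scores[label], label))

    def fit(self, X: list[set[str]], y: list[str]) -> "Perceptron":
        self.labels = sorted(set(y))
        for _ in range(self.epochs):
            for feats, gold in zip(X, y):
                guess = self.predict(feats)
                if guess != gold:  # mistake-driven update
                    for f in feats:
                        self.weights[f][gold] += 1.0
                        self.weights[f][guess] -= 1.0
        return self


class PluggableTagger:
    """Combines any list of feature extractors with any Classifier."""

    def __init__(self, extractors: list[FeatureExtractor], classifier: Classifier) -> None:
        self.extractors = extractors
        self.classifier = classifier

    def features(self, tokens: list[str], i: int) -> set[str]:
        feats: set[str] = set()
        for extract in self.extractors:
            feats |= extract(tokens, i)
        return feats

    def train(self, sentences: list[list[str]], tags: list[list[str]]) -> "PluggableTagger":
        X = [self.features(s, i) for s in sentences for i in range(len(s))]
        y = [t for seq in tags for t in seq]
        self.classifier.fit(X, y)
        return self

    def tag(self, tokens: list[str]) -> list[str]:
        return [self.classifier.predict(self.features(tokens, i)) for i in range(len(tokens))]


sentences = [["the", "dog", "runs"], ["a", "cat", "sleeps"], ["the", "cat", "runs"]]
tags = [["DET", "NOUN", "VERB"]] * 3
tagger = PluggableTagger([word_feature, suffix_feature, prev_word_feature], Perceptron())
tagger.train(sentences, tags)
print(tagger.tag(["the", "dog", "runs"]))  # ['DET', 'NOUN', 'VERB']
```

Swapping the learner only requires another object with `fit` and `predict`. Changing the feature space only requires editing the extractor list, e.g. `PluggableTagger([word_feature], Perceptron())`.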
Assigning Fine-grained PoS Tags based on High-precision Coarse-grained Tagging
TLDR
A new approach to PoS tagging: in a first step, a coarse-grained tag corresponding to the main syntactic category is assigned; based on this high-precision decision, specially trained fine-grained models with heavily reduced decision complexity are then applied.
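The two-step idea summarized above can be sketched as follows. To keep the example self-contained, simple most-frequent-tag lookup models stand in for the trained classifiers. The mapping `FINE_TO_COARSE` (a small STTS-like subset) and the class names are hypothetical. The sketch only illustrates that each fine-grained model chooses among the fine tags of a single coarse category.

```python
from collections import Counter, defaultdict

# Hypothetical fine-to-coarse mapping (a small STTS-like subset, for illustration only).
FINE_TO_COARSE = {"ART": "DET", "NN": "NOUN", "NE": "NOUN", "VVFIN": "VERB", "VVINF": "VERB"}


class LookupModel:
    """Predicts a word's most frequent training label, else the overall most frequent label."""

    def __init__(self) -> None:
        self.by_word: dict[str, Counter] = defaultdict(Counter)
        self.overall: Counter = Counter()

    def fit(self, words: list[str], labels: list[str]) -> "LookupModel":
        for word, label in zip(words, labels):
            self.by_word[word.lower()][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, word: str) -> str:
        counts = self.by_word.get(word.lower()) or self.overall
        return counts.most_common(1)[0][0]


class CoarseToFineTagger:
    def __init__(self, fine_to_coarse: dict[str, str]) -> None:
        self.fine_to_coarse = fine_to_coarse
        self.coarse_model = LookupModel()
        self.fine_models: dict[str, LookupModel] = {}

    def fit(self, words: list[str], fine_tags: list[str]) -> "CoarseToFineTagger":
        coarse_tags = [self.fine_to_coarse[t] for t in fine_tags]
        # Step 1: one model for the coarse (main syntactic) category.
        self.coarse_model.fit(words, coarse_tags)
        # Step 2: one fine-grained model per coarse category, trained only on its tokens.
        groups: dict[str, tuple[list[str], list[str]]] = defaultdict(lambda: ([], []))
        for word, coarse, fine in zip(words, coarse_tags, fine_tags):
            groups[coarse][0].append(word)
            groups[coarse][1].append(fine)
        self.fine_models = {c: LookupModel().fit(ws, fs) for c, (ws, fs) in groups.items()}
        return self

    def tag(self, words: list[str]) -> list[tuple[str, str]]:
        result = []
        for word in words:
            coarse = self.coarse_model.predict(word)
            # The fine model only knows the fine tags that belong to `coarse`.
            fine = self.fine_models[coarse].predict(word)
            result.append((coarse, fine))
        return result


words = ["Der", "Hund", "läuft", "Berlin", "schlafen", "der", "Katze"]
fine_tags = ["ART", "NN", "VVFIN", "NE", "VVINF", "ART", "NN"]
tagger = CoarseToFineTagger(FINE_TO_COARSE).fit(words, fine_tags)
print(tagger.tag(["der", "Hund", "Berlin", "Maus"]))
# [('DET', 'ART'), ('NOUN', 'NN'), ('NOUN', 'NE'), ('NOUN', 'NN')]
```

For the unseen word `Maus`, the coarse decision falls back to the most frequent coarse tag. The NOUN model then only has to choose between `NN` and `NE`.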
LTL-UDE @ EmpiriST 2015: Tokenization and PoS Tagging of Social Media Text
TLDR
A learning-curve experiment further shows that more in-domain training data is very likely to increase accuracy, and that adding unsupervised knowledge beyond the available training data is the most important factor for reaching acceptable tagging accuracy.
Building a Social Media Adapted PoS Tagger Using FlexTag -- A Case Study on Italian Tweets
TLDR
A model based on FlexTag is trained using only the provided training data and external resources, such as word clusters and a PoS dictionary built from publicly available Italian corpora; the approach is found to be highly effective for Italian.
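External resources like these typically enter a tagger as additional features. The sketch below assumes two things: word clusters given as bit-string paths, as produced by Brown clustering, and a PoS dictionary that maps each word to the set of tags it can take. The function name, the prefix lengths, and the toy resources are hypothetical and are not taken from the paper.

```python
from typing import Mapping


def resource_features(
    token: str,
    clusters: Mapping[str, str],
    pos_dict: Mapping[str, set[str]],
    prefix_lengths: tuple[int, ...] = (2, 4, 6),
) -> set[str]:
    """Binary features for one token from a cluster map and a PoS dictionary."""
    key = token.lower()
    feats = {f"w={key}"}
    path = clusters.get(key)
    if path is None:
        feats.add("cluster=UNK")
    else:
        # Shorter prefixes of the bit-string path correspond to coarser clusters,
        # so rare words can share features with frequent words in the same cluster.
        for n in prefix_lengths:
            feats.add(f"cluster{n}={path[:n]}")
    # One feature per tag the dictionary allows for this word (none if unlisted).
    for tag in pos_dict.get(key, ()):
        feats.add(f"dict={tag}")
    return feats


# Hypothetical toy resources, for illustration only.
clusters = {"ciao": "0110", "buongiorno": "011100"}
pos_dict = {"ciao": {"INTJ"}}
print(sorted(resource_features("Ciao", clusters, pos_dict)))
# ['cluster2=01', 'cluster4=0110', 'cluster6=0110', 'dict=INTJ', 'w=ciao']
```

Because `ciao` and `buongiorno` share the prefix feature `cluster2=01`, a classifier can transfer evidence between them even when one of them is rare in the training data.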
SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
TLDR
This paper describes SoMeWeTa, a part-of-speech tagger based on the averaged structured perceptron that is capable of domain adaptation and can use various external resources; it substantially improves on the state of the art for both the web and the social media data sets.
Proceedings of the 10th Web as Corpus Workshop, WAC@ACL 2016, Berlin, August 12, 2016
TLDR
Preliminary results from an ongoing experiment wherein two large unstructured text corpora are classified by topic domain (or subject area) are described, indicating that a revised classification scheme and larger gold standard corpora will likely lead to a substantial increase in accuracy.
EVALITA. Evaluation of NLP and Speech Tools for Italian: Proceedings of the Final Workshop, 7 December 2016, Naples
This paper describes the design and reports the results of two questionnaires. The first of these questionnaires was created to collect information about the interest of industrial companies in the …

References

Showing the first 10 of 23 references:
Fine-Grained POS Tagging of German Tweets
This paper presents the first work on POS tagging German Twitter data, showing that despite the noisy and often cryptic nature of the data a fine-grained analysis of POS tags on Twitter microtext is …
Fast or Accurate? - A Comparative Evaluation of PoS Tagging Models
TLDR
The expected trade-off between fast models with relatively low accuracy and slower models with higher accuracy is confirmed; the choice of model does matter, and the model should always be chosen for the task at hand.
Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based POS Taggers
TLDR
By adopting two assumptions that exclude rule interactions during tagging and training, the authors arrive at variants of Brill's approach that are instances of decision list models and give tagging accuracy comparable to, or better than, that of the Brill method.
Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
TLDR
A dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres; it shows results comparable to other state-of-the-art systems and gives higher accuracies when evaluated on a mixture of the data.
SVMTool: A general POS Tagger Generator Based on Support Vector Machines
TLDR
SVMTool offers a fairly good balance among these properties, which makes it very practical for current NLP applications; it is also easy to use and easily configurable to fit the needs of a number of different applications.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
TLDR
A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
A broad-coverage collection of portable NLP components for building shareable analysis pipelines
Due to the diversity of natural language processing (NLP) tools and resources, combining them into processing pipelines is an important issue, and sharing these pipelines with others remains a …
DKPro TC: A Java-based Framework for Supervised Learning Experiments on Textual Data
We present DKPro TC, a framework for supervised learning experiments on textual data. The main goal of DKPro TC is to enable researchers to focus on the actual research task behind the learning …
Building a Large Annotated Corpus of English: The Penn Treebank
TLDR
As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Named Entity Recognition in Tweets: An Experimental Study
TLDR
The novel T-NER system doubles the F1 score of the Stanford NER system; it achieves this by leveraging the redundancy inherent in tweets and by using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.