A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration
@inproceedings{Shmidman2020ANC, title={A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration}, author={Avi Shmidman and Joshua Guedalia and Shaltiel Shmidman and Moshe Koppel and Reut Tsarfaty}, booktitle={Findings}, year={2020} }
One of the primary tasks of morphological parsers is the disambiguation of homographs. Particularly difficult are cases of unbalanced ambiguity, where one of the possible analyses is far more frequent than the others. In such cases, there may not exist sufficient examples of the minority analyses in order to properly evaluate performance, nor to train effective classifiers. In this paper we address the issue of unbalanced morphological ambiguities in Hebrew. We offer a challenge set for Hebrew…
One Citation
What do we really know about State of the Art NER?
- Computer Science
- LREC
- 2022
A broad evaluation of NER is performed on a popular dataset, taking into consideration the various text genres and sources that constitute it, and some useful reporting practices are recommended for NER researchers that could help provide a better understanding of a SOTA model’s performance in the future.
References
What’s Wrong with Hebrew NLP? And How to Make it Right
- Computer Science
- EMNLP
- 2019
The design and use of the ONLP suite, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts, is described; it provides rich and expressive annotations that already serve diverse academic and commercial needs.
Disambiguation by short contexts
- Computer Science
- Comput. Humanit.
- 1985
This paper describes a technique that is of great help in many text-processing situations, namely disambiguation by short contexts, and reports on an experiment recently conducted to test its validity and scope.
A Challenge Set and Methods for Noun-Verb Ambiguity
- Linguistics
- EMNLP
- 2018
A new dataset of over 30,000 naturally-occurring non-trivial examples of noun-verb ambiguity is created, with a 28% reduction in error over the prior best learned model for homograph disambiguation for text-to-speech synthesis.
Noun Homograph Disambiguation Using Local Context in Large Text Corpora
- Computer Science
- 1991
An accurate, relatively inexpensive method for the disambiguation of noun homographs using large text corpora using both machine-readable dictionaries and unrestricted text and the use of training instances is determined to be a crucial difference.
MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization
- Computer Science
- 2009
We describe the MADA+TOKAN toolkit, a versatile and freely available system that can derive extensive morphological and contextual information from raw Arabic text, and then use this information for…
Don’t Forget the Long Tail! A Comprehensive Analysis of Morphological Generalization in Bilingual Lexicon Induction
- Linguistics
- EMNLP
- 2019
It is demonstrated that the performance of state-of-the-art models drops considerably when evaluated on infrequent morphological inflections and then it is shown that adding a simple morphological constraint at training time improves the performance, proving that the bilingual lexicon inducers can benefit from better encoding of morphology.
Supertagging: An Approach to Almost Parsing
- Linguistics, Computer Science
- CL
- 1999
Novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques are proposed.
A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge
- Computer Science
- NAACL
- 2009
This paper presents a fully unsupervised word sense disambiguation method that requires only a dictionary and unannotated text as input, overcomes the problem of brittleness suffered in many existing methods, and makes broad-coverage word sense disambiguation feasible in practice.
Handling Homographs in Neural Machine Translation
- Computer Science
- NAACL
- 2018
Empirical evidence is provided that existing NMT systems in fact still have significant problems in properly translating ambiguous words, and methods are described that model the context of the input word with context-aware word embeddings that help to differentiate the word sense before feeding it into the encoder.
Getting the ##life out of living: How Adequate Are Word-Pieces for Modelling Complex Morphology?
- Linguistics
- SIGMORPHON
- 2020
The results show that, while models trained to predict multi-tags for complete words outperform models tuned to predict the distinct tags of WPs, the WPs tag prediction can be improved by purposefully constraining the word-pieces to reflect their internal functions.