Nakdan: Professional Hebrew Diacritizer
- Avi Shmidman, Shaltiel Shmidman, Moshe Koppel, Yoav Goldberg
- Computer Science, Annual Meeting of the Association for Computational Linguistics
- 7 May 2020
The system combines modern neural models with carefully curated declarative linguistic knowledge and comprehensive manually constructed tables and dictionaries to provide state-of-the-art diacritization accuracy. It has several features that make it particularly useful for preparing scientific editions of historical Hebrew texts.
Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language
- Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Cheyn Shmuel Shmidman, Eli Handel, Moshe Koppel
- Linguistics, Computer Science, arXiv
- 3 August 2022
BEREL (BERT Embeddings for Rabbinic-Encoded Language) is presented, and its superiority on Rabbinic texts is demonstrated via a challenge set of Hebrew homographs.
Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All
- Eylon Guetta, Avi Shmidman, Reut Tsarfaty
- Computer Science, arXiv
- 28 November 2022
A new pre-trained language model for modern Hebrew, termed AlephBERTGimmel, is presented. It employs a much larger vocabulary than previous standard Hebrew PLMs and achieves a new SOTA on all available Hebrew benchmarks, including morphological segmentation, POS tagging, full morphological analysis, NER, and sentiment analysis.
A Novel Challenge Set for Hebrew Morphological Disambiguation and Diacritics Restoration
- Avi Shmidman, Joshua Guedalia, Shaltiel Shmidman, Moshe Koppel, Reut Tsarfaty
- Linguistics, Findings of the Association for Computational Linguistics: EMNLP 2020
- 6 October 2020
It is shown that the current SOTA in Hebrew disambiguation performs poorly on cases of unbalanced ambiguity, and a challenge set for Hebrew homographs, the first of its kind, is offered, containing substantial attestation of each analysis of 21 Hebrew homographs.