# Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

```bibtex
@article{Shmidman2022IntroducingBB,
  title={Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language},
  author={Avi Shmidman and Joshua Guedalia and Shaltiel Shmidman and Cheyn Shmuel Shmidman and Eli Handel and Moshe Koppel},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.01875}
}
```
• Published 3 August 2022
We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). While other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBERT), they are all trained on Modern Hebrew texts, which diverge substantially from Rabbinic Hebrew in their lexicographical, morphological, syntactic, and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We…
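Since Berel is a BERT-style masked language model, a masked-word query could in principle be run against a published checkpoint with the Hugging Face `transformers` library, roughly as sketched below. The Hub identifier `dicta-il/BEREL` and the sample sentence are assumptions for illustration, not details taken from the paper:

```python
# Hedged sketch: querying a BEREL checkpoint as a masked language model.
# The model id "dicta-il/BEREL" is an assumption and may differ from the
# actual published checkpoint.
from transformers import pipeline

fill = pipeline("fill-mask", model="dicta-il/BEREL")

# Mask one word of a short rabbinic-style Hebrew phrase and ask the
# model to fill it in (hypothetical example sentence).
masked = f"אמר רבי {fill.tokenizer.mask_token} משום רבי שמעון"
predictions = fill(masked)

for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```

Each prediction is a candidate token with its probability; comparing such scores on ambiguous forms is the general idea behind evaluating a PLM on a homograph challenge set.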
## Citations

• NLP4DH, 2022 — This work proposes a system for classification of rabbinic literature based on its style, leveraging recently released pretrained Transformer models for Hebrew, and demonstrates how the method can be applied to uncover lost material from the Midrash Tanhuma.

## References


• NAACL, 2019 — A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
• ArXiv, 2021 — AlephBERT is presented, a large pre-trained language model for Modern Hebrew, which is trained on a larger vocabulary and a larger dataset than any Hebrew PLM before it, and is made publicly available, providing a single point of entry for the development of Hebrew NLP applications.
• ACL, 2020 — BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
• EMNLP, 2019 — The design and use of the ONLP suite is described, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts, which provides rich and expressive annotations that already serve diverse academic and commercial needs.
• ArXiv, 2019 — The *Transformers* library is an open-source library that consists of carefully engineered, state-of-the-art Transformer architectures under a unified API, together with a curated collection of pretrained models made by and available for the community.
• J. Mach. Learn. Res., 2020 — This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks, and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
• INFORMS Journal on Data Science, 2022 — HeBERT and HebEMO are introduced: a transformer-based model for Modern Hebrew text that relies on the BERT (bidirectional encoder representations from transformers) architecture, and a tool that uses HeBERT to detect polarity and extract emotions from Hebrew UGC.
• ArXiv, 2019 — It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
• Morphological Processing of Semitic Languages, pages 43–66. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014 — This chapter begins with a recapitulation of the challenges these phenomena pose for computational applications, and discusses the approaches that have been suggested in the past to cope with these challenges.