Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language

@article{Shmidman2022IntroducingBB,
  title={Introducing BEREL: BERT Embeddings for Rabbinic-Encoded Language},
  author={Avi Shmidman and Joshua Guedalia and Shaltiel Shmidman and Cheyn Shmuel Shmidman and Eli Handel and Moshe Koppel},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.01875}
}
We present a new pre-trained language model (PLM) for Rabbinic Hebrew, termed Berel (BERT Embeddings for Rabbinic-Encoded Language). Whilst other PLMs exist for processing Hebrew texts (e.g., HeBERT, AlephBERT), they are all trained on modern Hebrew texts, which diverge substantially from Rabbinic Hebrew in their lexicographical, morphological, syntactic and orthographic norms. We demonstrate the superiority of Berel on Rabbinic texts via a challenge set of Hebrew homographs. We…
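As a rough illustration (not taken from the paper), the sketch below probes a BERT-style Hebrew PLM on a fill-mask completion, in the spirit of the homograph challenge set mentioned in the abstract. The Hugging Face repository ID and the example sentence are assumptions.

# Hedged sketch: probing a BERT-style Hebrew PLM on a fill-mask task.
# The repository ID "dicta-il/BEREL" and the example sentence are
# illustrative assumptions, not taken from the paper.
from transformers import pipeline

model_name = "dicta-il/BEREL"  # assumed Hugging Face Hub ID
fill_mask = pipeline("fill-mask", model=model_name)

# A rabbinic-flavoured sentence with one token masked; the ranked
# completions show which reading the model prefers in context.
sentence = f"רבי עקיבא אומר זה {fill_mask.tokenizer.mask_token} גדול בתורה"
for candidate in fill_mask(sentence, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 4))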

Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material

This work proposes a system for classification of rabbinic literature based on its style, leveraging recently released pretrained Transformer models for Hebrew and demonstrates how the method can be applied to uncover lost material from the Midrash Tanhuma.
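For context, a minimal sketch of the general approach (fine-tuning a pretrained Hebrew transformer for sequence classification) follows; the model ID, label scheme, and toy data are assumptions and do not reproduce the cited paper's setup.

# Hedged sketch: fine-tuning a pretrained Hebrew PLM for style classification.
# Model ID, labels, and the toy dataset are illustrative assumptions.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "dicta-il/BEREL"  # assumed Hugging Face Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy stand-in for labelled passages (0 = Tanhuma-like style, 1 = other).
train = Dataset.from_dict({
    "text": ["passage one ...", "passage two ..."],
    "label": [0, 1],
}).map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
       batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="style-clf", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()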

References

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

AlephBERT: A Hebrew Large Pre-Trained Language Model to Start-off your Hebrew NLP Application With

AlephBERT is presented, a large pre-trained language model for Modern Hebrew, trained on a larger vocabulary and a larger dataset than any previous Hebrew PLM, and made publicly available, providing a single point of entry for the development of Hebrew NLP applications.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

What’s Wrong with Hebrew NLP? And How to Make it Right

The design and use of the ONLP suite is described, a joint morpho-syntactic infrastructure for processing Modern Hebrew texts, which provides rich and expressive annotations which already serve diverse academic and commercial needs.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

The Transformers library is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API, together with a curated collection of pretrained models made by and available for the community.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

HeBERT & HebEMO: a Hebrew BERT Model and a Tool for Polarity Analysis and Emotion Recognition

HeBERT and HebEMO are introduced, a transformer-based model for modern Hebrew text which relies on a BERT (bidirectional encoder representations from transformers) architecture and a tool that uses HeBERT to detect polarity and extract emotions from Hebrew UGC.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained and, with improved pretraining, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Morphological Processing of Semitic Languages

This chapter begins with a recapitulation of the challenges that Semitic morphological phenomena pose for computational applications, and discusses the approaches that have been suggested in the past to cope with these challenges.

Morphological Processing of Semitic Languages, pages 43–66. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.