Corpus ID: 235400037

NFinBERT: A Number-Aware Language Model for Financial Disclosures (short paper)

Hao-Lun Lin, Jr-Shian Wu, Yu-Shiang Huang, Ming-Feng Tsai and Chuan-Ju Wang
As numerals carry rich semantic information in financial texts, they play a crucial role in financial data analysis and financial decision making. We propose NFinBERT, a number-aware contextualized language model trained on financial disclosures. Although BERT and other contextualized language models work well for many NLP tasks, they are not specialized for finance and thus do not properly handle numerical information in financial texts. Therefore, we propose pre-training the language model…



FinBERT: Financial Sentiment Analysis with Pre-trained Language Models
FinBERT, a language model based on BERT, is introduced to tackle NLP tasks in the financial domain and it is found that even with a smaller training set and fine-tuning only a part of the model, FinBERT outperforms state-of-the-art machine learning methods.
Financial Numeral Classification Model Based on BERT
A model based on the Bidirectional Encoder Representations from Transformers (BERT) to identify the category and subcategory of a numeral in financial documents is proposed and achieves good performance in the FinNum task at NTCIR-14.
Discovering Finance Keywords via Continuous-Space Language Models
The continuous bag-of-words (CBOW) model is applied to the textual information in 10-K financial reports to discover new finance keywords and is effective for discovering predictability keywords for post-event volatility, stock volatility, abnormal trading volume, and excess return predictions.
Universal Language Model Fine-tuning for Text Classification
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
Predicting Risk from Financial Reports with Regression
This work applies well-known regression techniques to a large corpus of freely available financial reports, constructing regression models of volatility for the period following a report, rivaling past volatility in predicting the target variable.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks.
When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks
Previous research uses negative word counts to measure the tone of a text. We show that word lists developed for other disciplines misclassify common words in financial text. In a large sample of 10-Ks…
Self-Attentive Sentimental Sentence Embedding for Sentiment Analysis
The use of a word-level sentiment bidirectional LSTM in tandem with the self-attention mechanism for sentence-level sentiment prediction is proposed, and a finance report dataset for sentence-level financial risk detection is presented.