Learning Mutual Fund Categorization using Natural Language Processing
@article{Vamvourellis2022LearningMF,
  title={Learning Mutual Fund Categorization using Natural Language Processing},
  author={Dimitrios Vamvourellis and M{\'a}t{\'e} Attila T{\'o}th and Dhruv Desai and Dhagash Mehta and Stefano Pasquali},
  journal={Proceedings of the Third ACM International Conference on AI in Finance},
  year={2022}
}
Categorization of mutual funds or Exchange-Traded Funds (ETFs) has long helped financial analysts perform peer analysis for purposes ranging from competitor analysis to quantifying portfolio diversification. The categorization methodology usually relies on structured fund composition data extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data in these forms using natural language…
References
Showing 10 of 34 references.
Machine learning fund categorizations
- Computer Science, ICAIF
- 2020
It is established that a well-regarded, industry-wide categorization system is learnable using machine learning and largely reproducible, and a truly data-driven categorization is then constructed.
Fund2Vec: mutual funds similarity using graph learning
- Computer Science, ICAIF
- 2021
This work proposes a radically new approach to identifying similar funds. It builds a weighted bipartite network of funds and their underlying asset data and applies Node2Vec, a machine learning method that learns a low-dimensional embedding of the network.
Related Stocks Selection with Data Collaboration Using Text Mining
- Computer Science, Inf.
- 2019
An extended scheme is proposed for selecting related stocks for themed mutual funds. It extracts words according to their word2vec similarity to a theme and combines this with the authors' own similarity measure, based on co-occurrence in company information.
FinBERT: A Pretrained Language Model for Financial Communications
- Computer Science, arXiv
- 2020
This work pretrains a financial-domain-specific BERT model, FinBERT, on large-scale financial communication corpora and confirms its advantage over generic-domain BERT models.
On Robustness of Mutual Funds Categorization and Distance Metric Learning
- Computer Science, The Journal of Financial Data Science
- 2021
The authors settle the debate in favor of Morningstar categorization. They point out incorrect variable lists and misinterpretations of machine learning algorithms in previous literature. They also emphasize that the main missing piece on the machine learning side of that research was an appropriate distance metric.
Classifying Companies by Industry Using Word Embeddings
- Computer Science, NLDB
- 2018
The presented approach shows some promise but also some limitations. In its current form, it may be robust enough only for semi-automated classification.
Universal Language Model Fine-tuning for Text Classification
- Computer Science, ACL
- 2018
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any NLP task, and introduces techniques that are key for fine-tuning a language model.
Double clustering for rating mutual funds
- Computer Science
- 2015
An application of clustering methods to historical mutual fund data is shown, producing a partition of funds that is readily interpretable from a financial point of view. The identified groups can then be ranked, yielding a rating of funds that turns out to account for different propensities toward risk exposure.
Self-organizing maps could improve the classification of Spanish mutual funds
- Computer Science, Eur. J. Oper. Res.
- 2006
Soft Clustering for Funds Management Style Analysis: Out-of-Sample Predictability
- Business, SOCO 2008
- 2008
This comparison demonstrates that soft clustering predicts mutual fund performance out-of-sample better than a hard clustering technique (GSC). It also shows that investment style boundaries are continuous rather than "hard".