Learning Mutual Fund Categorization using Natural Language Processing

Dimitrios Vamvourellis, Máté Attila Tóth, Dhruv Desai, Dhagash Mehta, Stefano Pasquali
Proceedings of the Third ACM International Conference on AI in Finance
Categorization of mutual funds or exchange-traded funds (ETFs) has long helped financial analysts perform peer analysis for purposes ranging from competitor analysis to quantifying portfolio diversification. The categorization methodology usually relies on structured fund composition data extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data as depicted in the forms using natural language…

Machine learning fund categorizations

This work establishes that a well-regarded, industry-wide categorization system is learnable using machine learning and largely reproducible, and in turn constructs a truly data-driven categorization.

Fund2Vec: mutual funds similarity using graph learning

This work proposes a radically new approach to identifying similar funds, based on a weighted bipartite network representation of funds and their underlying assets, using a machine learning method called Node2Vec that learns a low-dimensional embedding of the network.

Related Stocks Selection with Data Collaboration Using Text Mining

An extended scheme is proposed for selecting related stocks for themed mutual funds, based on words extracted according to their similarity to a theme using word2vec and on the authors' own similarity measure based on co-occurrence in company information.

FinBERT: A Pretrained Language Model for Financial Communications

This work addresses the need by pretraining a financial domain-specific BERT model, FinBERT, on large-scale financial communication corpora, and confirms the advantage of FinBERT over generic-domain BERT models.

On Robustness of Mutual Funds Categorization and Distance Metric Learning

The authors settle the debate in favor of Morningstar categorization by pointing out the use of incorrect lists of variables and interpretation of machine learning algorithms in the previous literature, while emphasizing that the main missing piece from the machine learning side in previous research was the appropriate distance metric.

Classifying Companies by Industry Using Word Embeddings

The presented approach showed some promise but also some limitations, and in its current form may be robust enough only for semi-automated classification.

Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.

Double clustering for rating mutual funds

An application of clustering methods to historical mutual fund data is shown, producing a partition of funds that is readily interpretable from a financial point of view. It is further possible to rank the identified groups, thus obtaining a rating of funds that turns out to account for different propensities toward risk exposure.

Self-organizing maps could improve the classification of Spanish mutual funds

Soft Clustering for Funds Management Style Analysis: Out-of-Sample Predictability

This comparison demonstrates that soft clustering can predict mutual fund performance out-of-sample better than a hard clustering technique (GSC), and that investment style boundaries are continuous rather than "hard".