Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings

@article{Sangaraju2022TopicMO,
  title={Topic Modelling on Consumer Financial Protection Bureau Data: An Approach Using BERT Based Embeddings},
  author={Vasudeva Raju Sangaraju and Bharath Kumar Bolla and Deepa Nayak and Jyothsna Kh},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.07259}
}
Customers' reviews and comments are important for businesses to understand users' sentiment about the products and services. However, this data needs to be analyzed to assess the sentiment associated with topics/aspects to provide efficient customer assistance. LDA and LSA fail to capture the semantic relationship and are not specific to any domain. In this study, we evaluate BERTopic, a novel method that generates topics using sentence embeddings on Consumer Financial Protection Bureau (CFPB…

References

Topic Modeling in Embedding Spaces
TLDR
The embedded topic model (ETM) is developed, a generative model of documents that marries traditional topic models with word embeddings and outperforms existing document models, such as latent Dirichlet allocation, in terms of both topic quality and predictive performance.
Empirical study of topic modeling in Twitter
TLDR
It is shown that by training a topic model on aggregated messages the authors can obtain a higher quality of learned model which results in significantly better performance in two real-world classification problems.
A Detailed Survey on Topic Modeling for Document and Short Text Data
TLDR
A detailed survey covering the various topic modeling techniques proposed in the last decade is presented, which focuses on different strategies for extracting topics from social media text, where the goal is to find and aggregate the topics within short texts.
A Survey on Journey of Topic Modeling Techniques from SVD to Deep Learning
TLDR
A survey of the journey of topic modeling techniques, comprising Latent Dirichlet Allocation (LDA) and non-LDA based techniques; the techniques are classified into LDA and non-LDA because LDA has dominated topic modeling since its inception.
Hierarchical Topic Models and the Nested Chinese Restaurant Process
TLDR
A Bayesian approach is taken to generate an appropriate prior via a distribution on partitions that allows arbitrarily large branching factors and readily accommodates growing data collections.
What is wrong with topic modeling? And how to fix it using search-based software engineering
Latent Dirichlet Allocation
Knowledge discovery through directed probabilistic topic models: a survey
TLDR
This paper surveys an important subclass, Directed Probabilistic Topic Models (DPTMs), which have soft clustering abilities, and their applications for knowledge discovery in text corpora, giving basic concepts, advantages, and disadvantages in chronological order.
Integrating Document Clustering and Topic Modeling
TLDR
A multi-grain clustering topic model (MGCTM) which integrates document clustering and topic modeling into a unified framework and jointly performs the two tasks to achieve the overall best performance is proposed.
The Author-Topic Model for Authors and Documents
TLDR
The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.
...