Corpus ID: 32288359

Topic supervised non-negative matrix factorization

Authors: Kelsey MacMillan and James D. Wilson
Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, it is often the case that the results of a topic model do not align with human interpretation. This interpretability fallacy is largely due to the unsupervised nature of topic models, which prohibits any user guidance on the results of a model. In this paper, we introduce a semi…
A Detailed Survey on Topic Modeling for Document and Short Text Data
A detailed survey of the topic modeling techniques proposed in the last decade is presented. It focuses on strategies for extracting topics from social media text, where the goal is to find and aggregate topics within short texts.
Using Topic Modeling via Non-negative Matrix Factorization to Identify Relationships between Genetic Variants and Disease Phenotypes: A Case Study of Lipoprotein(a) (LPA)
The results demonstrate the applicability of topic modeling in exploring the relationship between the genome and clinical diseases, as well as the feasibility of using topic modeling to replicate and discover novel associations between the human genome and clinical diseases.
Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA)
This study used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants through phenome-wide scanning to demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
ScanMap: Supervised Confounding Aware Non-negative Matrix Factorization for Polygenic Risk Modeling
  • Yuan Luo
  • Mathematics, Computer Science
  • MLHC
  • 2020
This work evaluates ScanMap against multiple state-of-the-art unsupervised and supervised matrix factorization models using large-scale NGS datasets and highlights the insights and benefits of gene groups learned by ScanMap in disease risk prediction.
Inferring modes of transportation using mobile phone data
This paper proposes an algorithmic pipeline that uses mobile phone network data to infer the distribution of transportation mode usage in a city. The pipeline is based on a Topic-Supervised Non-Negative Matrix Factorization model and applies a weak-labeling strategy to user trajectories, using data obtained from open datasets such as GTFS and OpenStreetMap.
Tweets on the Go: Gender Differences in Transport Perception and Its Discussion on Social Media
People often base their mobility decisions on subjective aspects of travel experience, such as time perception, space usage, and safety. It is well recognized that different groups within a…
Characterization of Local Attitudes Toward Immigration Using Social Media
It is found that the discussion is mostly driven by Haitian immigration; that there are temporal trends in tendency and polarity of discussion; and that assortative behavior on the network differs with respect to attitude.
Characterizing Transport Perception using Social Media: Differences in Mode and Gender
This work analyzed 300K tweets about transportation in Santiago, Chile, and estimated the associations between mode of transportation, gender, and the categories of a psycho-linguistic lexicon to provide evidence on which aspects of transportation are relevant in the daily experience.
What Happens Where During Disasters? A Workflow for the Multifaceted Characterization of Crisis Events Based on Twitter Data
This study focuses on the design and evaluation of a generic workflow for Twitter data analysis that leverages that additional information to characterize crisis events more comprehensively. Experimental results obtained with a data set acquired during Hurricane Florence demonstrate the effectiveness of the applied methods.
Adoption-Driven Data Science for Transportation Planning: Methodology, Case Study, and Lessons Learned
This paper proposes a methodology toward bridging two disciplines, data science and transportation, to identify, understand, and solve transportation planning problems with data-driven solutions that are suitable for adoption by urban planners and policy makers.


Non-negative matrix factorization for semi-supervised data clustering
This paper proposes SS-NMF: a semi-supervised non-negative matrix factorization framework for data clustering, and demonstrates the superior performance of SS-NMF for clustering through extensive experiments conducted on publicly available datasets.
Non-Negative Matrix Factorization with Constraints
This paper proposes a novel semi-supervised matrix decomposition method, called Constrained Non-negative Matrix Factorization, which takes the label information as additional constraints and requires that the data points sharing the same label have the same coordinate in the new representation space.
Reading Tea Leaves: How Humans Interpret Topic Models
New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.
Document clustering based on non-negative matrix factorization
This paper proposes a novel document clustering method based on the non-negative factorization of the term-document matrix of the given document corpus. The method surpasses latent semantic indexing and spectral clustering methods not only in the easy and reliable derivation of document clustering results, but also in document clustering accuracy.
Initializations for the Nonnegative Matrix Factorization
The need to process and conceptualize large sparse matrices effectively and efficiently (typically via low-rank approximations) is essential for many data mining applications, including document and…
Latent Dirichlet Allocation with Topic-in-Set Knowledge
This work proposes a mechanism for adding partial supervision, called topic-in-set knowledge, to latent topic modeling, to encourage the recovery of topics which are more relevant to user modeling goals than the topics which would be recovered otherwise.
Probabilistic Topic Models
In this article, we review probabilistic topic models: graphical models that can be used to summarize a large collection of documents with a smaller number of distributions over words. Those…
Latent Dirichlet Allocation
We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and…
Learning the parts of objects by non-negative matrix factorization
An algorithm for non-negative matrix factorization is demonstrated that is able to learn parts of faces and semantic features of text and is in contrast to other methods that learn holistic, not parts-based, representations.
Probabilistic latent semantic indexing
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training…