Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts

Sujoy Roy, Daqing Yun, Behrouz Madahian, Michael W. Berry, Lih-Yuan Deng, Dan Goldowitz, Ramin Homayouni
Frontiers in Bioengineering and Biotechnology
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors… 
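As a rough illustration of the decomposition described above, the following sketch factorizes a small non-negative 3-mode tensor into rank-one components using multiplicative updates. The abstract does not say which NTF algorithm the authors used, so the update rule, the rank, the iteration count, the toy dimensions, and all function names here are assumptions chosen for illustration only; the term × gene × TF axis order follows the abstract.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Mode-n unfolding: the chosen axis becomes the rows."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def reconstruct(factors):
    """Sum of rank-one tensors built from the three factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def ntf(x, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-mode tensor.

    Uses multiplicative updates (an assumed, commonly used scheme);
    factors stay non-negative as long as x is non-negative.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            factors[mode] *= numer / denom
    return factors

# Toy term x gene x TF tensor (6 x 5 x 4) built from two hidden components.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
x = reconstruct(true_factors)

terms, genes, tfs = ntf(x, rank=2)
rel_error = np.linalg.norm(x - reconstruct([terms, genes, tfs])) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this sketch, each column index r ties together one column of `terms`, `genes`, and `tfs`; the largest entries in those three columns would be read as the terms, genes, and TFs making up one module. A tensor of the size reported in the abstract would need a sparse representation rather than the dense arrays used here.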

Evaluation of Sirtuin-3 probe quality and co-expressed genes using literature cohesion

This study tested whether literature-derived functional cohesion could be used as an objective metric in lieu of ‘ground truth’ to evaluate the quality of probes and microarray datasets, and found that the LPv approach can distinguish high-quality Sirt3 probes.

Mining Multimodal Big Data: Tensor Methods and Applications

This chapter provides an overview of tensor factorization methods as well as a literature review of selected applications in areas that are currently experiencing exponential data growth and likely of interest to a broad audience.

A systematic review on literature-based discovery workflow

This systematic review provides a comprehensive overview of the LBD workflow by answering nine research questions related to the major components of the LBD workflow (i.e., input, process, output, and evaluation).

A Systematic Review on Literature-based Discovery

This systematic review provides an in-depth analysis of the computational techniques used in the LBD process using a novel, up-to-date, and detailed classification and discusses the prevailing research deficiencies in the discipline by highlighting the challenges and opportunities of future LBD research.



Nonnegative Tensor Factorization of Biomedical Literature for Analysis of Genomic Data

This work uses a previously published microarray dataset to explore the utility of nonnegative tensor factorization for extracting semantic relationships between genes and the transcription factors (TFs) that regulate them, and provides proof-of-concept that nonnegative tensor factorization could be useful in the interpretation of genomic data.

Latent Semantic Indexing of PubMed abstracts for identification of transcription factor candidates from microarray derived gene sets

The results suggest that the LSI-based text mining approach can complement existing approaches used in systems biology research to decipher gene regulatory networks by providing putative lists of ranked TFs that might be explicitly or implicitly associated with sets of DEGs derived from microarray experiments.

Computer-assisted curation of a human regulatory core network from the biological literature

A text-mining-assisted workflow was developed to systematically extract knowledge about regulatory interactions between human TFs from the biological literature and was able to increase curated information about the human core transcriptional network by >60% compared with the current content of regulatory databases.

Assigning roles to protein mentions: The case of transcription factors

Dragon TF Association Miner: a system for exploring transcription factor associations through text-mining

DTFAM complements the existing biological resources by collecting, assessing, extracting and presenting associations that can reveal some of the not so easily observable connections among the entities found which could explain the functions of TFs and help decipher parts of gene transcriptional regulatory networks.

Prioritization, clustering and functional annotation of MicroRNAs using latent semantic indexing of MEDLINE abstracts

An automated text mining approach using Latent Semantic Indexing (LSI) for prioritization, clustering and functional annotation of miRNAs found that LSI identified keyword-to-miRNA relationships with high accuracy and demonstrated that pair-wise associations between miRNAs can be used to group them into categories which are functionally aligned.

Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)

A Web-based bioinformatics software environment called FAUN, or Feature Annotation Using Nonnegative matrix factorization (NMF), is presented to facilitate both the discovery and classification of functional relationships among genes, and its utility and performance as a knowledge discovery tool are demonstrated.

Gene clustering by Latent Semantic Indexing of MEDLINE abstracts

It is demonstrated here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering, and provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature.

Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation

This work has developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks, and identified a large number of modules that occur predominately under specific phenotypes.

Clustering microarray-derived gene lists through implicit literature relationships

A novel method is developed that uses implicit literature relationships (concepts related via shared, intermediate concepts) to cluster related genes within gene lists.