Emergent unsupervised clustering paradigms with potential application to bioinformatics.

@article{Miller2008EmergentUC,
  title={Emergent unsupervised clustering paradigms with potential application to bioinformatics.},
  author={David J. Miller and Yue Wang and George Kesidis},
  journal={Frontiers in bioscience : a journal and virtual library},
  year={2008},
  volume={13},
  pages={677-90}
}
In recent years, there has been a great upsurge in the application of data clustering, statistical classification, and related machine learning techniques to the field of molecular biology, in particular analysis of DNA microarray expression data. Clustering methods can be used to group co-expressed genes, shedding light on gene function and co-regulation. Alternatively, they can group samples or conditions to identify phenotypical groups, disease subgroups, or to help identify disease pathways…
caBIG™ VISDA: Modeling, visualization, and discovery for cluster analysis of genomic data
The VIsual Statistical Data Analyzer (VISDA) achieved robust and superior clustering accuracy, compared with several benchmark clustering schemes, and the model order selection scheme in VISDA was shown to be effective for high dimensional genomic data clustering.
Learning Statistical and Geometric Models from Microarray Gene Expression Data
A novel statistical data clustering and visualization algorithm that is comprehensive and effective for multiple clustering tasks and that overcomes some of the major limitations associated with existing clustering methods is proposed.
18 Neural Networks in Bioinformatics
Over the last two decades, neural networks (NNs) gradually became one of the indispensable tools in bioinformatics. This was fueled by the development and rapid growth of numerous biological…
Neural Networks in Bioinformatics
The most often used neural network architectures are summarized, and several specific applications including prediction of protein secondary structure, solvent accessibility, and binding residues are discussed, with a particular focus on applications in protein bioinformatics.
Graph-based methods for large-scale protein classification and orthology inference
It is argued that establishing true orthologous relationships requires a phylogenetic approach which combines both trees and graphs (networks), reliable species phylogeny, genomic data for more than two species, and an insight into the processes of molecular evolution.
Unsupervised Data Mining Applications on High Dimensional Gene Expression Time Series in Toxicogenomics
Toxicogenomics, the study of adverse effects caused by toxicants to human health and environment via high-throughput genomics technologies, presents promising alternatives to expensive and lengthy…
The properties of high-dimensional data spaces: implications for exploring gene and protein expression data
This Review discusses the properties of high-dimensional data spaces that arise in genomic and proteomic studies and the challenges they can pose for data analysis and interpretation.
Probing genetic algorithms for feature selection in comprehensive metabolic profiling approach.
  • W. Zou, V. Tolstikov
  • Chemistry, Medicine
  • Rapid communications in mass spectrometry : RCM
  • 2008
The present study demonstrated that the combination of comprehensive metabolic profiling and advanced data mining techniques provides a powerful metabolomic approach for biomarker discovery among small molecules.
Development of an Unsupervised Pixel-based Clustering Algorithm for Compartmentalization of Immunohistochemical Expression Using Automated QUantitative Analysis
This new clustering algorithm produces accurate and precise compartmentalization for assessment of target gene expression, and will enhance the efficiency and objectivity of the current Automated QUantitative Analysis and other image analysis platforms.
Dual Transcriptomic and Molecular Machine Learning Predicts all Major Clinical Forms of Drug Cardiotoxicity
This work demonstrates prediction and preservation of cardiotoxic relationships for six drug-induced cardiotoxicity types using a machine learning approach on a large collected and curated dataset of transcriptional and molecular profiles.

References

SHOWING 1-10 OF 59 REFERENCES
Detecting stable clusters using principal component analysis.
This chapter extends the stability-based validation of cluster structure, and proposes stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices.
Interrelated two-way clustering: an unsupervised approach for gene expression data analysis
A new framework for unsupervised analysis of gene expression data is presented which applies an interrelated two-way clustering approach to the gene expression matrices to find important gene patterns and perform cluster discovery on samples.
Cluster analysis for gene expression data: a survey
This paper divides cluster analysis for gene expression data into three categories, presents specific challenges pertinent to each clustering category and introduces several representative approaches, and suggests the promising trends in this field.
Biclustering algorithms for biological data analysis: a survey
In this comprehensive survey, a large number of existing approaches to biclustering are analyzed, and they are classified in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Learning the Tree of Phenotypes Using Genomic Data and VISDA
This work proposes a stability analysis guided supervised clustering and visualization method aiming to discover the hierarchical structure in gene expression data, which it calls the "tree of phenotypes".
Discriminatory Mining of Gene Expression Microarray Data
Three different algorithms are evaluated in their abilities to reduce dimensionality and to visualize data sets: Principal Component Analysis (PCA), Discriminatory Component Analysis (DCA), and Projection Pursuit Method (PPM).
CLIFF: clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts
CLIFF is an algorithm for clustering biological samples using gene expression microarray data; it outperforms standard clustering approaches that do not consider the feature selection issue, and produces a result that is very close to the original expert labeling of the sample set.
Non-redundant clustering with conditional ensembles
This work presents a general algorithmic framework which makes use of cluster ensemble methods to solve the problem of finding a novel, "orthogonal" clustering in the data.
Hierarchical, Unsupervised Learning with Growing via Phase Transitions
The new approach is founded on the principle of minimum cross-entropy, using informative priors to approximate the unstructured clustering solution while imposing the structural constraint, and incorporates supervised learning principles applied in an unsupervised problem setting.
Adjustment of systematic microarray data biases
The new method of 'Distance Weighted Discrimination (DWD)' is shown to be better than Support Vector Machines and Singular Value Decomposition for the adjustment of systematic microarray effects and of general use as a tool for the discrimination of systematic problems present in microarray data sets, including the merging of two breast tumor data sets completed on different microarray platforms.