• Corpus ID: 251402866

Kernel Biclustering algorithm in Hilbert Spaces

@inproceedings{Matabuena2022KernelBA,
  title={Kernel Biclustering algorithm in Hilbert Spaces},
  author={Marcos Matabuena and Juan C. Vidal and Oscar Hernan Madrid Padilla and D. Sejdinovic},
  year={2022}
}
Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and the maximum mean discrepancy (MMD) – two distances between probability distributions capable of handling complex data such as curves or graphs. The proposed method can learn more general… 
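As a rough illustration of the two distances named in the abstract, the following minimal sketch computes plug-in (V-statistic) estimates of the energy distance and of the squared MMD between two samples of real vectors. It is not the paper's algorithm: the Euclidean metric, the Gaussian kernel, its `bandwidth` parameter, and all function names are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_pairwise(xs, ys, f):
    # Average of f(x, y) over all pairs, including identical pairs (V-statistic).
    return sum(f(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    # Plug-in estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|.
    return (2 * mean_pairwise(xs, ys, euclidean)
            - mean_pairwise(xs, xs, euclidean)
            - mean_pairwise(ys, ys, euclidean))

def gaussian_kernel(a, b, bandwidth=1.0):
    # Gaussian (RBF) kernel; the bandwidth is a hypothetical default.
    return math.exp(-euclidean(a, b) ** 2 / (2 * bandwidth ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    # Biased estimate of E k(X, X') + E k(Y, Y') - 2 E k(X, Y).
    return (mean_pairwise(xs, xs, kernel)
            + mean_pairwise(ys, ys, kernel)
            - 2 * mean_pairwise(xs, ys, kernel))

if __name__ == "__main__":
    sample_a = [(0.0,), (1.0,)]
    sample_b = [(5.0,), (6.0,)]
    print(energy_distance(sample_a, sample_b))
    print(mmd_squared(sample_a, sample_b))
```

Both estimates are zero when the two samples coincide and grow as the samples separate. Because they rely only on a distance or a kernel, the same functions would apply to curves or graphs once a suitable `euclidean` or `kernel` replacement is supplied.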

References

Biclustering with Alternating K-Means

This paper provides a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows.
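The alternating idea can be sketched as follows. This is a generic block k-means illustration under stated assumptions, not the adapted k-means of the cited paper: it assumes a squared-error objective, caller-supplied initial labels, and a mean of 0.0 for empty blocks, and all function names are hypothetical.

```python
def block_means(X, rows, cols, k, l):
    # Mean of X over each (row cluster, column cluster) block.
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            sums[r][c] += X[i][j]
            counts[r][c] += 1
    # Simplification: an empty block gets mean 0.0.
    return [[sums[r][c] / counts[r][c] if counts[r][c] else 0.0
             for c in range(l)] for r in range(k)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def assign(X, other_labels, mu):
    # Give each row of X the cluster whose block means fit it best
    # (squared error), holding the other dimension's labels fixed.
    labels = []
    for row in X:
        costs = [sum((x - m[c]) ** 2 for x, c in zip(row, other_labels))
                 for m in mu]
        labels.append(costs.index(min(costs)))
    return labels

def alternating_kmeans(X, rows, cols, k, l, n_iter=20):
    # rows / cols are initial labels; the result depends on this start.
    for _ in range(n_iter):
        mu = block_means(X, rows, cols, k, l)
        new_rows = assign(X, cols, mu)                            # row step
        mu = block_means(X, new_rows, cols, k, l)
        new_cols = assign(transpose(X), new_rows, transpose(mu))  # column step
        if new_rows == rows and new_cols == cols:
            break
        rows, cols = new_rows, new_cols
    return rows, cols

if __name__ == "__main__":
    X = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [20, 20, 30, 30],
         [20, 20, 30, 30]]
    # Start with the third row deliberately mislabelled.
    print(alternating_kmeans(X, [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2))
```

Like ordinary k-means, this sketch only reaches a local optimum: a perfectly balanced start can leave every block with the same mean, so in practice several initializations would be tried.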

Profile likelihood biclustering

A new heuristic optimization procedure based on the Kernighan-Lin heuristic is proposed; it has nice computational properties, performs well in simulations, and is proved to recover the true row and column classes when the dimensions of the data matrix tend to infinity.

Convex biclustering

This work presents a convex formulation of the biclustering problem that possesses a unique global minimizer, together with an iterative algorithm, COBRA, that is guaranteed to identify it. It demonstrates the advantages of the approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.

A Bayesian model for biclustering with applications

  • Jian Zhang
  • Computer Science
  • 2010
An empirical Bayes algorithm for sampling posteriors, in which the cluster memberships of all genes and samples are estimated by maximizing an explicit marginal posterior of these memberships, makes the estimation of the Bayesian plaid model computationally feasible and efficient.

Bayesian biclustering of gene expression data

The BBC algorithm is shown to be a robust model-based biclustering method that can discover biologically significant gene-condition clusters in microarray data and has the potential to be extended to integrated study of gene transcription networks.

Applied Biclustering Methods for Big and High-Dimensional Data Using R

Applied Biclustering Methods for Big and High-Dimensional Data Using R shows how to apply biclustering methods to find local patterns in a big data matrix.

Biclustering algorithms for biological data analysis: a survey

In this comprehensive survey, a large number of existing approaches to biclustering are analyzed and classified according to the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

Finding large average submatrices in high dimensional data

A statistically motivated biclustering procedure that finds large average submatrices within a given real-valued data matrix and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value is proposed.

Sparse Biclustering of Transposable Data

  • Kean Ming Tan, D. Witten
  • Computer Science
    Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
  • 2014
This paper proposes a framework for biclustering based on the matrix-variate normal distribution and shows that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of this proposal and that a relaxation of the proposal yields the singular value decomposition.

Spectral biclustering of microarray data: coclustering genes and conditions.

This work develops a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of gene expression data, if they exist, and applies it to a selection of publicly available cancer expression data sets.
...