Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis
The problem of identifying the cluster structure in DNA methylation expression data is considered. A Dirichlet process beta mixture model (DPBMM) is proposed for DNA methylation data arrays. The model learns the cluster structure automatically, including the cluster mixing proportions, the model parameters of each cluster, and, most notably, the number of clusters. To enable this learning, we propose a Gibbs sampling algorithm that computes the posterior distributions and hence the parameter estimates. We investigate the performance of the proposed clustering algorithm via simulation.
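
To illustrate why a Dirichlet process prior leaves the number of clusters open, the following is a minimal sketch of the generative side of a DP beta mixture, written with the Chinese restaurant process. It is not the Gibbs sampler proposed here. The function names, the concentration parameter `alpha`, and the Gamma(2, 1) base distribution for the beta shape parameters are all assumptions made for illustration.

```python
import random


def crp_assignments(n, alpha, rng):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def simulate_dp_beta_mixture(n_samples, n_sites, alpha, seed=0):
    """Simulate methylation-like values in [0, 1] from a DP beta mixture."""
    rng = random.Random(seed)
    labels, counts = crp_assignments(n_samples, alpha, rng)
    # Each cluster gets its own pair of beta shape parameters per site.
    # Assumption: shapes are drawn from a Gamma(2, 1) base distribution.
    params = [
        [(rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0))
         for _ in range(n_sites)]
        for _ in counts
    ]
    # Each sample draws one beta-distributed value per site from its cluster.
    data = [[rng.betavariate(a, b) for (a, b) in params[z]] for z in labels]
    return data, labels, params


if __name__ == "__main__":
    data, labels, params = simulate_dp_beta_mixture(30, 4, alpha=1.0, seed=1)
    print("number of clusters drawn:", len(params))
```

In this sketch the number of clusters is an outcome of the draw rather than an input, and larger values of `alpha` tend to produce more clusters. Data simulated this way, with known labels, is one way to check whether a clustering algorithm recovers the true structure.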