Common substring problems allowing errors are known to be NP-hard. The main challenge of the problems lies in the combinatorial explosion of potential candidates. In this paper, we propose and study a generalized center string (GCS) problem, where not only all models (center strings) of any length, but also the positions of all their (degenerative)…
Predicting protein domains is an important and challenging problem in computational biology because of its significant role in understanding the complexity of proteomes. Although many template-based prediction servers have been developed, ab initio methods should be designed and further improved to complement the template-based methods.…
Identification of transcription factor binding sites (also called ‘motif discovery’) in DNA sequences is a basic step in understanding genetic regulation. Although many successful programs have been developed, the problem is far from being solved on account of diversity in gene expression/regulation and the low specificity of binding sites. State-of-the-art…
K-means clustering is widely used due to its fast convergence, but it is sensitive to the initial condition. Therefore, many methods of initializing K-means clustering have been proposed in the literature. Compared with K-means clustering, a novel clustering algorithm called affinity propagation (AP clustering) has been developed by Frey and Dueck, which can…
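The sensitivity to the initial condition mentioned above can be illustrated with a minimal sketch. It assumes plain Lloyd-style K-means on one-dimensional toy data; the function names `kmeans_1d` and `sse` and the data are hypothetical and are not taken from the paper.

```python
def kmeans_1d(points, centers, iters=100):
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Toy data with three well-separated groups (illustrative only).
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

good = kmeans_1d(points, [1, 11, 21])  # one initial center per group
bad = kmeans_1d(points, [0, 1, 2])     # all initial centers in one group

print(good, sse(points, good))
print(bad, sse(points, bad))
```

On this toy data the second initialization converges to a visibly worse local optimum than the first, which is the kind of behavior that motivates initialization methods.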
ChIP-seq, which combines chromatin immunoprecipitation (ChIP) with next-generation parallel sequencing, allows for the genome-wide identification of protein-DNA interactions. This technology poses new challenges for the development of novel motif-finding algorithms and methods for determining exact protein-DNA binding sites from ChIP-enriched sequencing…
A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database, but this comes at the expense of the accuracy of the answers. In this paper, we present a sampling ensemble approach to improve the accuracy for a given sample size. Then, using Monte Carlo theory, we give an…
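The general idea of trading accuracy for speed by sampling, and of combining several samples, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the ensemble is combined, so the averaging of per-sample support estimates, the function names `support` and `ensemble_support`, and the toy database are all hypothetical and should not be read as the paper's method.

```python
import random


def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def ensemble_support(transactions, itemset, sample_size, n_samples, seed=0):
    """Average the support estimated on several random samples.

    Assumption: samples are drawn without replacement and combined by a
    simple mean; the paper's actual combination rule is not given here.
    """
    rng = random.Random(seed)
    estimates = [
        support(rng.sample(transactions, sample_size), itemset)
        for _ in range(n_samples)
    ]
    return sum(estimates) / len(estimates)


# Toy database (illustrative only): 100 transactions, {"a", "b"} occurs in half.
db = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}] * 25

exact = support(db, {"a", "b"})
single = ensemble_support(db, {"a", "b"}, sample_size=20, n_samples=1)
ensemble = ensemble_support(db, {"a", "b"}, sample_size=20, n_samples=10)

print(exact, single, ensemble)
```

Averaging independent estimates reduces their variance, which is one plausible reason an ensemble of samples can be more accurate than a single sample of the same size.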
Network modeling and analysis have emerged as a promising approach for exploring the regularities behind complex organization and interactions in many significant fields. Traditional Chinese medicine (TCM) is a holistic medical science; usually, in its clinical setting, herb prescriptions consisting of several…
BACKGROUND: Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanism of symptoms, we develop a computational approach to identify the candidate genes of symptoms. METHODS: This paper presents a network-based…