The uneven distribution of recombination along chromosomes results in inaccurate estimates of genetic-to-physical distances. In wheat (Triticum aestivum L.) chromosome 3B, an estimated 90% of crossover events occur in distal sub-telomeric regions that represent 40% of the chromosome. Radiation hybrid (RH) mapping which does…
Genomics data has many properties that make it different from "typical" relational data. The presence of multi-valued attributes as well as the large number of null values led us to a P-tree-based bit-vector representation in which matching 1-values were counted to evaluate similarity between genes. Quantitative information such as the number of…
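The idea of counting matching 1-values can be illustrated with a minimal sketch. It uses plain Python integers as bit-vectors rather than an actual P-tree, and the vocabulary, the gene annotations, and the helper names `to_bitvector` and `matching_ones` are all hypothetical, chosen only for illustration.

```python
def to_bitvector(attributes, vocabulary):
    """Encode a set of attribute values as an integer bit-vector.

    Bit i is 1 when vocabulary[i] is present. A gene with no known
    attributes (all nulls) simply maps to 0.
    """
    bits = 0
    for position, term in enumerate(vocabulary):
        if term in attributes:
            bits |= 1 << position
    return bits


def matching_ones(a, b):
    """Count positions where both bit-vectors hold a 1."""
    return bin(a & b).count("1")


# Hypothetical annotation vocabulary and genes (not taken from the paper).
vocabulary = ["kinase", "membrane", "nucleus", "transport"]
gene_a = to_bitvector({"kinase", "nucleus"}, vocabulary)
gene_b = to_bitvector({"kinase", "nucleus", "transport"}, vocabulary)
gene_c = to_bitvector(set(), vocabulary)  # all attributes null

print(matching_ones(gene_a, gene_b))  # shared annotations of a and b
print(matching_ones(gene_a, gene_c))  # nothing shared with an empty gene
```

Because only shared 1-values are counted, missing (null) attributes never contribute to similarity, which is presumably one reason such a representation suits sparse, multi-valued data.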
Keywords: knowledge discovery; pattern mining; financial applications; stock market; time series data
Abstract: Similarities among subsequences are typically regarded as categorical features of sequential data. We introduce an algorithm for capturing the relationships among similar, contiguous subsequences. Two time series are considered to be similar…
Doubts have been raised that time series subsequences can be clustered in a meaningful way. This paper introduces a kernel-density-based algorithm that detects meaningful patterns in the presence of a vast number of random-walk-like subsequences. The value of density-based algorithms for noise elimination in general has long been demonstrated. The challenge…
Noise levels in time series subsequence data are typically very high, and properties of the noise differ from those of white noise. The proposed algorithm incorporates a continuous random-walk noise model into kernel-density-based clustering. Evaluation is done by testing to what extent the resulting clusters are predictive of the process that generated the…
Given a set of training data, nearest neighbor classification predicts the class value for an unknown tuple X by searching the training set for the k nearest neighbors to X and then classifying X according to the most frequent class among the k neighbors. Each of the k nearest neighbors casts an equal vote for the class of X. In this paper, we propose a new…
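The baseline described here, standard k-nearest-neighbor classification with equal votes, can be sketched as follows. This covers only the conventional scheme, not the method the paper goes on to propose. Euclidean distance, the toy training set, and the name `knn_classify` are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(training, x, k):
    """Predict the class of x by an equal-weight vote of its k nearest neighbors.

    training: list of (features, label) pairs, features being numeric tuples.
    Distance is Euclidean (an assumption; the abstract names no metric).
    """
    # Sort the training tuples by distance to x and keep the k closest.
    neighbors = sorted(training, key=lambda pair: math.dist(pair[0], x))[:k]
    # Each neighbor casts exactly one vote for its own class.
    votes = Counter(label for _, label in neighbors)
    # On a tie, Counter returns the class encountered first, i.e. the nearer one.
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data.
training = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
    ((1.2, 0.8), "b"),
]

print(knn_classify(training, (0.2, 0.1), k=3))
print(knn_classify(training, (1.0, 0.9), k=3))
```

Sorting the whole training set keeps the sketch short; for large data sets a partial selection such as `heapq.nsmallest` or a spatial index would avoid the full sort.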
Protein-protein interactions are of great interest to biologists. A variety of high-throughput techniques have been devised, each of which leads to a separate definition of an interaction network. The concept of differential association rule mining is introduced to study the annotations of proteins in the context of one or more interaction networks.
A clustering algorithm is introduced that combines the strengths of clustering and motif finding techniques. Clusters are identified based on unambiguously defined sequence sections as in motif finding algorithms. The definition of similarity within clusters allows transitive matches and, thereby, enables the discovery of remote homologies that cannot be…