Developing a high-quality reference sequence is a daunting task in crops like wheat, which has a large (~17 Gb), highly repetitive (>80%), polyploid genome. To achieve complete sequence assembly of such genomes, development of a high-quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the…
The uneven distribution of recombination along the length of chromosomes results in inaccurate estimates of genetic-to-physical distances. In wheat (Triticum aestivum L.) chromosome 3B, it has been estimated that 90% of the crossover events occur in distal sub-telomeric regions representing 40% of the chromosome. Radiation hybrid (RH) mapping, which does…
Keywords: knowledge discovery; pattern mining; financial applications; stock market; time series data. Abstract: Similarities among subsequences are typically regarded as categorical features of sequential data. We introduce an algorithm for capturing the relationships among similar, contiguous subsequences. Two time series are considered to be similar…
The species cytoplasm specific (scs) genes affect nuclear-cytoplasmic interactions in interspecific hybrids. A radiation hybrid (RH) mapping population of 188 individuals was employed to refine the location of the scs ae locus on Triticum aestivum chromosome 1D. “Wheat Zapper,” a comparative genomics tool, was used to predict synteny between wheat…
Doubts have been raised that time series subsequences can be clustered in a meaningful way. This paper introduces a kernel-density-based algorithm that detects meaningful patterns in the presence of a vast number of random-walk-like subsequences. The value of density-based algorithms for noise elimination in general has long been demonstrated. The challenge…
Genomics data has many properties that make it different from "typical" relational data. The presence of multi-valued attributes, as well as the large number of null values, led us to a P-tree-based bit-vector representation in which matching 1-values were counted to evaluate similarity between genes. Quantitative information such as the number of…
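The idea of counting matching 1-values between bit vectors can be illustrated with a minimal sketch. It uses plain Python integers as bit vectors and is not a P-tree implementation. The function names `encode` and `matching_ones` and the example vocabulary are hypothetical, chosen only for illustration. A gene with no annotations (all nulls) simply encodes to an all-zero vector.

```python
def encode(attributes, vocabulary):
    """Encode a set of attribute values as an integer bit vector.

    Bit i is set when vocabulary[i] is present in `attributes`.
    Both names are illustrative assumptions, not taken from the paper.
    """
    bits = 0
    for i, term in enumerate(vocabulary):
        if term in attributes:
            bits |= 1 << i
    return bits


def matching_ones(a, b):
    """Count positions where both bit vectors have a 1."""
    return bin(a & b).count("1")


# Hypothetical annotation vocabulary and genes.
vocabulary = ["kinase", "membrane", "nucleus", "transport"]
gene_a = encode({"kinase", "nucleus"}, vocabulary)
gene_b = encode({"kinase", "nucleus", "transport"}, vocabulary)
gene_c = encode(set(), vocabulary)  # all attributes null

similarity_ab = matching_ones(gene_a, gene_b)
similarity_ac = matching_ones(gene_a, gene_c)
```

Counting only shared 1-values means that two genes with many nulls are not treated as similar merely because both lack annotations.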
Noise levels in time series subsequence data are typically very high, and properties of the noise differ from those of white noise. The proposed algorithm incorporates a continuous random-walk noise model into kernel-density-based clustering. Evaluation is done by testing to what extent the resulting clusters are predictive of the process that generated the…
Given a set of training data, nearest neighbor classification predicts the class value for an unknown tuple X by searching the training set for the k nearest neighbors to X and then classifying X according to the most frequent class among the k neighbors. Each of the k nearest neighbors casts an equal vote for the class of X. In this paper, we propose a new…
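The baseline equal-vote scheme described above can be sketched as follows. This shows only standard k-nearest-neighbor classification, not the new method the paper proposes. Euclidean distance is an assumption, since the abstract does not name a metric, and the function name `knn_classify` and the sample data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, x, k):
    """Classify x by majority vote among its k nearest training tuples.

    training: list of (feature_vector, class_label) pairs.
    Distance is Euclidean (an assumption; the source names no metric).
    """
    # Sort the training tuples by distance to x and keep the k closest.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    # Each neighbor casts one equal vote for its own class.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
training = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
]

near_origin = knn_classify(training, (0.2, 0.1), k=3)
near_cluster_b = knn_classify(training, (5.0, 5.0), k=3)
```

With equal votes, a tie between classes is broken here by whichever tied class was encountered first among the sorted neighbors; choosing an odd k for two-class problems avoids ties.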