The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both ...
This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, ...
Frequent itemset mining is a popular and important first step in the analysis of data arising in a broad range of applications. The traditional "exact" model for frequent itemsets requires that every item occurs in each supporting transaction. Real data is typically subject to noise and measurement error. To date, the effects of noise on exact frequent ...
if we carefully choose σ such that σ is smaller than the ratio of the length of a shortest path to the length of the second shortest path. With the development of emerging social networks, such as Facebook and MySpace, security and privacy threats arising from social network analysis bring a risk of disclosure of confidential knowledge when the ...
Subspace clustering has attracted great attention due to its capability of finding salient patterns in high-dimensional data. Order-preserving subspace clusters have been proven to be important in high-throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression ...
The soundness of clustering in the analysis of gene expression profiles and gene function prediction rests on the hypothesis that genes with similar expression profiles are likely to have strongly correlated functions in biological activities. Gene ontology (GO) has become a well-accepted standard in organizing gene function categories. Different ...
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and ...
We investigate the problem of clustering on distributed data streams. In particular, we consider k-median clustering on stream data arriving at distributed sites which communicate through a routing tree. Distributed clustering on high-speed data streams is a challenging task due to limited communication capacity, storage space, and computing power at ...