Frequent itemset mining is a popular and important first step in the analysis of data arising in a broad range of applications. The traditional "exact" model for frequent itemsets requires that every item occurs in each supporting transaction. Real data is typically subject to noise and measurement error. To date, the effects of noise on exact frequent…
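To make the "exact" model concrete, the sketch below counts support as the number of transactions that contain every item of the itemset. The toy transactions and the single deleted item are hypothetical, chosen only to show that one noisy entry is enough to remove a transaction from an itemset's support under this model. The sketch does not show how noise-tolerant models relax the definition.

```python
from typing import FrozenSet, List


def exact_support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Count transactions that contain every item of `itemset` (the exact model)."""
    return sum(1 for t in transactions if itemset <= t)


# Hypothetical toy data: four transactions that all contain {a, b, c}.
clean = [frozenset("abc"), frozenset("abcd"), frozenset("abc"), frozenset("abce")]
# The same data after noise deletes item 'c' from the second transaction.
noisy = [frozenset("abc"), frozenset("abd"), frozenset("abc"), frozenset("abce")]

target = frozenset("abc")
print(exact_support(target, clean))  # 4
print(exact_support(target, noisy))  # 3
```

A single missing entry lowers the support from 4 to 3, even though the underlying pattern is unchanged.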
The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second-generation splice detection algorithm, MapSplice, that targets high sensitivity and specificity in splice detection, as well as CPU and memory efficiency. MapSplice can be applied to both…
This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages,…
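The sketch below shows one way an unbalanced tree can be generated implicitly and counted. It is a simplified sequential version in the spirit of a binomial-style tree, not the benchmark's reference code. The root has `b0` children, and every other node has `m` children with probability `q` and none otherwise. The parameter names, the `child_state` and `uniform` helpers, and the SHA-1-based derivation of each node's random state from its parent are all assumptions of this sketch. Deriving each child's state from its parent keeps the tree shape independent of traversal order. That independence is what lets parallel workers split and steal subtrees and still explore the same tree.

```python
import hashlib
from collections import deque


def child_state(parent: bytes, index: int) -> bytes:
    # Hypothetical helper: derive a child's random state from its parent's
    # state and its child index, so the shape is traversal-order independent.
    return hashlib.sha1(parent + index.to_bytes(4, "big")).digest()


def uniform(state: bytes) -> float:
    # Map the first 4 bytes of a node's state to a float in [0, 1).
    return int.from_bytes(state[:4], "big") / 2**32


def count_binomial_tree(seed: int, b0: int, m: int, q: float) -> int:
    """Count the nodes of an implicitly generated unbalanced tree.

    The root has b0 children; every other node has m children with
    probability q and no children otherwise. The expected size is
    finite when q * m < 1.
    """
    root = hashlib.sha1(seed.to_bytes(4, "big")).digest()
    frontier = deque(child_state(root, i) for i in range(b0))
    count = 1  # count the root itself
    while frontier:
        state = frontier.popleft()
        count += 1
        if uniform(state) < q:
            frontier.extend(child_state(state, i) for i in range(m))
    return count


# Same parameters always yield the same tree, hence the same node count.
print(count_binomial_tree(seed=42, b0=100, m=4, q=0.2))
```

Subtree sizes vary widely from one root child to the next, so a parallel version that statically splits the root's children would leave some workers idle. Dynamic load balancing, which is what the benchmark exercises, avoids that.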
With the emergence of social networks such as Facebook and MySpace, social network analysis poses security and privacy threats: confidential knowledge risks disclosure when social network data is shared or made public. Beyond current de-identification techniques for social network anonymity, we study a situation,…
Subspace clustering has attracted great attention due to its capability of finding salient patterns in high-dimensional data. Order-preserving subspace clusters have proven important in high-throughput gene expression analysis, since functionally related genes are often co-expressed under a set of experimental conditions. Such co-expression…
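An order-preserving subspace cluster is a set of rows (genes) and columns (conditions) such that every selected row ranks the selected columns in the same order, regardless of the actual expression magnitudes. The minimal sketch below checks that property for a given row and column subset. It assumes there are no tied values within a row. The function names and the toy matrix are hypothetical, and the sketch does not search for clusters.

```python
from typing import List, Sequence, Tuple


def column_order(row: Sequence[float], cols: Sequence[int]) -> Tuple[int, ...]:
    # The selected columns, sorted by this row's values (ascending).
    return tuple(sorted(cols, key=lambda c: row[c]))


def is_order_preserving(
    matrix: List[Sequence[float]], rows: Sequence[int], cols: Sequence[int]
) -> bool:
    """True if every selected row induces the same ordering of the selected columns."""
    orders = {column_order(matrix[r], cols) for r in rows}
    return len(orders) == 1


# Hypothetical expression matrix: rows are genes, columns are conditions.
expr = [
    [1.0, 5.0, 3.0, 9.0],  # gene 0
    [0.2, 0.9, 0.5, 4.0],  # gene 1: different scale, same ranking as gene 0
    [7.0, 1.0, 2.0, 3.0],  # gene 2: different ranking
]
print(is_order_preserving(expr, [0, 1], [0, 1, 2, 3]))  # True
print(is_order_preserving(expr, [0, 1, 2], [0, 1, 2]))  # False
```

Genes 0 and 1 differ in scale but both rank the conditions as 0 < 2 < 1 < 3, so they form an order-preserving cluster over all four conditions. This is the co-expression signal the ordering model captures.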
Beyond ongoing privacy-preserving social network studies, which mainly focus on node de-identification and link protection, this paper aims to preserve the privacy of link affinities, or weights, in a finite, directed social network. To protect the weight privacy of edges, we define a privacy measure, k-anonymity, over…
Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional "exact" approach for finding frequent itemsets requires that every item in the itemset occurs in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise,…
Structural semantics are fundamental to understanding both natural and man-made objects, from languages to buildings. They manifest as repeated structures or patterns and are often captured in images. Finding repeated patterns in images therefore has important applications in scene understanding, 3D reconstruction, and image retrieval, as well as…