This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change, and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, …
Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering or identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard structured matrix approximation to data matrices. The desired checkerboard structure is achieved by forcing both the left-…
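The core mechanism behind a sparse rank-1 approximation can be illustrated with alternating power iteration plus soft-thresholding on both singular vectors. This is a minimal numpy sketch of that idea, not the paper's actual algorithm; the threshold values, iteration count, and planted-block example are illustrative assumptions.

```python
import numpy as np

def soft(x, lam):
    """Soft-threshold: shrink entries of x toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank1(X, lam_u=0.5, lam_v=0.5, n_iter=50):
    """Alternating updates of the left/right singular vectors with
    soft-thresholding, yielding a sparse rank-1 layer d * u * v^T."""
    # Warm-start v from the ordinary leading right singular vector.
    v = np.linalg.svd(X)[2][0]
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u = soft(X @ v, lam_u)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = soft(X.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    d = u @ X @ v
    return d, u, v

# A planted block (rows 0-4, cols 0-7) inside low-level noise:
rng = np.random.default_rng(1)
X = 0.1 * rng.standard_normal((20, 30))
X[:5, :8] += 3.0
d, u, v = sparse_rank1(X)
print(np.nonzero(u)[0])  # supports of u and v recover the block
print(np.nonzero(v)[0])
```

Because both u and v are sparse, the outer product d * u * v^T is nonzero only on a row-subset-by-column-subset block, which is exactly the checkerboard (bicluster) structure the abstract describes.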
A central feature of the evolution of large software systems is that change, which is necessary to add new functionality, accommodate new hardware, and repair faults, becomes increasingly difficult over time. In this paper, we approach this phenomenon, which we term code decay, scientifically and statistically. We define code decay and propose a number of …
High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to infinity while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially …
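The regular-simplex geometry is easy to see in simulation. This sketch assumes i.i.d. standard Gaussian coordinates as a stand-in for the paper's more general conditions: after scaling by the square root of the dimension, every pair of points sits at (nearly) the same distance, which is what "vertices of a regular simplex" means.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 200_000            # sample size fixed, dimension very large
X = rng.standard_normal((n, d))

# Pairwise Euclidean distances, scaled by sqrt(d).
dists = [np.linalg.norm(X[i] - X[j]) / np.sqrt(d)
         for i in range(n) for j in range(i + 1, n)]

# All pairwise distances concentrate near sqrt(2) ~ 1.414:
# the n points behave like vertices of a regular simplex.
print(min(dists), max(dists))
```

The concentration follows from the law of large numbers applied coordinate-wise to the squared differences, which is the "deterministic" behavior the abstract refers to.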
MOTIVATION Systematic differences due to experimental features of microarray experiments are present in most large microarray data sets. Many different experimental features can cause biases including different sources of RNA, different production lots of microarrays or different microarray platforms. These systematic effects present a substantial hurdle to …
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the appealing discrimination method called the Support Vector Machine can be improved. The revealing concept is "data piling" at the margin. This leads naturally to the development of "Distance …
Inconsistencies in the preparation of histology slides make it difficult to perform quantitative analysis on their results. In this paper we provide two mechanisms for overcoming many of the known inconsistencies in the staining process, thereby bringing slides that were processed or stored under very different conditions into a common, normalized space to …
A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number …
The fluctuations of Internet traffic possess an intricate structure which cannot be simply explained by long-range dependence and self-similarity. In this work, we explore the use of the wavelet spectrum, whose slope is commonly used to estimate the Hurst parameter of long-range dependence. We show that much more than simple slope estimates are needed for …
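The slope-based estimate mentioned above can be sketched with a plain Haar transform. This is a simplified numpy-only illustration under assumed choices (Haar wavelet, synthetic white-noise input, scales 2-8 for the regression); real traffic analysis needs the richer diagnostics the paper argues for. For long-range dependent signals the mean detail energy at scale j behaves like 2^{j(2H-1)}, so the slope of log2 energy versus scale gives H = (slope + 1) / 2.

```python
import numpy as np

def haar_details(x):
    """Orthonormal Haar DWT: detail coefficients, finest scale first."""
    details = []
    a = np.asarray(x, dtype=float)
    while len(a) >= 2:
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))
        a = (even + odd) / np.sqrt(2)
    return details

def hurst_wavelet(x, j_min=2, j_max=8):
    """Estimate H from the slope of log2 mean detail energy vs scale,
    using E_j ~ 2^{j(2H-1)}  =>  H = (slope + 1) / 2."""
    details = haar_details(x)
    scales = np.arange(j_min, j_max + 1)
    log_energy = [np.log2(np.mean(details[j - 1] ** 2)) for j in scales]
    slope = np.polyfit(scales, log_energy, 1)[0]
    return (slope + 1) / 2

# White noise has no long-range dependence, so the estimate
# should come out near H = 0.5 (flat wavelet spectrum).
rng = np.random.default_rng(0)
h = hurst_wavelet(rng.standard_normal(2**16))
print(h)
```

The abstract's point is precisely that a single number like this slope compresses away structure: two traces with the same fitted H can have very different energy profiles across scales, which is why the full wavelet spectrum is examined rather than the slope alone.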
BACKGROUND Pancreatic ductal adenocarcinoma (PDAC) remains a lethal disease. For patients with localized PDAC, surgery is the best option, but with a median survival of less than 2 years and a difficult and prolonged postoperative course for most, there is an urgent need to better identify patients who have the most aggressive disease. METHODS AND …