Patrick O. Perry

Learn More
This article presents a form of bi-cross-validation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the nonnegative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank(More)
This article presents a form of bi-cross-validation (BCV) for choosing the rank in outer product models, especially the singular value decomposition (SVD) and the non-negative matrix factorization (NMF). Instead of leaving out a set of rows of the data matrix, we leave out a set of rows and a set of columns, and then predict the left out entries by low rank(More)
Rank estimation is a classical model order selection problem that arises in a variety of important statistical signal and array processing systems, yet is addressed relatively infrequently in the extant literature. Here we present sample covariance asymptotics stemming from random matrix theory, and bring them to bear on the problem of optimal rank(More)
The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees(More)
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based procedures to compute a quick approximation to the first nontrivial eigenvector of a data graph Laplacian exactly solve certain regularized Semi-Definite Programs (SDPs). In this paper, we extend that result by providing a statistical interpretation of their approximation procedure.(More)
In multivariate regression models we have the opportunity to look for hidden structure unrelated to the observed predictors. However, when one fits a model involving such latent variables it is important to be able to tell if the structure is real, or just an artifact of correlation in the regression errors. We develop a new statistical test based on random(More)
Probabilistic methods for classifying text form a rich tradition in machine learning and natural language processing. For many important problems, however, class prediction is uninteresting because the class is known, and instead the focus shifts to estimating latent quantities related to the text, such as affect or ideology. We focus on one such problem of(More)
The Military Airspace Management System (MAMS) is a multi-user distributed scheduling prototype designed to support the scheduling of Special Use Airspace in the CONUS region. The prototype has emphasized the user interface design of the scheduling system as the primary means of producing de-conflicted schedules. This paper reports on work in progress and(More)