We describe a method for recovering the underlying parametrization of scattered data (m_i) lying on a manifold M embedded in high-dimensional Euclidean space. The method, Hessian-based locally linear embedding, derives from a conceptual framework of local isometry in which the manifold M, viewed as a Riemannian submanifold of the ambient Euclidean space...
We describe a method to recover the underlying parametrization of scattered data (m_i) lying on a manifold M embedded in high-dimensional Euclidean space. The method, Hessian-based Locally Linear Embedding (HLLE), derives from a conceptual framework of Local Isometry in which the manifold M, viewed as a Riemannian submanifold of the ambient Euclidean space...
Highly available cloud storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of commodity servers and disk drives. Sophisticated management, load balancing and recovery techniques are needed to achieve high performance and availability amidst an abundance of failure sources that include software, hardware,...
Recently, the Isomap procedure [1] was proposed as a new way to recover a low-dimensional parametrization of data lying on a low-dimensional submanifold in high-dimensional space. The method assumes that the submanifold, viewed as a Riemannian submanifold of the ambient high-dimensional space, is isometric to a convex subset of Euclidean space. This...
Short assigned question-answering style tasks are often used as a probe to understand how users do search. While such assigned tasks are simple to test and are effective at eliciting the particulars of a given search capability, they are not the same as naturalistic searches. We studied the quantitative differences between assigned tasks and self-chosen "...
The practice of guiding a search engine based on query logs observed from the engine's user population provides large volumes of data but potentially also sacrifices the privacy of the user. In this paper, we ask the following question: Is it possible, given rich instrumented data from a panel and usability study data, to observe complete information...
We present a practical, market-based solution to the resource provisioning problem in a set of heterogeneous resource clusters. We focus on provisioning rather than immediate scheduling decisions to allow users to change long-term job specifications based on market feedback. Users enter bids to purchase quotas, or bundles of resources, for long-term use. ...
We track a large set of "rapidly" changing web pages and examine the assumption that the arrival of content changes follows a Poisson process on a microscale. We demonstrate that there are significant differences in the behavior of pages that can be exploited to maintain freshness in a web corpus.
Search engines strive to maintain a "current" repository of all pages on the web to index for user queries. However, crawling all pages all the time is costly and inefficient: many small websites can't support that much load, and while some pages change very rapidly, others don't change at all. Therefore, estimated frequency of change is often used to...