Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not(More)
We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points, for which the sampling interval is between 10 seconds and 2 minutes. We introduce a new class of algorithms, called the path inference filter (PIF), that maps GPS data in real time, for a variety of trade-offs and scenarios, and with a high …
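As a rough illustration of this kind of map matching, the sketch below runs a Viterbi-style dynamic program over candidate road projections and candidate paths between consecutive GPS fixes. The Gaussian observation model, the path-length penalty, and the input format are placeholder assumptions for illustration, not the PIF's actual formulation.

```python
def gaussian_log_lik(dist_m, sigma=10.0):
    """Log-likelihood of a GPS fix lying dist_m meters from a candidate
    road projection, under a simple Gaussian noise model (an assumption)."""
    return -0.5 * (dist_m / sigma) ** 2

def path_log_lik(path_len_m, straight_line_m, beta=0.05):
    """Penalize candidate paths much longer than the straight-line
    distance between consecutive GPS fixes (an illustrative model)."""
    return -beta * abs(path_len_m - straight_line_m)

def map_match(candidates, paths):
    """candidates[t] = list of (projection_dist_m,) per GPS fix;
    paths[t][(i, j)] = (path_len_m, straight_line_m) between candidate i
    at time t and candidate j at time t+1. Returns the most likely
    candidate-index sequence via dynamic programming (Viterbi)."""
    T = len(candidates)
    score = [gaussian_log_lik(d) for (d,) in candidates[0]]
    back = []
    for t in range(T - 1):
        nxt, ptr = [], []
        for j, (d_j,) in enumerate(candidates[t + 1]):
            best_i = max(
                range(len(candidates[t])),
                key=lambda i: score[i] + path_log_lik(*paths[t][(i, j)]),
            )
            ptr.append(best_i)
            nxt.append(score[best_i]
                       + path_log_lik(*paths[t][(best_i, j)])
                       + gaussian_log_lik(d_j))
        back.append(ptr)
        score = nxt
    # Backtrack the best sequence of candidate projections.
    seq = [max(range(len(score)), key=lambda i: score[i])]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

# Two GPS fixes, two candidate projections each (all numbers invented):
cands = [[(5.0,), (12.0,)], [(4.0,), (20.0,)]]
paths = [{(0, 0): (120.0, 100.0), (0, 1): (300.0, 100.0),
          (1, 0): (150.0, 100.0), (1, 1): (110.0, 100.0)}]
print(map_match(cands, paths))
```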
Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. We study the case of predicting drivers' travel times in a large urban area from sparse GPS traces. We present a framework that can accommodate a wide variety of traffic distributions and spread the computations across a cluster to achieve small …
Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. Pushing this data to, and processing it in, the cloud is more efficient than on-board processing. However, current cloud-based solutions are not suitable for the latency requirements of these applications. We present a new concept, Discretized Streams, or …
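To make the discretization idea concrete, here is a toy sketch that chops a stream into short fixed-length batches and runs each batch as a deterministic job, so a lost result can be recomputed from its inputs rather than kept on a hot standby. The one-second batch length and word-count job are arbitrary choices for this sketch, and it omits everything that makes a real system work (lineage tracking, parallelism, checkpointing).

```python
from collections import defaultdict

BATCH_SECONDS = 1.0  # arbitrary interval length for this sketch

def process_batch(records):
    """Deterministic per-batch computation (here, a word count).
    Determinism is what makes recomputation-based recovery possible."""
    counts = defaultdict(int)
    for rec in records:
        for word in rec.split():
            counts[word] += 1
    return dict(counts)

def run(stream):
    """stream yields (timestamp, record). Group records into fixed
    intervals and keep each batch's inputs, so a lost batch result can
    be rebuilt by re-running process_batch on the same inputs."""
    batches, results = defaultdict(list), {}
    for ts, rec in stream:
        batches[int(ts // BATCH_SECONDS)].append(rec)
    for interval, records in sorted(batches.items()):
        results[interval] = process_batch(records)
        # Recovery: if results[interval] were lost, recompute it:
        assert process_batch(records) == results[interval]
    return results

demo = [(0.1, "a b a"), (0.7, "b"), (1.2, "c a")]
print(run(demo))  # {0: {'a': 2, 'b': 2}, 1: {'c': 1, 'a': 1}}
```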
Most optimal routing problems focus on minimizing travel time or distance traveled. Oftentimes, a more useful objective is to maximize the probability of on-time arrival, which requires statistical distributions of travel times, rather than just mean values. We propose a method to estimate travel time distributions on large-scale road networks, using probe …
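The following sketch shows why on-time arrival probability and mean travel time can disagree: it Monte Carlo samples per-link travel time distributions for two hypothetical routes and compares each route's probability of beating a deadline. The lognormal link models and all parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def route_arrival_prob(link_samplers, deadline_s, n=100_000):
    """Monte Carlo estimate of P(sum of link travel times <= deadline)."""
    total = sum(sampler(n) for sampler in link_samplers)
    return float(np.mean(total <= deadline_s))

# Route A: lower mean travel time (~62 s) but high variance.
route_a = [lambda n: rng.lognormal(mean=4.0, sigma=0.5, size=n)]
# Route B: higher mean travel time (~78 s) but very reliable.
route_b = [lambda n: rng.lognormal(mean=4.35, sigma=0.05, size=n)]

deadline = 90.0  # seconds
p_a = route_arrival_prob(route_a, deadline)
p_b = route_arrival_prob(route_b, deadline)
# The faster-on-average route is the riskier one for this deadline:
print(p_a, p_b)  # roughly 0.84 vs. 0.999
```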
We report on our experience scaling up the Mobile Millennium traffic information system using cloud computing and the Spark cluster computing framework. Mobile Millennium uses machine learning to infer traffic conditions for large metropolitan areas from crowdsourced data, and Spark was specifically designed to support such applications. Many studies of …
We study the problem of estimating sparse precision matrices from data with missing values. We show that the corresponding maximum likelihood problem is a Difference of Convex (DC) program by proving some new concavity results on the Schur complements. We propose a new algorithm to solve this problem based on the ConCave-Convex Procedure (CCCP), and we …
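The CCCP algorithm itself is beyond a short sketch, but the snippet below illustrates the estimation target with a much simpler stand-in: mean-impute the missing entries, then fit an l1-penalized precision matrix with scikit-learn's graphical lasso. The chain-structured synthetic Gaussian, the missingness rate, and the penalty value are all arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Synthetic data from a chain-structured Gaussian (tridiagonal precision),
# with 20% of the entries missing completely at random.
n, d = 500, 5
prec = np.eye(d) + 0.4 * (np.eye(d, k=1) + np.eye(d, k=-1))
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(prec), size=n)
mask = rng.random(X.shape) < 0.2
X_obs = np.where(mask, np.nan, X)

# Simple baseline (not the paper's method): column-wise mean imputation,
# followed by sparse precision estimation via graphical lasso.
col_means = np.nanmean(X_obs, axis=0)
X_imp = np.where(np.isnan(X_obs), col_means, X_obs)
model = GraphicalLasso(alpha=0.05).fit(X_imp)
print(np.round(model.precision_, 2))  # roughly tridiagonal, like `prec`
```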
We present new algorithms for computing the log-determinant of symmetric, diagonally dominant matrices. Existing algorithms run in time cubic in the size of the matrix in the worst case. Our algorithm computes an approximation of the log-determinant in time near-linear in the number of non-zero entries, and with high …
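A minimal sketch of the kind of randomized machinery such algorithms build on: factor out the diagonal, expand log(I - C) as a truncated power series, and estimate each trace term with Hutchinson's estimator using only matrix-vector products, whose cost scales with the number of non-zeros. The series order and sample count below are arbitrary, and the paper's algorithm comes with guarantees this toy version lacks.

```python
import numpy as np

rng = np.random.default_rng(0)

def logdet_sdd(A, order=30, n_samples=50):
    """Estimate log det A for a symmetric, strictly diagonally dominant A
    with positive diagonal, via log det A = sum(log d_i) - sum_k tr(C^k)/k
    where C = I - D^{-1/2} A D^{-1/2} has spectral radius < 1."""
    d = np.diag(A)
    s = 1.0 / np.sqrt(d)
    C = np.eye(len(d)) - (s[:, None] * A * s[None, :])
    est = np.sum(np.log(d))
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=len(d))  # Rademacher probe vector
        v = z.copy()
        for k in range(1, order + 1):
            v = C @ v                          # v = C^k z via mat-vecs only
            est -= (z @ v) / (k * n_samples)   # Hutchinson term -tr(C^k)/k
    return est

# Check against the exact value on a small SDD matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
print(logdet_sdd(A), np.log(np.linalg.det(A)))  # should be close
```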