Timothy Hunter

Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not…
We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points, for which the sampling interval is between 1 s and 2 min. We introduce a new class of algorithms, collectively called the path inference filter (PIF), that maps GPS data in real time, for a variety of tradeoffs and scenarios and with a high…
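The abstract does not say how the PIF works internally. As a hedged illustration of the underlying problem (map matching: choosing a road position for each GPS point), here is standard Viterbi decoding over a hidden Markov model. This is a generic technique, not the PIF itself; the candidate ids, the probabilities, and the names `viterbi_map_match` and `same_road_logp` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: for each GPS point, a map from candidate
# road-position id to log P(GPS point | vehicle at that candidate).
Candidates = Dict[str, float]


def viterbi_map_match(
    observations: Sequence[Candidates],
    transition_logp: Callable[[str, str], float],
) -> List[str]:
    """Return the most likely candidate for each GPS point (Viterbi decoding)."""
    if not observations:
        return []
    # best[c]: highest log-score of any path ending at candidate c
    best: Dict[str, float] = dict(observations[0])
    backptrs: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for cand, emit_logp in obs.items():
            # choose the predecessor maximizing path score + transition score
            prev, score = max(
                ((p, s + transition_logp(p, cand)) for p, s in best.items()),
                key=lambda item: item[1],
            )
            new_best[cand] = score + emit_logp
            ptr[cand] = prev
        best = new_best
        backptrs.append(ptr)
    # backtrack from the best candidate for the last GPS point
    path = [max(best, key=best.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy example: candidates lie on road "a" or road "b"; staying on the same
# road is assumed more likely (0.9) than switching roads (0.1).
def same_road_logp(prev: str, cand: str) -> float:
    return math.log(0.9 if prev[0] == cand[0] else 0.1)


example_path = viterbi_map_match(
    [{"a1": math.log(0.6), "b1": math.log(0.4)},
     {"a2": math.log(0.1), "b2": math.log(0.9)}],
    same_road_logp,
)
print(example_path)  # ['b1', 'b2']
```

In the toy run, the first GPS point alone favors road "a", but decoding the whole sequence jointly lets the second point overturn that choice. Joint decoding of this kind is a common way to handle sparse, noisy GPS traces.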
We report on our experience scaling up the Mobile Millennium traffic information system using cloud computing and the Spark cluster computing framework. Mobile Millennium uses machine learning to infer traffic conditions for large metropolitan areas from crowdsourced data, and Spark was specifically designed to support such applications. Many studies of…
We consider the problem of estimating real-time traffic conditions from sparse, noisy GPS probe vehicle data. We specifically address arterial roads, which are also known as the secondary road network (highways are considered the primary road network). We consider several estimation problems: historical traffic patterns, real-time traffic conditions, and…
Many “big data” applications need to act on data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models that provide fault recovery do so in an expensive manner, requiring…
In this paper, we combine the most complete record of daily mobility, based on large-scale mobile phone data, with detailed Geographic Information System (GIS) data, uncovering previously hidden patterns in urban road usage. We find that the major usage of each road segment can be traced to its own, surprisingly few, driver sources. Based on this finding we…
Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. Pushing this data to the cloud and processing it there is more efficient than processing it on board. However, current cloud-based solutions are not suitable for the latency requirements of these applications. We present a new concept, Discretized Streams or…
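As a rough illustration of the idea behind Discretized Streams (D-Streams): the model treats a streaming computation as a series of small, deterministic batch computations over short time intervals, so lost results can be recomputed from their inputs rather than protected by hot replication. The toy, single-process sketch below assumes a running count per key as the stateful operation; the record format and the names `discretize`, `update_state` and `run` are hypothetical, and it does not use Spark's API.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (timestamp in seconds, key) pairs, one per event.
Record = Tuple[float, str]


def discretize(records: Iterable[Record], interval: float) -> Dict[int, List[str]]:
    """Split the stream into batches, one per time interval of length `interval`."""
    batches: Dict[int, List[str]] = {}
    for ts, key in records:
        batches.setdefault(int(ts // interval), []).append(key)
    return batches


def update_state(state: Counter, batch: List[str]) -> Counter:
    """Deterministic batch step: running counts = previous counts + this batch.

    Because the step is a pure function of (previous state, batch input),
    a lost result can be recomputed from those inputs.
    """
    new_state = Counter(state)  # copy; never mutate an earlier state
    new_state.update(batch)
    return new_state


def run(records: Iterable[Record], interval: float) -> List[Counter]:
    """Process batches in time order; intervals with no records are skipped."""
    batches = discretize(records, interval)
    states: List[Counter] = []
    state: Counter = Counter()
    for idx in sorted(batches):
        state = update_state(state, batches[idx])
        states.append(state)
    return states


events = [(0.2, "x"), (0.7, "y"), (1.1, "x"), (2.5, "x")]
states = run(events, interval=1.0)
# Simulate losing the second result and recomputing it from its inputs.
recovered = update_state(states[0], discretize(events, 1.0)[1])
print(recovered == states[1])  # True
```

A real system would spread each batch across a cluster and track lineage so that recovery can run in parallel; this single-process version only shows why deterministic per-interval batches make recomputation possible.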
Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. We study the case of predicting drivers' travel times in a large urban area from sparse GPS traces. We present a framework that can accommodate a wide variety of traffic distributions and distributes all computations across a cluster to achieve small…