Learn More
Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not(More)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to(More)
We consider the problem of reconstructing vehicle trajectories from sparse sequences of GPS points, for which the sampling interval is between 1 s and 2 min. We introduce a new class of algorithms, which are altogether called the path inference filter (PIF), that maps GPS data in real time, for a variety of tradeoffs and scenarios and with a high(More)
—Controlling and analyzing cyberphysical and robotics systems is increasingly becoming a Big Data challenge. Pushing this data to, and processing in the cloud is more efficient than on-board processing. However, current cloud-based solutions are not suitable for the latency requirements of these applications. We present a new concept, Discretized Streams or(More)
In this paper, we combine the most complete record of daily mobility, based on large-scale mobile phone data, with detailed Geographic Information System (GIS) data, uncovering previously hidden patterns in urban road usage. We find that the major usage of each road segment can be traced to its own--surprisingly few--driver sources. Based on this finding we(More)
We consider the problem of estimating real-time traffic conditions from sparse, noisy GPS probe vehicle data. We specifically address arterial roads, which are also known as the secondary road network (highways are considered the primary road network). We consider several estimation problems: historical traffic patterns, real-time traffic conditions, and(More)
We report on our experience scaling up the Mobile Millennium traffic information system using cloud computing and the Spark cluster computing framework. Mobile Millennium uses machine learning to infer traffic conditions for large metropolitan areas from crowdsourced data, and Spark was specifically designed to support such applications. Many studies of(More)
Many " Big Data " applications in Machine Learning (ML) need to react quickly to large streams of incoming data. The standard paradigm nowadays is to run ML algorithms on frameworks designed for batch operations, such as MapReduce or Hadoop. By design, these frameworks are not a good match for low-latency applications. This is why we explore using a new,(More)
Pregnancy is a state of physiologic adaptation, with significant changes in cardiovascular, renal, and hemodynamic systems. Aquaporins (AQPs) may play a role in facilitating these changes. While AQP expression has been assessed in several organs during pregnancy, little is known about its expression in the brain during pregnancy. Therefore, this study(More)