Tornado: A Distributed Spatio-Textual Stream Processing System

@article{Mahmood2015TornadoAD,
  title={Tornado: A Distributed Spatio-Textual Stream Processing System},
  author={Ahmed R. Mahmood and Ahmed M. Aly and Thamir M. Qadah and El Kindi Rezig and Anas Daghistani and Amgad Madkour and Ahmed S. Abdelhamid and Mohamed S. Hassan and Walid G. Aref and Saleh M. Basalamah},
  journal={Proc. VLDB Endow.},
  year={2015},
  volume={8},
  pages={2020--2023}
}
The widespread use of location-aware devices together with the increased popularity of micro-blogging applications (e.g., Twitter) led to the creation of large streams of spatio-textual data. In order to serve real-time applications, the processing of these large-scale spatio-textual streams needs to be distributed. However, existing distributed stream processing systems (e.g., Spark and Storm) are not optimized for spatial/textual content. In this demonstration, we introduce Tornado, a… 

SRC: tornado: a distributed spatio-textual stream processing system
TLDR
Tornado is adaptive, i.e., it dynamically redistributes the workload across worker processes according to changes in the distribution of spatio-textual data and queries, which significantly improves the overall system performance.
Adaptive processing of spatial-keyword data over a distributed streaming cluster
TLDR
A two-layered indexing scheme for the distributed processing of spatial-keyword data streams is introduced and extensive experimental evaluation indicates that Tornado achieves high scalability and more than 2x improvement over the baseline approach in terms of the overall system throughput.
SSTD: A Distributed System on Streaming Spatio-Textual Data
TLDR
This paper presents SSTD (Streaming Spatio-Textual Data), a distributed in-memory system supporting both continuous and snapshot queries with spatial, textual, and temporal constraints over data streams, and adopts a novel workload partitioning method termed QT (QuadText) tree.
SIGSPATIAL: G: Scalable Query Processing In Spatio-Textual Data Management Systems
TLDR
To efficiently process spatio-textual streams, this work presents Tornado, a distributed in-memory spatio-textual stream processing system that extends Storm with a two-layered spatio-textual indexing layer that significantly improves overall system performance.
Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream
TLDR
This paper proposes a distributed publish/subscribe system, called PS2Stream, which digests a massive spatio-textual data stream and directs the stream to target users with registered interests, and proposes a new workload distribution algorithm considering both space and text properties of the data.
HASTE: A Distributed System for Hybrid and Adaptive Processing on Streaming Spatial-Textual Data
TLDR
This work proposes a distributed system, called HASTE, for hybrid and adaptive processing on streaming spatial-textual data, and reports on extensive experiments with real-world data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.
FAST: Frequency-Aware Indexing for Spatio-Textual Data Streams
TLDR
FAST is a main-memory index that requires up to one third of the memory needed by the state-of-the-art index, and FAST adaptively accounts for the difference in the frequencies of keywords within their corresponding spatial regions to automatically choose the best indexing approach that optimizes the insertion and search times.
FAST: Frequency-Aware Spatio-Textual Indexing for In-Memory Continuous Filter Query Processing
TLDR
FAST is a main-memory index that requires up to one third of the memory needed by the state-of-the-art index, and FAST adaptively accounts for the difference in the frequencies of keywords within their corresponding spatial regions to automatically choose the best indexing approach that optimizes the insertion and search times.
Query Processing Techniques for Big Spatial-Keyword Data
TLDR
This 1.5 hour tutorial explores recent research efforts in the area of big spatial-keyword processing with special attention to data indexing and spatial and keyword data partitioning.
Spatio-Temporal Data Streams
TLDR
Spatio-Temporal Data Streams is a valuable resource for researchers studying spatio-temporal data streams and Big Data analytics, as well as data engineers and data scientists solving data management and analytics problems associated with this class of data.

References

Query Processing Techniques for Big Spatial-Keyword Data
TLDR
This 1.5 hour tutorial explores recent research efforts in the area of big spatial-keyword processing with special attention to data indexing and spatial and keyword data partitioning.
Atlas: on the expression of spatial-keyword group queries using extended relational constructs
TLDR
Atlas, an SQL extension for expressing complex spatial-keyword group queries, is introduced; it uses simple declarative spatial and textual building-block operators and predicates to extend SQL to represent spatio-textual group queries.
Indexing recent trajectories of moving objects
TLDR
Experimental evaluation illustrates that the trails-tree outperforms the state-of-the-art index structures for indexing recent trajectory data by up to a factor of two.
Taghreed: a system for querying, analyzing, and visualizing geotagged microblogs
TLDR
Taghreed, a full-fledged system for efficient and scalable querying, analyzing, and visualizing geotagged microblogs, e.g., tweets, is presented as the first system that addresses all these challenges collectively for microblog data.
ST-HBase: A Scalable Data Management System for Massive Geo-tagged Objects
TLDR
ST-HBase has good scalability and outperforms the state-of-the-art approaches in terms of update and query performance. Two index approaches are proposed, Spatial and Textual Based Hybrid Index and Term Cluster Based Inverted Spatial Index, which are suitable for different scenarios.
D-CAPE: distributed and self-tuned continuous query processing
TLDR
D-CAPE, a distributed continuous query processing architecture that employs stream query engines over a cluster of shared-nothing processors, addresses two critical questions: how to initially distribute query plans given little or possibly no cost information, and how to efficiently adapt the query distribution corresponding to runtime environmental changes.
A Demonstration of SpatialHadoop: An Efficient MapReduce Framework for Spatial Data
TLDR
This demo presents SpatialHadoop as the first full-fledged MapReduce framework with native support for spatial data and demonstrates a real system prototype of SpatialHadoop running on an Amazon EC2 cluster against two sets of real spatial data obtained from Tiger Files and OpenStreetMap.
MNTG: An Extensible Web-Based Traffic Generator
TLDR
Minnesota Traffic Generator (MNTG) is proposed: an extensible web-based road network traffic generator that overcomes the hurdles of using existing traffic generators and serves as a wrapper over existing traffic generators, making them easy to use, configure, and run for any arbitrary spatial road region.
Parallel SECONDO: A practical system for large-scale processing of moving objects
TLDR
This paper imports the data from the project OpenStreetMap into Secondo databases to build up the urban traffic network and then processes network-based queries like map-matching and symbolic trajectory pattern matching, achieving an impressive performance in Parallel Secondo after being converted to the corresponding parallel queries.