Towards Parallel Spatial Query Processing for Big Spatial Data

@article{Zhong2012TowardsPS,
  title={Towards Parallel Spatial Query Processing for Big Spatial Data},
  author={Yunqin Zhong and Jizhong Han and Tieying Zhang and Zhenhua Li and Jinyun Fang and Guihai Chen},
  journal={2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops \& PhD Forum},
  year={2012},
  pages={2085--2094}
}
  • Yunqin Zhong, Jizhong Han, Tieying Zhang, Zhenhua Li, Jinyun Fang, Guihai Chen
  • Published 21 May 2012
  • Computer Science
  • 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
In recent years, spatial applications have become more and more important in both scientific research and industry. Spatial query processing is the fundamental functioning component to support spatial applications. However, the state-of-the-art techniques of spatial query processing are facing significant challenges as the data expand and user accesses increase. In this paper we propose and implement a novel scheme (named VegaGiStore) to provide efficient spatial query processing over big… 
Efficient spark-based framework for big geospatial data query processing and analysis
TLDR
This paper introduces a generic framework for optimizing the performance of big spatial data queries on top of Apache Spark and supports advanced management functions including a unique self-adaptable load-balancing service to self-tune framework execution.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
TLDR
Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop and integrated into Hive to support declarative spatial queries with an integrated architecture is presented.
An extra spatial hierarchical schema in key-value store
TLDR
This paper advocates an extra spatial hierarchical schema inspired by geohash, designs a spatial query method based on the primary-key index, and tests query accuracy and efficiency under this schema even without the help of a spatial index.
Haggis: turbocharge a MapReduce based spatial data warehousing system with GPU engine
TLDR
This paper extends Hadoop-GIS, a MapReduce-based spatial query system, with GPU-accelerated spatial query processing at the engine level, and demonstrates that the GPU-accelerated system can gain considerable performance improvements.
Performance evaluation of SpatialHadoop for big web mapping data
TLDR
This study evaluates SpatialHadoop on a variety of datasets and operations, including index creation, K-Nearest Neighbor (KNN), and spatial join, and demonstrates that as the volume of data increases, SpatialHadoop scales well and performs better than the relational engine it is compared against.
GeoSpark SQL: An Effective Framework Enabling Spatial Queries on Spark
TLDR
This paper aims to address the increasingly large-scale spatial query-processing requirement in the era of big data, and proposes an effective framework GeoSpark SQL, which enables spatial queries on Spark, and notes that Spark is not a panacea.
Big Data Storage Techniques for Spatial Databases: Implications of Big Data Architecture on Spatial Query Processing
TLDR
This paper reviews the various approaches with Hadoop to handle spatial data efficiently, categorizes the spatial queries reported in the testing, summarizes results, and identifies strengths and weaknesses with each approach.
Spatio-Temporal Join on Apache Spark
TLDR
This paper details several variants of a spatial join operation that address spatial, temporal, and attribute-based joins and run in a commercial off-the-shelf (COTS) application.
An improved integrated Grid and MapReduce‐Hadoop architecture for spatial data: Hilbert TGS R‐Tree–based IGSIM
TLDR
A thorough literature survey has been done on the available traditional spatial indexes from the serial programming environment and Hilbert TGS R‐Tree has been selected on the basis of several parameters for its parallel implementation and extending spatial query efficiency work of the IGSIM.
Scalable and Fast Top-k Most Similar Trajectories Search Using MapReduce In-Memory
TLDR
This work proposes a distributed parallel approach for k-NN trajectories search in a multi-user environment using MapReduce in-memory, and proposes a space/time data partitioning based on Voronoi diagrams and time pages in order to provide both spatial-temporal data organization and process decentralization.
...

References

SHOWING 1-10 OF 23 REFERENCES
SJMR: Parallelizing spatial join with MapReduce on clusters
TLDR
This paper presents SJMR (Spatial Join with MapReduce), a novel parallel algorithm that relieves the problem of processing heterogeneous related data sets, which is common in operations like spatial joins.
Revisiting R-Tree Construction Principles
TLDR
It is argued that dynamic R-tree construction is a typical clustering problem which can be addressed by incorporating existing clustering algorithms, and adopted the well-known k-means algorithm as a working example.
Supporting Complex Multi-Dimensional Queries in P2P Systems
  • B. Liu, Wang-Chien Lee, D. Lee
  • Computer Science
  • 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05)
  • 2005
TLDR
Network-R-tree (NR-tree), a P2P adaptation of the dominant spatial index, the R*-tree, is proposed; it is capable of processing complex queries such as range queries and k-nearest neighbor queries.
Quadtree and R-tree indexes in oracle spatial: a comparison using GIS data
TLDR
This paper first briefly describes the implementation of Quadtree and R-tree index structures and related optimizations in Oracle Spatial, then examines the relative merits of the two structures as implemented in Oracle Spatial and compares their performance for different types of queries and other operations.
An introduction to spatial database systems
TLDR
This work surveys data modeling, querying, data structures and algorithms, and system architecture for spatial database systems, with the emphasis on describing known technology in a coherent manner, rather than listing open problems.
Bigtable: A Distributed Storage System for Structured Data
TLDR
The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.
Using a distributed quadtree index in peer-to-peer networks
TLDR
A distributed quadtree index that adapts the MX-CIF quadtree is described that enables more powerful accesses to data in P2P networks and is easy to use, scalable, and exhibits good load-balancing properties.
Hadoop++
TLDR
This paper proposes a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'), and shows the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.
Spatial databases - a tour
TLDR
An introduction to Spatial Databases and Trends in Spatial Data Mining.
Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS
TLDR
This paper proposes an approach to optimize I/O performance of small files on HDFS by combining small files into large ones to reduce the file number and build index for each file.
...