Multi-dimensional geospatial data mining in a distributed environment using MapReduce

@article{Alkathiri2019MultidimensionalGD,
  title={Multi-dimensional geospatial data mining in a distributed environment using MapReduce},
  author={Mazin Alkathiri and Abdul Jhummarwala and Madhukar B. Potdar},
  journal={Journal of Big Data},
  year={2019},
  volume={6},
  pages={1--34}
}
Data mining and machine learning techniques for processing raster data consider a single spectral band of data at a time. The individual results are then combined to obtain the final output. The essence of the related multispectral information is lost when the bands are considered independently. The proposed platform is based on the Apache Hadoop ecosystem and supports analysis of large amounts of multispectral raster data using MapReduce. A novel technique of transforming the spectral space to… 
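The idea of processing all bands of a pixel together, rather than band by band, can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name a clustering algorithm, so k-means is assumed here purely for illustration, the map and reduce steps are simulated in memory instead of running on Hadoop, and all function names and sample values are hypothetical.

```python
from collections import defaultdict

def nearest(pixel, centroids):
    """Index of the centroid closest to a pixel, using squared
    Euclidean distance over ALL spectral bands at once."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(pixel, centroids[k])),
    )

def map_phase(pixels, centroids):
    """Map step: emit (cluster_id, (pixel_vector, 1)) for each pixel."""
    for pixel in pixels:
        yield nearest(pixel, centroids), (pixel, 1)

def reduce_phase(pairs, centroids):
    """Reduce step: sum the band vectors per cluster and average them."""
    sums = {}
    counts = defaultdict(int)
    for k, (pixel, n) in pairs:
        acc = sums.setdefault(k, [0.0] * len(pixel))
        for band, value in enumerate(pixel):
            acc[band] += value
        counts[k] += n
    # A cluster that received no pixels keeps its previous centroid.
    return [
        tuple(s / counts[k] for s in sums[k]) if k in sums else centroids[k]
        for k in range(len(centroids))
    ]

def kmeans_iteration(pixels, centroids):
    """One k-means iteration expressed as a map step followed by a reduce step."""
    return reduce_phase(map_phase(pixels, centroids), centroids)

# Hypothetical three-band pixels and starting centroids.
pixels = [(10, 20, 30), (12, 22, 28), (200, 180, 90), (198, 182, 94)]
centroids = [(0, 0, 0), (255, 255, 255)]
new_centroids = kmeans_iteration(pixels, centroids)
print(new_centroids)
```

Because each pixel is kept as one vector, the distance computation uses the relationship between bands, which is the information the abstract says is lost when bands are clustered independently and merged afterwards.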
Flexible big data approach for geospatial analysis
TLDR
This paper provides a scalable way to perform geological processing of large aerial LiDAR point clouds using Spark and Cassandra, and suggests an integrated approach that resolves faults, improves the consistency of the classification procedure and of the resulting digital terrain models (DTMs), and lessens user interaction.
Knowledge Discovery Web Service for Spatial Data Infrastructures
TLDR
The proposed approach is called Knowledge Discovery Web Service (KDWS), which can be used as a layer on top of SDIs to provide spatial data users and decision makers with the possibility of extracting knowledge from massive heterogeneous spatial data in SDIs.
Massive Power Information Processing Scheme Based on MongoDB
  • Yao Xu, Jia-yang Wang
  • Computer Science
    IOP Conference Series: Earth and Environmental Science
  • 2020
TLDR
A high-availability, high-performance data storage and pre-processing scheme for power-consuming universities is constructed, using a non-relational (NoSQL) database to store power consumption data instead of a traditional relational database.
Geo-Marketing Segmentation with Deep Learning
TLDR
The results of this study demonstrate a high clustering performance (4 × 4 neurons) as well as a significant dimensionality reduction by using self-organizing maps in the B2B industrial automation market across the United States.
Integrated Processing of Spatial Information based on Multidimensional Data Models for General Planning Tasks
TLDR
The developed method is intended for use in BIM (Building Information Modeling) computer modeling technology to solve general planning tasks, and contains a unified description of spatial and attribute data in the form of a multidimensional information object.
A deep learning approach for forecasting non-stationary big remote sensing time series
TLDR
A suitable method to forecast the Normalized Difference Vegetation Index (NDVI) time series (TS) from RS big data is introduced by combining a big data system, the wavelet transform (WT), and a long short-term memory (LSTM) neural network.
Design of Distributed Human Resource Management System of Spark Framework Based on Fuzzy Clustering
TLDR
A distributed human resource management system based on Spark framework that can achieve user management, employee information, attendance, evaluation, performance, salary, personnel change, and other business management is proposed.
A decision support system for Taiwan’s forest resource management using Remote Sensing Big Data
ABSTRACT This study aims to incorporate the application of RS technology with management strategies and proposes the Remote Sensing Knowledge-Based Decision Support System (RS-KBDSS) framework. It is
PACELC: Enchantment multi-dimension TensorFlow for value creation through Big Data
  • A. Yasmin, S. Kamalakkannan
  • Computer Science
    2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC)
  • 2020
TLDR
The operation and control of the controller for analytical is described, there is a clear link between the size of the compressor, the vibration level and the lens pool, learning new machine learning tools.
Comparative analysis of human development indicators: Tanger-Tetouan-Al Hoceima region
Human development is more than a question of the accumulation of wealth, income, or economic growth. It must be human-centred. This is why concerns as necessary as respect for human rights, the
...

References

SHOWING 1-10 OF 67 REFERENCES
High-Performance Geospatial Big Data Processing System Based on MapReduce
TLDR
The overall architecture and data model of Marmot is explained as well as the main algorithm for automatic construction of MapReduce jobs from a given spatial analysis task, demonstrating that Marmot generally outperforms SpatialHadoop, one of the top plug-in based spatial big data frameworks, particularly in dealing with complex and time-intensive queries involving spatial index.
Comparative analysis of SpatialHadoop and GeoSpark for geospatial big data analytics
TLDR
The architectures of SpatialHadoop and GeoSpark are compared, and the merits and demerits of these tools are summarised according to their execution times and the volume of data used.
Geospatial Hadoop (GS-Hadoop) an efficient mapreduce based engine for distributed processing of shapefiles
TLDR
The proposed Extended Shapefile format (.shpx) allows MapReduce to directly access the shapefile component files using memory-mapped input/output, and the accompanying ShapeDist library has been compared with the most widely used archival formats.
Geo-spatial Big Data Mining Techniques
TLDR
The evolution of data mining techniques over the last two decades and the efforts made in developing big data analytics, especially as applied to geospatial big data, are reviewed.
Parallel K-Means Clustering Based on MapReduce
TLDR
This paper proposes a parallel k-means clustering algorithm based on MapReduce, which is a simple yet powerful parallel programming technique, and demonstrates that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.
Large Scale Analytics of Vector+Raster Big Spatial Data
TLDR
This paper advocates a third approach that mixes the raw representations of both vector and raster data in the query processor, and proposes a novel method, called Scanline method, which does not require a conversion between raster and vector.
STING: A Statistical Information Grid Approach to Spatial Data Mining
TLDR
The idea is to capture statistical information associated with spatial cells in such a manner that whole classes of queries and clustering problems can be answered without recourse to the individual objects.
Comparing Apache Spark and Map Reduce with Performance Analysis using K-Means
TLDR
Hadoop MapReduce and the more recently introduced Apache Spark, both of which provide a processing model for analyzing big data, are compared; their performance varies significantly based on the use case under implementation.
STING+: an approach to active spatial data mining
  • Wei Wang, Jiong Yang, R. Muntz
  • Computer Science
    Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)
  • 1999
TLDR
This paper introduces an active spatial data mining approach that extends current spatial data mining algorithms to efficiently support user-defined triggers on dynamically evolving spatial data. It employs a hierarchical structure with associated statistical information at the various levels of the hierarchy to exploit the locality of the effect of an update and the nature of spatial data.
...