• Corpus ID: 30322474

Hadoop-GIS: A High Performance Spatial Query System for Analytical Medical Imaging with MapReduce

Fusheng Wang, Ablimit Aji, Qiaoling Liu, J. Saltz
Querying and analyzing large volumes of spatially oriented scientific data is becoming increasingly important for many applications. For example, analyzing high-resolution digital pathology images through computer algorithms provides rich, spatially derived information on micro-anatomic objects in human tissues. The spatially oriented information and queries at both cellular and sub-cellular scales share common characteristics with “Geographic Information System (GIS)”, and provide an effective vehicle… 
Large-Scale Spatial Data Management on Modern Parallel and Distributed Platforms
This work proposes to develop new data parallel designs for large-scale spatial data management that can better utilize modern inexpensive commodity parallel and distributed platforms, including multi-core CPUs, many-core GPUs and cluster computers, to achieve both efficiency and scalability.
Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions
  • Zhenlong Li
  • Computer Science
    Geotechnologies and the Environment
  • 2020
This chapter first summarizes four critical aspects for handling geospatial big data with HPC and then briefly reviews existing HPC-related platforms and tools for geospatial big data processing.
Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system
The main components of the Healthcare Big Data Analytics (HBDA) platform, as envisioned by stakeholders and derived by the research team, are shown, including metadata profiles that include workflow steps carried out on a regular basis by VIHA staff only.
Landscape of Big Medical Data: A Pragmatic Survey on Prioritized Tasks
In this paper, a group of life scientists, clinicians, computer scientists, and engineers sit together to discuss several fundamental issues that will help define and shape the landscape of big medical data.
A Methodology with Distributed Algorithms for Large-Scale Human Mobility Prediction
This dissertation proposes a methodology for the prediction of large-scale human mobility, especially a city level’s vehicle trajectory distribution across the road network, which quantifies the latent features of spatial environments and temporal factors through tensor factorization, given existing mobility datasets.
HDKV: supporting efficient high‐dimensional similarity search in key‐value stores
Dealing with high‐dimensional data in key‐value stores is still a big challenge, and state‐of‐the‐art solutions apply value‐based tree‐structure indexes to solve this issue.
Building database of WEBGIS for the exchange of marine data between Vietnam and ASEAN countries
  • Do Huy Cuong
  • Computer Science
    Tạp chí Khoa học và Công nghệ biển
  • 2019
The details of the system of oceanic database management and exchange, such as hardware and software, data storage, data format and data structure, data management and integration, and other issues of interface, security, standards are focused on.
XSEDE Cloud Survey Report
A National Science Foundation-sponsored cloud user survey was conducted by the XSEDE Cloud Integration Investigation Team to better understand how cloud is used across a wide variety of scientific fields and the humanities, arts, and social sciences.
Emerging trend of big data analytics in bioinformatics: a literature review
This paper provides a comprehensive summary of several data analytical techniques available for bioinformatics researchers and computer scientists.


Experiences on Processing Spatial Data with MapReduce
This work presents its experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk-construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. The results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
A data model and database for high-resolution pathology analytical image informatics
A data model and database are designed to address the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs).
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems
This paper presents a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system and shows the effectiveness of RCFile in satisfying the four requirements.
A comparison of join algorithms for log processing in MapReduce
Key implementation details of a number of well-known join strategies in MapReduce are described and a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster is presented.
Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?
This paper develops a use case that comprises five representative queries and implements this use case in one distributed DBMS and in the Pig/Hadoop system, finding that certain representative analyses are easy to express in each engine's high-level language and that both systems provide competitive performance and improved scalability relative to current IDL-based methods.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Data Partitioning for Parallel Spatial Join Processing
A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing.
Efficient processing of spatial joins using R-trees
This paper presents a first detailed study of spatial join processing using R-trees, particularly R*-trees, and presents several techniques for improving its execution time with respect to both CPU time and I/O time.
Partition based spatial-merge join
PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation that is especially effective when neither of the inputs to the join has an index on the joining attribute, is described.
Integrating Hadoop and parallel DBMS
This paper describes three efforts towards tight and efficient integration of Hadoop and Teradata EDW, where data in both systems are partitioned across multiple nodes for parallel computing, which creates integration optimization opportunities not possible for DBMSs running on a single node.