Hadoop-GIS : A High Performance Spatial Query System for Analytical Medical Imaging with MapReduce
@inproceedings{Wang2012HadoopGISA, title={Hadoop-GIS : A High Performance Spatial Query System for Analytical Medical Imaging with MapReduce}, author={Fusheng Wang and Ablimit Aji and Qiaoling Liu and J. Saltz}, year={2012} }
Querying and analyzing large volumes of spatially oriented scientific data becomes increasingly important for many applications. For example, analyzing high-resolution digital pathology images through computer algorithms provides rich spatially derived information of micro-anatomic objects of human tissues. The spatial oriented information and queries at both cellular and sub-cellular scales share common characteristics of “Geographic Information System (GIS)”, and provide an effective vehicle…
9 Citations
Large-Scale Spatial Data Management on Modern Parallel and Distributed Platforms
- Computer Science
- 2016
This work proposes to develop new data parallel designs for large-scale spatial data management that can better utilize modern inexpensive commodity parallel and distributed platforms, including multi-core CPUs, many-core GPUs and cluster computers, to achieve both efficiency and scalability.
Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions
- Computer Science
- Geotechnologies and the Environment
- 2020
This chapter first summarizes four critical aspects for handling geospatial big data with HPC and then briefly reviews existing HPC-related platforms and tools for geospatial big data processing.
Towards a big data analytics platform with Hadoop/MapReduce framework using simulated patient data of a hospital system
- Computer Science, Medicine
- 2016
This work presents the main components of the Healthcare Big Data Analytics (HBDA) platform as envisioned by stakeholders and derived by the research team, including metadata profiles that cover workflow steps carried out on a regular basis by VIHA staff only.
Landscape of Big Medical Data: A Pragmatic Survey on Prioritized Tasks
- Computer Science, Medicine
- IEEE Access
- 2019
In this paper, a group of life scientists, clinicians, computer scientists, and engineers sit together to discuss several fundamental issues that will help define and shape the landscape of big medical data.
A Methodology with Distributed Algorithms for Large-Scale Human Mobility Prediction
- Computer Science
- 2018
This dissertation proposes a methodology for the prediction of large-scale human mobility, especially a city level’s vehicle trajectory distribution across the road network, which quantifies the latent features of spatial environments and temporal factors through tensor factorization, given existing mobility datasets.
HDKV: supporting efficient high‐dimensional similarity search in key‐value stores
- Computer Science
- Concurr. Comput. Pract. Exp.
- 2013
Dealing with high‐dimensional data in key‐value stores is still a big challenge, and state‐of‐the‐art solutions apply value‐based tree‐structure indexes to solve this issue.
Building database of WEBGIS for the exchange of marine data between Vietnam and ASEAN countries
- Computer Science
- Tạp chí Khoa học và Công nghệ biển
- 2019
This work focuses on the details of the oceanic database management and exchange system, such as hardware and software, data storage, data format and data structure, data management and integration, and issues of interface, security, and standards.
XSEDE Cloud Survey Report
- Computer Science, Environmental Science
- 2013
A National Science Foundation-sponsored cloud user survey was conducted by the XSEDE Cloud Integration Investigation Team to better understand how cloud is used across a wide variety of scientific fields and the humanities, arts, and social sciences.
Emerging trend of big data analytics in bioinformatics: a literature review
- Computer Science
- Int. J. Bioinform. Res. Appl.
- 2018
This paper provides a comprehensive summary of several data analytical techniques available for bioinformatics researchers and computer scientists.
References
SHOWING 1-10 OF 32 REFERENCES
Experiences on Processing Spatial Data with MapReduce
- Computer Science
- SSDBM
- 2009
This work presents its experiences in applying the MapReduce model to solve two important spatial problems: (a) bulk-construction of R-Trees and (b) aerial image quality computation, which involve vector and raster data, respectively. The results confirm the excellent scalability of the MapReduce framework in processing parallelizable problems.
A data model and database for high-resolution pathology analytical image informatics
- Computer Science
- Journal of Pathology Informatics
- 2011
A data model and database are designed to address the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs).
RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems
- Computer Science
- 2011 IEEE 27th International Conference on Data Engineering
- 2011
This paper presents a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system and shows the effectiveness of RCFile in satisfying the four requirements.
A comparison of join algorithms for log processing in MapReduce
- Computer Science
- SIGMOD Conference
- 2010
Key implementation details of a number of well-known join strategies in MapReduce are described and a comprehensive experimental comparison of these join techniques on a 100-node Hadoop cluster is presented.
Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?
- Computer Science
- 2009 IEEE International Conference on Cluster Computing and Workshops
- 2009
This paper develops a use case that comprises five representative queries and implements it in one distributed DBMS and in the Pig/Hadoop system, finding that certain representative analyses are easy to express in each engine's high-level language and that both systems provide competitive performance and improved scalability relative to current IDL-based methods.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
- Computer Science
- Proc. VLDB Endow.
- 2009
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
Data Partitioning for Parallel Spatial Join Processing
- Computer Science
- GeoInformatica
- 1997
A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing.
Efficient processing of spatial joins using R-trees
- Computer Science
- SIGMOD '93
- 1993
This paper presents a first detailed study of spatial join processing using R-trees, particularly the R*-tree, and presents several techniques for improving its execution time with respect to both CPU time and I/O time.
Partition based spatial-merge join
- Computer Science
- SIGMOD '96
- 1996
This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation that is especially effective when neither of the inputs to the join has an index on the joining attribute.
Integrating Hadoop and parallel DBMS
- Computer Science
- SIGMOD Conference
- 2010
This paper describes three efforts towards tight and efficient integration of Hadoop and Teradata EDW, where data in both systems are partitioned across multiple nodes for parallel computing, which creates integration optimization opportunities not possible for DBMSs running on a single node.