Computational grids provide access to distributed compute resources and distributed data resources, creating unique opportunities for improved access to information. When data repositories are accessible from any platform, applications can be developed that support nontraditional uses of computing resources. Environments thus enabled include knowledge…
Traditional High Performance Computing (HPC) resources, such as those available on the TeraGrid, support batch job submissions using Distributed Resource Management Systems (DRMS) like TORQUE or the Sun Grid Engine (SGE). For large-scale data-intensive computing, programming paradigms such as MapReduce are becoming popular. A growing number of codes in…
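As a concrete illustration of batch submission through a DRMS, a minimal TORQUE/PBS job script might look like the following sketch; the job name, queue name, and resource limits are hypothetical and site-specific:

```shell
#!/bin/bash
#PBS -N hello_job          # job name (hypothetical)
#PBS -l nodes=1:ppn=4      # request 1 node, 4 processors per node (assumed limits)
#PBS -l walltime=00:10:00  # 10-minute wall-clock limit
#PBS -q batch              # queue name (site-specific; assumed here)

cd "$PBS_O_WORKDIR"        # run from the directory where qsub was invoked
echo "Running on $(hostname)"
```

The script would be submitted with `qsub hello.pbs`; SGE uses the same `qsub` command but reads its embedded directives from lines prefixed with `#$` instead of `#PBS`.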
Geospatial data interoperability has many facets, including: standards and specifications, infrastructure models and information integration strategies, metadata and data quality descriptions, data format and type conversion techniques, authorization, security and privacy, information assurance and business arrangements. Recent progress in all these…
"Big data" has become a major force of innovation across enterprises of all sizes. New platforms with increasingly rich features for managing big datasets are announced almost weekly. Yet there is currently no established means of comparing such platforms. While the performance of traditional database systems is well understood…
We discuss issues in managing very large scientific data collections and describe our approach at the San Diego Supercomputer Center for supporting high performance data-intensive applications. Our systems provide metadata-based access to data sets and support collections with widely varying data characteristics.
Parallel database systems are suitable for applications with high capacity, performance, and availability requirements. The trend in such systems is to provide efficient on-line capability for performing various system administration functions such as index creation and maintenance, backup/restore, reorganization, and gathering of statistics.…
Information based computing is the concept that scientific applications should be able to do resource and information discovery in metacomputing environments and employ semantic-based access to data. The next-generation, data intensive, scientific applications are expected to benefit from combining supercomputer-level numerical computation capabilities with…
Background: As more and more data is made available through the Web, mediation of information from heterogeneous sources becomes a crucial task for future Web information systems. We describe the features of our information mediator Vamp (Virtual Agency Mediator Prototype), which is being developed as part of a joint project between SDSC and UCSD. Like its…