Apache Cassandra is a leading distributed database for big data management, offering zero downtime, linear scalability, and seamless deployment across multiple data centers. With the increasingly wide adoption of Cassandra for online transaction processing by hundreds of Web-scale companies, there is a growing need for a rigorous and practical data …
Scientific workflows have become an important paradigm for domain scientists to formalize and structure complex data-intensive scientific processes. The ever-increasing volumes of scientific data motivate researchers to extend scientific workflow management systems (SWFMSs) to utilize the power of Cloud computing to perform big data analyses. Unlike …
When designing scientific workflows, users often face the so-called shimming problem when connecting two related but incompatible components. The problem is addressed by inserting special adaptors, called shims, that perform appropriate data transformations to resolve data type inconsistencies. However, existing shimming techniques provide limited …
Provenance has become increasingly important in scientific workflows to understand, verify, and reproduce the results of scientific data analysis. Most existing systems store provenance data in provenance stores with proprietary provenance data models and conduct query processing over the physical provenance storage …
In this new era of Big Data, there is a growing need to enable scientific workflows to perform computations at a scale far exceeding a single workstation's capabilities. When running such data-intensive workflows in the cloud, distributed across several physical locations, the execution time and the resource utilization efficiency depend heavily on the …
Provenance, which records the history of an in-silico experiment, has been identified as an important requirement for scientific workflows to support scientific discovery reproducibility, result interpretation, and problem diagnosis. Large provenance datasets are composed of many smaller provenance graphs, each of which corresponds to a single workflow …
Geosciences Web portals are becoming increasingly important for supporting geoscientists in their research. The GEO-SEED portal is a repository of geosciences Web services metadata, represented in Resource Description Framework (RDF), which supports management and discovery by machines and automated agents. This project uses SPARQL, the W3C standard for …
There is an increasing demand for data-intensive applications in which scientists use scientific workflows to integrate data management, analysis, simulation, and visualization services over often voluminous, complex, and distributed scientific data and services. One major limitation of current scientific workflow models is that each workflow task is …