Corpus ID: 12213498

StratOS: A Big Data Framework for Scientific Computing

@article{Stickley2015StratOSAB,
  title={StratOS: A Big Data Framework for Scientific Computing},
  author={Nathaniel R. Stickley and Miguel A. Aragon-Calvo},
  journal={ArXiv},
  year={2015},
  volume={abs/1503.02233}
}
We introduce StratOS, a Big Data platform for general computing that allows a datacenter to be treated as a single computer. With StratOS, the process of writing a massively parallel program for a datacenter is no more complicated than writing a Python script for a desktop computer. Users can run pre-existing analysis software on data distributed over thousands of machines with just a few keystrokes. This greatly reduces the time required to develop distributed data analysis pipelines. The… 
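
The workflow described in the abstract, treating a datacenter like a desktop, amounts to a parallel map of an existing analysis command over many files. As a rough, runnable single-machine analogue of that pattern (this is standard-library Python, not the StratOS API; the paths and the ./my_analysis command are placeholders):

import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def analyze(path):
    # Run a pre-existing analysis program on one file; "./my_analysis" is a placeholder.
    subprocess.run(["./my_analysis", path], check=True)
    return path

if __name__ == "__main__":
    files = glob.glob("/data/frames/*.fits")  # placeholder location of the input files
    with ProcessPoolExecutor() as pool:
        for finished in pool.map(analyze, files):
            print("processed", finished)

Per the abstract, StratOS's contribution is making essentially this script-level pattern span data distributed over thousands of machines.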

References

Showing 1-10 of 18 references

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

  • L. Barroso, U. Hölzle
  • Computer Science
    The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
  • 2009
TLDR
The architecture of WSCs is described, along with the main factors influencing their design, operation, and cost structure, and the characteristics of their software base.

MapReduce: simplified data processing on large clusters

TLDR
This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
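
For context on the programming model this summary refers to, the sketch below is a minimal single-process word count in plain Python: the user writes only map_fn and reduce_fn, while in the real system the runtime performs the distribution, shuffling, and failure handling that run_mapreduce only imitates locally.

from collections import defaultdict

def map_fn(document):
    # User-supplied "map": emit (word, 1) for every word in the document.
    for word in document.split():
        yield word, 1

def reduce_fn(key, values):
    # User-supplied "reduce": combine all values that share a key.
    return key, sum(values)

def run_mapreduce(documents):
    # Local stand-in for what the runtime does at scale: group the
    # intermediate pairs by key (the "shuffle"), then reduce each group.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())

print(run_mapreduce(["to be or not to be", "to see or not to see"]))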

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

TLDR
Resilient Distributed Datasets (RDDs) are presented: a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
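
For readers unfamiliar with the abstraction, here is a minimal PySpark sketch in the spirit of this summary (it assumes a local Spark installation; the numbers and operations are arbitrary illustrations):

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# An RDD is built from existing data, transformed lazily, and only
# materialized by an action such as reduce(); the recorded lineage of
# transformations is what allows lost partitions to be recomputed.
numbers = sc.parallelize(range(1_000_000))
even_square_sum = (numbers.map(lambda x: x * x)
                          .filter(lambda x: x % 2 == 0)
                          .reduce(lambda a, b: a + b))

print(even_square_sum)
sc.stop()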

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center

TLDR
The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.

Astro-WISE: Chaining to the Universe

TLDR
First light is reported for a very different solution to the problem, initiated by a smaller astronomical IT community, which provides an abstract scientific information layer that integrates distributed scientific analysis with distributed processing and federated archiving and publishing.

Astronomy in the Cloud: Using MapReduce for Image Co-Addition

TLDR
The experience of implementing a scalable image-processing pipeline for the SDSS imaging database using Hadoop is reported, a number of optimizations to the basic approach are described, and experimental results comparing their performance are presented.

A bridging model for parallel computation

TLDR
The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results are presented quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
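
The standard BSP cost model is simple to state: a superstep in which each processor does at most w units of local work and sends or receives at most h messages costs roughly w + g·h + l, where g is the machine's bandwidth (gap) parameter and l its synchronization latency, and a program's cost is the sum over its supersteps. The toy calculator below just evaluates this formula; the parameter values are invented for illustration.

def bsp_cost(supersteps, g, l):
    # Standard BSP cost: each superstep with local work w and message
    # volume h costs w + g*h + l; a program costs the sum over supersteps.
    return sum(w + g * h + l for w, h in supersteps)

# Invented example values: three supersteps, bandwidth parameter g = 4,
# synchronization latency l = 100 (all in abstract time units).
steps = [(10_000, 250), (8_000, 400), (12_000, 100)]
print(bsp_cost(steps, g=4, l=100))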

Leveraging the cloud for robust and efficient lunar image processing

The Lunar Mapping and Modeling Project (LMMP) is tasked to aggregate lunar data, from the Apollo era to the latest instruments on the LRO spacecraft, into a central repository accessible by

Astro-WISE information system

TLDR
The various concepts behind Astro-WISE, their realization and use, and the migration of Astro-WISE to other astronomical and non-astronomical information systems are presented.

DARK MATTER HALOS IN THE STANDARD COSMOLOGICAL MODEL: RESULTS FROM THE BOLSHOI SIMULATION

Lambda Cold Dark Matter (ΛCDM) is now the standard theory of structure formation in the universe. We present the first results from the new Bolshoi dissipationless cosmological ΛCDM simulation that