TritonSort: A Balanced Large-Scale Sorting System
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in
Themis: an I/O-efficient MapReduce
This work presents Themis, a MapReduce implementation that reads and writes data records to disk exactly twice, which is the minimum amount possible for data sets that cannot fit in memory.
TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System
This article describes the hardware and software architecture necessary to operate TritonSort, a highly efficient, scalable sorting system that is designed to process large datasets and can sort data at approximately 80% of the disks’ aggregate sequential write speed.
Local Recovery for High Availability in Strongly Consistent Cloud Services
The primary contribution of Zorfu is a local recovery technique that significantly increases availability of replicated strongly consistent services by reducing the recovery time by an order of magnitude, while imposing only a negligible latency overhead.
Improving the responsiveness of internet services with automatic cache placement
ToOL, an analysis technique that automatically optimizes cache placement, is presented, showing that near-optimal cache placements vary significantly based on input distribution.
Dissertation: I/O-Efficient Data-Intensive Computing
A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Alexander Carlin Rasmussen.
I/O-Efficient Data-Intensive Computing
This dissertation endeavors to bridge the performance gap between high-performance large-scale sorting systems and the underlying capacity of the hardware infrastructure on which they are deployed by focusing on efficient I/O as a first-class architectural concern in TritonSort and Themis.
Parallelizing the Mace Model Checker
Preliminary results show that the parallelization of MaceMC, a model checker for distributed systems written in Mace, can increase the number of useful states explored by the model checker by a factor of 50 for relatively modest cluster sizes.