Object-based cloud storage has been widely adopted for its agility in deploying storage with a very low up-front cost. However, enterprises currently use it to store secondary data and not for expensive primary data. The driving reason is performance; most enterprises conclude that storing primary data in the cloud will not deliver the performance…
We introduce split-level I/O scheduling, a new framework that splits I/O scheduling logic across handlers at three layers of the storage stack: block, system call, and page cache. We demonstrate that traditional block-level I/O schedulers are unable to meet throughput, latency, and isolation goals. By utilizing the split-level framework, we build a…
The progress of a big data job is often a function of storage, networking, and processing. Hence, for efficient job execution, it is important to collectively optimize all three components. Prior proposals [1], in contrast, have focused mainly on one or two of the three components. This narrow focus constrains the extent to which these proposals can…
Storage systems exhibit silent data corruptions that go unnoticed until it is too late, potentially resulting in whole trees of lost data. To deal with this, we’ve integrated a checksumming mechanism into Linux’s Multi-Device Software RAID layer so that we are able to detect and correct these silent data corruptions. The analysis of our naive implementation shows…
TOWARDS RELIABLE CLOUD SYSTEMS. Thanh D. Do. Although they provide tremendous access to the data and computing power of thousands of commodity servers, large-scale cloud systems must address a new challenge: they must detect and recover from a growing number of failures, in both hardware and software components. The growing complexity of technology scaling,…
NOTE: Due to a number of reasons we have abandoned the pursuit of handling this within the filesystem layer. Instead, we’ve decided to adapt Linux’s MD software RAID implementation to include checksums and to reuse the techniques previously developed in [2] in order to speed up recovery of the RAID to a consistent state following a crash. The reasons for…
We present NICE, a key-value storage system design that leverages new software-defined network capabilities to build a cluster-based, network-efficient storage system. NICE presents novel techniques to co-design network routing and multicast with storage replication, consistency, and load balancing to achieve higher efficiency, performance, and scalability. We…