Suli Yang

The progress of a big data job is often a function of storage, networking, and processing. Hence, for efficient job execution, it is important to optimize all three components collectively. Prior proposals [1], in contrast, have focused mainly on one or two of the three components. This narrow focus constrains the extent to which these proposals can…
We introduce split-level I/O scheduling, a new framework that splits I/O scheduling logic across handlers at three layers of the storage stack: block, system call, and page cache. We demonstrate that traditional block-level I/O schedulers are unable to meet throughput, latency, and isolation goals. By utilizing the split-level framework, we build a…
Object-based cloud storage has been widely adopted for its agility in deploying storage with a very low up-front cost. However, enterprises currently use it to store secondary data rather than expensive primary data. The driving reason is performance; most enterprises conclude that storing primary data in the cloud will not deliver the performance…
To my parents. Acknowledgements: I would first and foremost extend my wholehearted gratitude to my advisors, Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. Andrea and Remzi are the reason that I had the opportunity for this exceptional Ph.D. journey. To this day, I still remember the moment when they took me as their student and the joy and…
Storage systems exhibit silent data corruptions that go unnoticed until it is too late, potentially resulting in whole trees of lost data. To deal with this, we've integrated a checksumming mechanism into Linux's Multi-Device Software RAID layer so that we are able to detect and correct these silent data corruptions. The analysis of our naive implementation shows…
NOTE: For a number of reasons, we have abandoned the pursuit of handling this within the filesystem layer. Instead, we've decided to adapt Linux's MD software RAID implementation to include checksums and to reuse the techniques previously developed in [2] in order to speed up recovery of the RAID to a consistent state following a crash. The reasons for…