Andrew Tjang

In this paper, we propose a management framework for protecting large computer systems against operator mistakes. By detecting and confining mistakes to isolated portions of the managed system, our framework facilitates correct operation even by inexperienced operators. We built a prototype management system called Barricade based on our framework. We ...
In this work we present Active Tapes, a bus organization for sensor networks. We construct simple cost models to compare Active Tapes against more traditional wireless sensor networks. Our models show that density, lifetime, and power consumption play significant roles in determining overall deployment and maintenance cost. We then characterize regimes ...
Operator mistakes have been identified as a significant source of unavailability in Internet services. In our previous work, we proposed operator action validation as an approach for detecting mistakes while hiding them from the service and its users. Unfortunately, previous validation strategies have limitations, including the need for known instances of ...
Online services are rapidly becoming the supporting infrastructure for numerous users' work and leisure, placing higher demands on their availability and correct functioning. Increasingly, these services are composed of complex conglomerates of distributed hardware and software components. Added to this complexity, these services evolve quite frequently ...
Operator mistakes have been identified as a significant source of unavailability in Internet services. In this paper, we propose a new language, A, for service engineers to write assertions about expected behaviors, proper configurations, and proper structural characteristics. This formalized specification of correct behavior can be used to bolster system ...
Distributed system fault detection and analysis has, until recently, focused on building passive monitoring tools without any system-level knowledge. Only now have new approaches been used, such as instrumenting the software used in the system to collect data about request paths. In this paper we present a new monitoring and fault detection ...