Many systems, composed of hardware, software, or combinations thereof, function in sequential stages: each subsystem (stage) must operate correctly for the next to be challenged. All stages, including the interfaces between major functional subsystems, are subject to design defects which, if actuated, cause that stage, and hence that test, to fail. We provide models that evaluate the "testing as learning and improving" paradigm: the models describe the effect of end-to-end or linked-stage testing, together with defect identification and removal, on field or delivered-system reliability. A major concern is the evaluation of the operating characteristics of test designs such as the "first run of r total system successes" stopping rule (e.g., r = 3). The models include Bayesian formulations in which the unknown number of defects in each subsystem at any stage during testing is a random variable with known distribution. The models and methods of this paper provide test planners with answers to "what if" questions concerning the likely future(s) of entire systems placed on test. They can also be used to address test resource requirements.
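To make the stopping rule concrete, the following is a minimal Monte Carlo sketch, not the models of the paper. It rests on several illustrative assumptions that the text above does not state: the initial number of defects in each stage is Poisson distributed, each remaining defect in a challenged stage is actuated independently with a fixed per-test probability, and every failure leads to the perfect removal of exactly one defect in the failing stage. All function and parameter names (`run_test_program`, `trigger_prob`, `prior_means`, and so on) are hypothetical.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def run_test_program(defects, trigger_prob, r, rng, max_tests=100_000):
    """Run end-to-end tests until the first run of r consecutive system successes.

    defects[i]      : initial number of design defects in stage i
    trigger_prob[i] : assumed per-test probability that one defect in stage i is actuated
    Returns (tests used, remaining defects per stage, field reliability).
    """
    defects = list(defects)
    tests, streak = 0, 0
    while streak < r and tests < max_tests:
        tests += 1
        failed = False
        for stage, n in enumerate(defects):
            # A stage is challenged only if every earlier stage succeeded.
            p_fail = 1.0 - (1.0 - trigger_prob[stage]) ** n
            if rng.random() < p_fail:
                defects[stage] -= 1  # assumption: one defect found and removed perfectly
                failed = True
                break  # the test ends at the failing stage
        streak = 0 if failed else streak + 1
    # Field reliability: probability that no remaining defect is actuated in one use.
    reliability = math.prod((1.0 - p) ** n for p, n in zip(trigger_prob, defects))
    return tests, defects, reliability


def estimate_operating_characteristics(prior_means, trigger_prob, r, runs=2000, seed=1):
    """Average test count and delivered reliability over many simulated test programs."""
    rng = random.Random(seed)
    total_tests = 0.0
    total_reliability = 0.0
    for _ in range(runs):
        initial = [sample_poisson(rng, m) for m in prior_means]
        tests, _remaining, reliability = run_test_program(initial, trigger_prob, r, rng)
        total_tests += tests
        total_reliability += reliability
    return total_tests / runs, total_reliability / runs


if __name__ == "__main__":
    mean_tests, mean_reliability = estimate_operating_characteristics(
        prior_means=[2.0, 1.0, 3.0], trigger_prob=[0.2, 0.3, 0.1], r=3
    )
    print(f"mean tests to stop: {mean_tests:.1f}")
    print(f"mean field reliability: {mean_reliability:.3f}")
```

Varying `r`, the prior means, or the per-stage actuation probabilities in such a sketch gives a rough "what if" view of the trade-off between test effort and delivered reliability under the stated assumptions.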