Using subdivision as a basic primitive for the construction of smooth, free-form surfaces of arbitrary topology is attractive for content destined for display on devices with widely varying rendering performance. Subdivision naturally supports level-of-detail rendering and powerful compression algorithms. While the underlying algorithms are conceptually simple, it is difficult to implement player engines that achieve optimal performance on modern CPUs such as the Intel Pentium family. In this paper we describe a novel table-driven evaluation strategy for subdivision surfaces, using the scheme of Catmull and Clark as an example. Cache-conscious design and exploitation of SIMD instructions allow us to achieve nearly 100% FPU utilization in the inner loop and a composite performance of 1.2 flop/cycle on the Intel PIII and 1.8 flop/cycle on the Intel P4, including all memory transfers. The algorithm supports tradeoffs between cache size and memory bus usage, which we examine. A library that implements this engine is freely available from the authors.
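To illustrate what "table-driven" evaluation means, the following is a minimal sketch of the general idea, not the authors' engine. It assumes the simplest case: a regular Catmull-Clark patch (all vertices of valence 4), whose limit surface is a bicubic uniform B-spline defined by 16 control points. The basis-function weights for a fixed sampling density are tabulated once. Evaluating any patch is then a single dense product of the table with that patch's control points, which is the kind of regular multiply-add loop that maps well onto SIMD hardware. Extraordinary vertices need their own tables and are not covered here. The names `bspline_basis`, `build_table`, and `evaluate_patch` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def bspline_basis(t: float) -> List[float]:
    """Uniform cubic B-spline basis functions B0..B3 evaluated at t in [0, 1]."""
    s = 1.0 - t
    return [
        s * s * s / 6.0,
        (3.0 * t**3 - 6.0 * t**2 + 4.0) / 6.0,
        (-3.0 * t**3 + 3.0 * t**2 + 3.0 * t + 1.0) / 6.0,
        t**3 / 6.0,
    ]


def build_table(n: int) -> List[List[float]]:
    """Precompute 16 tensor-product weights for each of the (n+1)^2 grid points.

    The table depends only on the sampling density n, not on the mesh,
    so it can be built once and reused for every regular patch.
    Row order is v-major: row index = v_index * (n + 1) + u_index.
    """
    params = [k / n for k in range(n + 1)]
    basis = [bspline_basis(t) for t in params]
    table = []
    for bv in basis:          # v direction
        for bu in basis:      # u direction
            # Weight for control point ctrl[4*j + i] is Bj(v) * Bi(u).
            table.append([bv[j] * bu[i] for j in range(4) for i in range(4)])
    return table


def evaluate_patch(table: List[List[float]], ctrl: List[Vec3]) -> List[Vec3]:
    """Evaluate all grid points as a (rows x 16) by (16 x 3) matrix product."""
    assert len(ctrl) == 16, "a regular patch has a 4x4 control net"
    out = []
    for row in table:
        x = y = z = 0.0
        for w, (px, py, pz) in zip(row, ctrl):
            x += w * px
            y += w * py
            z += w * pz
        out.append((x, y, z))
    return out
```

For example, with a flat control net `ctrl = [(i, j, 0) for j in range(4) for i in range(4)]` and `build_table(4)`, the 25 evaluated points lie on the grid `(1 + u, 1 + v, 0)` for `u, v` in `{0, 0.25, ..., 1}`, because B-splines reproduce linear functions. Each output point costs 16 multiply-adds per coordinate regardless of the patch, which is why the per-point cost becomes predictable once the table is in cache.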