In this paper, several criteria and paradigms for measuring the performance of spoken language systems are described. The focus is on the evaluation of natural language understanding components. These evaluations are carried out in two domains: spontaneous human-human interaction as supported by automatic translation systems, and spontaneous human-machine interaction as typically found in information retrieval applications. Selected system response evaluation paradigms for different applications and domains are discussed in more detail. It is also shown that official performance tests and site-specific evaluation paradigms complement each other.