The number and arrangement of semantic tags must be constrained, lest the size and complexity of the tagging sets (tagsets) used for semantic annotation become unwieldy for both humans and computers. The description of lexical predicates within the framework of frame semantics provides a natural method for selecting and structuring appropriate tagsets.

1 Motivation

The research presented here is to be conducted under the FrameNet research project at the University of California.¹ On this project our primary aim is to produce frame-semantic descriptions of lexical items; our concern with semantically tagged corpora is at both ends of our research. That is, we expect to use partially semantically tagged corpora in the investigation stage (perhaps nothing more than having WordNet hypernyms associated with nouns), but we will produce semantically tagged corpus lines as a by-product of our work.

Most major grammatical theories now accept the general principle that some set of semantic roles ("case roles", "thematic roles", or "theta roles") is necessary for characterizing the semantic relations that a predicate can have to its arguments. This would seem to be one obvious starting point for choosing a tagset for semantically annotating corpora, but there is no agreement as to the size of the minimal necessary set of "universal" roles. Also, when we examine particular semantic fields, it is obvious that each field brings to mind a new set of more specific roles. In fact, the more closely we look at individual predicates, the more specific the argument roles become, raising the specter of trying to define an unlimited number of very fine-grained tags and attributes.

An adequate account of the syntax and semantics of a language will inevitably involve a fairly detailed set of semantic tags, but how can we find the right level of granularity of tags for each semantic area? Consider the sentence:

(1) The waters of the spa cure arthritis.
A semantic annotation of the constituents must identify at least

• the action or state associated with the verb, possibly expressed in terms of primitives or some kind of metalanguage;
• the participants (normally expressed as arguments); and
• the roles of the participants in the action or state.

A basic parse will identify the sentence's syntactic constituents; from the point of view of the head verb cure, then, a semantic annotation should reveal the mapping between the syntactic constituents and the frame-semantic elements they instantiate. In sentence (1) above, for example, the grammatical subject "the waters of the spa" corresponds to the thematic causer of the curing effect on the entity expressed as "arthritis", the verb's syntactic direct object and its thematic patient.²

¹ The work is housed in the International Computer Science Institute in Berkeley and funded by the National Science Foundation under NSF grant IRI 96-18838. The official name of the project is "Tools for lexicon building"; the PI is Charles J. Fillmore. Starting date: March 1, 1997.

² Here we use the word patient (in italics) as the name of a case role; we will also use the word in the medical sense later in this paper. Caveat lector!

However, there is something incomplete about such an analysis: it fails to anchor the arguments of