Running head: SECONDARY GENERALIZATION IN CATEGORIZATION

Secondary generalization in categorization: an exemplar-based account

Abstract

The parallel rule activation and rule synthesis (PRAS) model is a computational model for generalization in category learning, proposed by Vandierendonck (1995). An important concept underlying the PRAS model is the distinction between primary and secondary generalization. In Vandierendonck (1995), an empirical study is reported that provides support for the concept of secondary generalization. In this paper, we re-analyze the data reported by Vandierendonck (1995) by fitting three different variants of the Generalized Context Model (GCM), which do not rely on secondary generalization. Although some of the GCM variants outperformed the PRAS model in terms of global fit, they all have difficulty in providing a qualitatively good fit of a specific critical pattern.

Secondary generalization in categorization: an exemplar-based account

Perhaps the most important work of André Vandierendonck in the field of categorization and concept learning is his paper entitled “A parallel rule activation and rule synthesis model for generalization in category learning” (Vandierendonck, 1995). In this paper, a computational model of category learning is proposed that, certainly at its time, was unlike any other model in the categorization field. The parallel rule activation and rule synthesis (PRAS) model is a production model, similar to Anderson’s ACT model (Anderson, 1978, 1983), in which information is stored in a special if-then format called production rules. However, during the late eighties and early nineties, most computational models (both in categorization and elsewhere) were so-called connectionist models. Indeed, in this period, several highly influential connectionist models were published in the categorization literature (Gluck & Bower, 1988; Kruschke, 1992).
There was a widespread sentiment during that period among many modelers (including, at that time, the first author of this paper) that connectionist models were the future. Other types of models, including the PRAS model, did not receive as much attention. Perhaps this is the reason why some important ideas underlying the PRAS model have been somewhat ignored in the categorization literature. One such idea, and the focus of this paper, is secondary generalization. Briefly, secondary generalization is generalization that stems from abstract information, while primary generalization is generalization that stems from exemplar information. In Vandierendonck (1995), an empirical study was presented that supported the idea of secondary generalization. The empirical evidence was fairly convincing, and it still poses a challenge for models that rely only on primary generalization.

In this paper, we will re-analyze the data reported by Vandierendonck (1995). Because we will often refer to this paper and its dataset, we will refer to the Vandierendonck (1995) paper as the ‘PRAS paper’, and to the dataset in that paper as the ‘PRAS dataset’. The goal of this paper is to give an exemplar-based account of the results in the PRAS paper. We will push the exemplar models to the limit (and perhaps even over the limit) in an attempt to fit the PRAS data without relying (explicitly) on secondary generalization. If we succeed, the exemplar theorists may cry victory once again. If we fail, the empirical study of the PRAS paper will stand as one of a few interesting exceptions where exemplar theory falls short, and categorization modelers should consider the implications for their models.

The paper is organized as follows. First, we briefly review exemplar-based and abstraction-based models in categorization, including hybrid models like the PRAS model.
Next, we discuss the concept of secondary generalization and describe the empirical study that was reported in the PRAS paper. We then give an overview of the exemplar models that we will fit to this dataset. Finally, we reflect on the results of our model fitting experiment and their implications for old and new models of categorization.

Exemplars, Abstraction and the PRAS model

Individual members of a category are called exemplars. There is a strong tradition in the categorization literature which assumes that a category is simply represented by the set of exemplars known to belong to that category (Hayes-Roth & Hayes-Roth, 1977; Medin & Schaffer, 1978; Nosofsky, 1984; Estes, 1986). Category learning is then merely a matter of storing these exemplars when they present themselves as members of the category (hence implicitly assuming that we can only learn from ‘labeled’ exemplars). Many different flavors of exemplar theory have been proposed, but importantly, at the heart of every model based on exemplar theory is the idea that no abstraction takes place during category learning. Categorizing a new target stimulus is based solely on the set of stored exemplars.

A different perspective is taken by so-called abstraction models. In these models, category-level information is inferred from the observed exemplars by some mechanism for abstraction. For example, in prototype models, a category is represented by a single prototype, a special (possibly unobserved) exemplar that captures the central tendency of the individual exemplars belonging to that category (Homa, 1984; Reed, 1972; Posner & Keele, 1968; Minda & Smith, 2001; Smith & Minda, 2002). After learning, the prototype has replaced the individual exemplars and forms the only basis for categorizing future stimuli. A second type of abstraction is rules, which can often be expressed verbally (Trabasso & Bower, 1968; Bourne, 1982).
Rules can be one-dimensional, as in “all red objects belong to category A”, but multidimensional rules can be constructed as well. In a rule-based model of categorization, the rules have replaced the exemplars, and the categorization of new stimuli is based solely on these rules. During learning, new rules can be constructed and existing rules can be adapted, capturing the common features of exemplars belonging to the same category.

Inevitably, hybrid models have been proposed in which the representation of a category can consist of both exemplar information and abstract information. Models that combine exemplars and prototypes have been proposed by Medin, Altom, and Murphy (1984) and Busemeyer, Dewey, and Medin (1984). Hybrid models involving both exemplars and rules have been proposed by Erickson and Kruschke (1998) and Nosofsky, Palmeri, and McKinley (1994), among others.

The PRAS model of Vandierendonck (1995) is also a hybrid model. In the PRAS model, both exemplar-based and rule-based information can be used to represent a single category. Both types of information are stored by means of production rules (Anderson, 1983). In the context of categorization, production rules can be considered as if-then statements where the if-part contains a description of an exemplar or exemplar features, and where the then-part implies a category assignment. For example, a production rule for classifying animals as birds or non-birds could be:

IF the animal has feathers AND the animal has wings
THEN classify it as a bird

The beauty of a production system is that the condition part (the if-part) can contain either a highly specific description of a unique exemplar, or a set of features that applies to a larger set of exemplars (for example, “has wings”).
By combining different types of information in the condition part, a production system is an ideal environment for building a hybrid model of categorization, where both exemplar-level and more abstract information can be stored in a similar representational format. The representation of a single category may consist of many (possibly conflicting) production rules, at different levels of abstraction. When a target stimulus must be classified, all these production rules may become activated (hence the name parallel rule activation), albeit with different strengths. Each production rule provides evidence for a certain category. When a category assignment must be made, the production system collects the accumulated evidence for each of the competing categories. Finally, a decision rule converts the evidence for the different categories into a category decision, often in a probabilistic manner.

A vital feature of production systems is that they are capable of making inferences based on experience. Borrowing the example used in Vandierendonck (1995): if a production system learns that a specific brown animal is a horse, and it learns that a specific black animal is also a horse, it may infer that a horse can have any color. Combining the information of two production rules into a new production rule is called rule synthesis. Inferring that a horse can have any color is, of course, a rather crude (over)generalization. In the PRAS model, a more subtle type of generalization is used. For example, after observing both a black and a brown horse, the PRAS model would typically infer that the color of a horse lies somewhere between brown and black. The exact range of the generalization is governed by a free parameter in the model (i.e., the ρ parameter; see page 445 in the PRAS paper). This type of generalization is not confined to a single dimension.
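As a rough illustration (our own toy sketch, not the actual PRAS implementation), rule synthesis over rectangular condition ranges can be written in a few lines. The function name `synthesize` and the way `rho` widens the merged range are simplifying assumptions; the real model's handling of its ρ parameter is more involved:

```python
# Minimal sketch of rule synthesis: two production rules for the same
# category, each a list of (lo, hi) ranges (one pair per dimension), are
# merged into one rule whose range spans both originals. The `rho`
# parameter is a crude stand-in for the PRAS generalization-range
# parameter: it widens the merged range symmetrically.

def synthesize(rule_a, rule_b, rho=0.0):
    merged = []
    for (lo_a, hi_a), (lo_b, hi_b) in zip(rule_a, rule_b):
        lo, hi = min(lo_a, lo_b), max(hi_a, hi_b)
        pad = rho * (hi - lo)  # extra generalization beyond the observed exemplars
        merged.append((lo - pad, hi + pad))
    return merged

# Two exemplar rules (zero-range conditions) for "horse",
# with hypothetical (color, size) coordinates:
brown_horse = [(3.0, 3.0), (5.0, 5.0)]
black_horse = [(7.0, 7.0), (5.0, 5.0)]

# With rho = 0 the synthesized color range is exactly [3.0, 7.0]: the
# model infers that a horse's color lies between brown and black, while
# the size dimension stays a single point.
print(synthesize(brown_horse, black_horse))
```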
If several dimensions are involved, the PRAS model assumes that a rectangular area in the psychological space is constructed in between the exemplars over which the generalization takes place (see Figure 1 in the PRAS paper). To make things more concrete, suppose that our exemplars vary in only two dimensions (as will be the case in the empirical example below). A typical production rule in the PRAS model has the following form:

IF x1 ∈ [a1.min, a1.max] AND x2 ∈ [a2.min, a2.max]
THEN classify x in category A

where x = (x1, x2) is a new target stimulus, and a1.min and a1.max are the lower and upper ends of a range along the first dimension of the psychological space. Note that if the values of a1.min and a1.max are minus and plus infinity respectively, the condition is always fulfilled. On the other hand, if am.min = am.max for every dimension m, the range is confined to a single point in the psychological space. This is how exemplars are represented in the PRAS model.

The example illustrates an important feature of the PRAS model: unlike other hybrid models that combine exemplar and rule information, there is no separate system or submodel for the exemplar part and the abstracted (rule) part of the system. Instead, the representation of exemplar information and abstracted information forms a continuum. On this continuum, exemplars are represented by zero-range condition parts. By widening the ranges over one or more dimensions, more general (and hence more abstract) information is gradually formed, all within the same representational format.

Primary and Secondary generalization and the PRAS dataset

Once the PRAS model has been trained to categorize a set of training exemplars, how does the model proceed to categorize a new target stimulus?
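The rectangular condition format described above admits a simple containment test. The following toy sketch (our own code, not the PRAS implementation) checks whether a stimulus satisfies a rule's condition part; note that a stimulus at (5, 4) fails a strict zero-range exemplar rule at (6, 5), which is exactly where generalization must come in:

```python
import math

# Toy containment test for the rectangular condition part of a PRAS-style
# production rule: a rule is a list of (lo, hi) ranges, one per dimension,
# and a stimulus x matches if every coordinate falls inside its range.

def matches(rule, x):
    return all(lo <= xi <= hi for (lo, hi), xi in zip(rule, x))

exemplar_rule = [(6.0, 6.0), (5.0, 5.0)]                       # zero-range: a single point
abstract_rule = [(4.0, 8.0), (3.0, 7.0)]                       # widened ranges
anything_rule = [(-math.inf, math.inf), (-math.inf, math.inf)] # always fulfilled

x = (5.0, 4.0)
print(matches(exemplar_rule, x))  # False: no exact match with the stored exemplar
print(matches(abstract_rule, x))  # True
print(matches(anything_rule, x))  # True
```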
Suppose for simplicity that the representation of a category currently consists of a single production rule whose condition part corresponds to a specific exemplar:

IF x1 ∈ [6, 6] (first dimension) AND x2 ∈ [5, 5] (second dimension)
THEN classify x in category A

where the coordinates (6, 5) correspond to the location of a stored exemplar in a two-dimensional psychological space. What happens if a new stimulus with coordinates, say, (5, 4) is presented to the system? The coordinates do not perfectly match the stored exemplar in the production rule. However, the coordinates are fairly close together, and therefore the target stimulus and the stored exemplar are perceived to be rather similar. A fundamental observation in the (category) learning literature is that similar stimuli lead to similar responses. This is known as generalization, and its properties have been studied extensively in the classical conditioning literature (Mostofsky, 1965; Ghirlanda & Enquist, 2003). In our example, generalization would suggest that a stimulus with coordinates (5, 4) might still trigger the response part of the production rule.

But how do we define ‘similar’? It seems natural that stimuli that are further apart in the psychological space are less similar than stimuli that are closer together. Shepard (1957) suggested that the similarity between two exemplars i and j is an exponential decay function of their psychological distance:

ηij = exp(−c · dij)

where c determines the steepness of the exponential curve. This relationship has been observed empirically in so many different studies that it was coined the universal law of generalization (Shepard, 1987). The distance measure in this formula is often defined by the weighted Minkowski distance:

dij = ( Σm wm |xim − xjm|^r )^(1/r)

where xim is the coordinate of exemplar i on dimension m, wm is the attention weight given to dimension m, and r determines the distance metric (r = 1 yields the city-block metric, r = 2 the Euclidean metric).
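Shepard's exponential-decay similarity, combined with a weighted Minkowski distance, can be sketched in a few lines. This is a toy illustration; the parameter values (c, the weights, and r) are arbitrary choices, not values estimated from the PRAS data:

```python
import math

def minkowski(x, y, w, r=1.0):
    """Weighted Minkowski distance between points x and y: r = 1 gives the
    city-block metric, r = 2 the Euclidean metric; w holds the attention
    weights for the dimensions."""
    return sum(wm * abs(xm - ym) ** r for wm, xm, ym in zip(w, x, y)) ** (1.0 / r)

def similarity(x, y, w, c=1.0, r=1.0):
    """Shepard similarity eta = exp(-c * d), with d the weighted
    Minkowski distance; c sets the steepness of the decay."""
    return math.exp(-c * minkowski(x, y, w, r))

# Stored exemplar at (6, 5), target stimulus at (5, 4), equal attention
# weights, city-block metric: d = 0.5*1 + 0.5*1 = 1, so eta = exp(-1).
print(similarity((5.0, 4.0), (6.0, 5.0), w=(0.5, 0.5), c=1.0))  # ~0.368
```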


@inproceedings{Rosseel2010SecondaryGI,
  title  = {Secondary generalization in categorization: an exemplar-based account},
  author = {Yves Rosseel and Maarten De Schryver},
  year   = {2010}
}