Most existing approaches to content-based retrieval rely on query by example or user sketch based on low-level features; these are not suitable for semantic (object level) distinctions. In other approaches, information is classified according to a predefined set of classes and classification is either performed manually or using class-specific algorithms. Most of these systems lack flexibility: the user does not have the ability to define or change the classes, and new classification schemes require implementation of new class-specific algorithms and/or the input of an expert. In this paper, we present a different approach to content-based retrieval and a novel framework for classification of visual information in which (1) users define their own visual classes and classifiers are learned automatically; and (2) multiple fuzzy-classifiers and machine learning techniques are combined for automatic classification at multiple levels (region, perceptual, object-part, object and scene). We present The Visual Apprentice, an implementation of our framework for still images and video that uses a combination of lazy-learning, decision trees, and evolution programs for classification and grouping. Our system is flexible in that models can be changed by users over time, different types of classifiers are combined, and user-model definitions can be applied to object and scene structure classification. Special emphasis is placed on the difference between semantic and visual classes, and between classification and detection. Examples and results are presented to demonstrate the applicability of our approach to perform visual classification and detection.