This paper proposes a framework for learning human-provided category labels that describe individual objects, pairwise object relationships, and groups of objects. The framework was evaluated in an experiment in which a robot interactively explored 36 objects that varied by color, weight, and contents. The proposed method allowed the robot to learn, with high recognition accuracy, categories describing not only individual objects but also pairs and groups of objects. Furthermore, by grounding the category representations in its own sensorimotor repertoire, the robot was able to estimate how similar two categories are in terms of the behaviors and sensory modalities used to recognize them. Finally, this grounded measure of similarity enabled the robot to boost its recognition performance when learning a new category by relating it to a set of familiar categories.