The combination of vision and speech, together with the resulting need for formal representations, forms a central component of an autonomous system. A robot that is supposed to navigate autonomously through space must be able to perceive its environment as automatically as possible. But every recognition system has inherent limits. In particular, a robot whose task is to navigate through unknown terrain has to deal with unidentified or even unknown objects, which further compounds the recognition problem. The system described in this paper takes this into account by trying to identify objects based on their functionality where possible. To handle cases where recognition is insufficient, we examine two further strategies: linguistic reference to and labeling of the unidentified objects, and ontological deduction. This approach connects the probabilistic domain of object recognition with the logical domain of formal reasoning. To support formal reasoning, the recognition system has to supply additional relational scene information. Moreover, to give these reasoning tasks a sound ontological basis, it is necessary to define a domain ontology that represents real-world objects and their spatial relations in both linguistic and physical terms. Physical spatial relations and objects are measured by the visual system, whereas linguistic spatial relations and objects are required for interaction with a user.