This paper presents a method for automatically building 3D virtual worlds that correspond to the objects detected in a real environment. The proposed method can be used in many applications, such as Virtual Reality, Augmented Reality, remote inspection and Virtual World generation. The method requires an operator who is equipped with a stereo camera and moves through an office environment. The operator takes pictures of the environment. From each picture, the proposed method extracts Regions of Interest (ROIs), classifies their content and reconstructs a 3D virtual scenario using icons that resemble the classified object categories. ROI extraction and the estimation of the pose and height of the classified objects are performed using stereo vision. The ROIs are obtained with a Dempster-Shafer technique that fuses different cues extracted from the images, such as Speeded Up Robust Features (SURF) and depth data from the stereo camera. Experimental results obtained in office environments are presented.
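The abstract does not detail how the SURF and depth cues are combined, so the following is only a minimal sketch of Dempster's rule of combination, the standard fusion step in Dempster-Shafer theory. It assumes a two-element frame of discernment {roi, background} and treats each cue as an independent source of evidence. The mass values, the names `m_surf` and `m_depth`, and the helper `combine` are hypothetical illustrations, not the authors' implementation.

```python
from itertools import product

# Hypothetical frame of discernment: a region is either part of an ROI or background.
ROI, BG = "roi", "background"
THETA = frozenset({ROI, BG})  # full frame, i.e. "don't know"


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset hypotheses (subsets of THETA)
    to non-negative masses that sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: assign the product mass to the intersection.
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Contradictory evidence: accumulate the conflict K.
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Normalise by 1 - K so the fused masses sum to 1 again.
    norm = 1.0 - conflict
    return {h: m / norm for h, m in combined.items()}


# Hypothetical masses for one image region from each cue (assumed values).
m_surf = {frozenset({ROI}): 0.6, frozenset({BG}): 0.1, THETA: 0.3}
m_depth = {frozenset({ROI}): 0.5, frozenset({BG}): 0.2, THETA: 0.3}

fused = combine(m_surf, m_depth)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 3))
```

With these assumed inputs the conflict is K = 0.17, and the fused mass on {roi} is 0.63 / 0.83, higher than either cue gives alone. This illustrates why evidential fusion can make an ROI decision more confident when two cues agree.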