Using visual sensing for action generation in unknown environments is attractive because of the great representational power of vision, but it is challenging for two reasons. First, the representations used in vision are often not well suited to planning and therefore require complex learning approaches. Second, an active agent needs to make decisions on-line, without the delay of off-line processing. This paper proposes combining monocular visual SLAM with dense visual reconstruction techniques to build geometrically correct three-dimensional models that can be used for action generation, such as path or grasp planning in a robotic system. We propose a vision-only monocular solution that will run on-line on commodity hardware. The problem is very challenging and no complete solution currently exists, though similar off-line systems are quite mature.