Learn More
Object detectors are typically trained on a large set of still images annotated by bounding-boxes. This paper introduces an approach for learning object detectors from real-world web videos known only to contain objects of a target class. We propose a fully automatic pipeline that localizes objects in a set of videos of the class and learns a detector for(More)
We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: We first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated(More)
We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the(More)
Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been manually annotated by humans. While this paradigm allowed the field to tremendously progress in the last decade, it has now become one of its major bottlenecks. Teaching a new visual concept requires an expensive human annotation effort, limiting systems to(More)
situation results/output Algorithm [a] states topology pairwise marginals convergence optimal complexity references Plain BP (m)any tree any yes yes yes O(nh 2) [4, 1] Loopy BP (m)any any any yes no no O(eh 2 i) [4, 1] truncation trick for BP (m)any tree/any truncated yes yes/no yes/no O(nhk)/O(ehik)) [7] distance transform for BP ordered [b][j] tree/any(More)
  • 1