Douglas Summers-Stay

The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. This can hinder progress in pushing the state of the art in the computer vision aspects …
There is good reason to believe that humans use some kind of recursive grammatical structure when we recognize and perform complex manipulation activities. We have built a system to automatically build a tree structure from observations of an actor performing such activities. The activity trees that result form a framework for search and understanding, …
The prospect of human commanders teaming with mobile robots "smart enough" to undertake joint exploratory tasks, especially tasks that neither commander nor robot could perform alone, requires novel methods of preparing and testing human-robot teams for these ventures prior to real-time operations. In this paper, we report work-in-progress that maintains …
This paper briefly sketches new work-in-progress (i) developing task-based scenarios where human-robot teams collaboratively explore real-world environments in which the robot is immersed but the humans are not, (ii) extracting and constructing "multi-modal interval corpora" from dialog, video, and LIDAR messages that were recorded in ROS bagfiles during …
The inherent inflexibility and incompleteness of commonsense knowledge bases (KBs) have limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not …
We present a system that makes use of image context to perform pixel-level segmentation for many object classes simultaneously. The system finds approximate nearest neighbors from the training set for a (biologically plausible) feature patch surrounding each pixel. It then uses locally adaptive anisotropic Gaussian kernels to find the shape of the class …