Douglas Summers-Stay

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information,…
The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. This can hinder progress in pushing the state of the art in the computer vision aspects…
There is good reason to believe that humans use some kind of recursive grammatical structure when we recognize and perform complex manipulation activities. We have built a system to automatically build a tree structure from observations of an actor performing such activities. The activity trees that result form a framework for search and understanding,…
The inherent inflexibility and incompleteness of commonsense knowledge bases (KBs) have limited their usefulness. We describe a system called Displacer for performing KB queries extended with the analogical capabilities of the word2vec distributional semantic vector space (DSVS). This allows the system to answer queries with information which was not…
This paper briefly sketches new work-in-progress (i) developing task-based scenarios where human-robot teams collaboratively explore real-world environments in which the robot is immersed but the humans are not, (ii) extracting and constructing “multi-modal interval corpora” from dialog, video, and LIDAR messages that were recorded in ROS bagfiles during…
We present a system that makes use of image context to perform pixel-level segmentation for many object classes simultaneously. The system finds approximate nearest neighbors from the training set for a (biologically plausible) feature patch surrounding each pixel. It then uses locally adaptive anisotropic Gaussian kernels to find the shape of the class…
The prospect of human commanders teaming with mobile robots “smart enough” to undertake joint exploratory tasks, especially tasks that neither commander nor robot could perform alone, requires novel methods of preparing and testing human-robot teams for these ventures prior to real-time operations. In this paper, we report work-in-progress that maintains face…
Representing knowledge as high-dimensional vectors in a continuous semantic vector space can help overcome the brittleness and incompleteness of traditional knowledge bases. We present a method for performing deductive reasoning directly in such a vector space, combining analogy, association, and deduction in a straightforward way at each step in a chain of…
Integrating computer vision and natural language processing is a novel interdisciplinary field that has received a lot of attention recently. In this survey, we provide a comprehensive introduction to the integration of computer vision and natural language processing in multimedia and robotics applications, with more than 200 key references. The tasks that…