N. Siddharth

Learn More
We present a system that demonstrates how the composi-tional structure of events, in concert with the compositional structure of language, can interplay with the underlying fo-cusing mechanisms in video action recognition, providing a medium for top-down and bottom-up integration as well as multi-modal integration between vision and language. We show how(More)
We present an approach to simultaneously reasoning about a video clip and an entire natural-language sentence. The compositional nature of language is exploited to construct models which represent the meanings of entire sentences composed out of the meanings of the words in those sentences mediated by a grammar that encodes the predicate-argument relations.(More)
How does the human brain represent simple compositions of objects, actors, and actions? We had subjects view action sequence videos during neuroimaging (fMRI) sessions and identified lexical descriptions of those videos by decoding (SVM) the brain representations based only on their fMRI activation patterns. As a precursor to this result, we had(More)
We present an approach to searching large video corpora for clips which depict a natural-language query in the form of a sentence. Compositional semantics is used to encode subtle meaning differences lost in other approaches, such as the difference between two sentences which have identical words but entirely different meaning: The person rode the horse(More)
Many practical techniques for probabilistic inference require a sequence of distributions that interpolate between a tractable distribution and an intractable distribution of interest. Usually, the sequences used are simple, e.g., based on geometric averages between distributions. When models are expressed as prob-abilistic programs, the models themselves(More)
  • 1