A Sentence Is Worth a Thousand Pixels

We are interested in holistic scene understanding where images are accompanied with text in the form of complex sentential descriptions. We propose a holistic conditional random field model for semantic parsing which reasons jointly about which objects are present in the scene, their spatial extent as well as semantic segmentation, and employs text as well… CONTINUE READING

12 Figures & Tables



Citations per Year

Citation Velocity: 7

Averaging 7 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.