Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
ReferItGame: Referring to Objects in Photographs of Natural Scenes
This work introduces a two-player game for crowd-sourcing natural language referring expressions; the game both collects and verifies the expressions during play, and the paper provides an in-depth analysis of the resulting dataset.
Modeling Context in Referring Expressions
This work focuses on incorporating better measures of visual context into referring expression models and finds that visual comparison to other objects within an image helps improve performance significantly.
MAttNet: Modular Attention Network for Referring Expression Comprehension
- Licheng Yu, Zhe L. Lin, +4 authors Tamara L. Berg
- Computer Science
- IEEE/CVF Conference on Computer Vision and…
- 24 January 2018
This work proposes to decompose expressions into three modular components related to subject appearance, location, and relationship to other objects, which allows the model to adapt flexibly to expressions containing different types of information in an end-to-end framework.
Two-person interaction detection using body-pose features and multiple instance learning
- K. Yun, J. Honorio, Debaleena Chattopadhyay, Tamara L. Berg, D. Samaras
- Computer Science
- IEEE Computer Society Conference on Computer…
- 16 June 2012
A complex human activity dataset depicting two-person interactions, including synchronized video, depth, and motion capture data, is created, and techniques related to Multiple Instance Learning (MIL) are explored, finding that the MIL-based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
Im2Text: Describing Images Using 1 Million Captioned Photographs
A new objective performance measure for image captioning is introduced, and methods incorporating many state-of-the-art, but fairly noisy, estimates of image content are developed to produce even more pleasing results.
TVQA: Localized, Compositional Video Question Answering
This paper presents TVQA, a large-scale video QA dataset based on 6 popular TV shows, and provides analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task.
Parsing clothing in fashion photographs
- Kota Yamaguchi, M. Kiapour, Luis E. Ortiz, Tamara L. Berg
- Computer Science
- IEEE Conference on Computer Vision and Pattern…
- 16 June 2012
This work demonstrates an effective method for parsing clothing in fashion photographs, an extremely challenging problem due to the large number of possible garment items and to variations in configuration, garment appearance, layering, and occlusion.
Shape matching and object recognition using low distortion correspondences
- A. Berg, Tamara L. Berg, Jitendra Malik
- Mathematics, Computer Science
- IEEE Computer Society Conference on Computer…
- 20 June 2005
This work approaches recognition in the framework of deformable shape matching, relying on a new algorithm for finding correspondences between feature points, and shows results for localizing frontal and profile faces that are comparable to special-purpose approaches tuned to faces.
Where to Buy It: Matching Street Clothing Photos in Online Shops
- M. Kiapour, Xufeng Han, S. Lazebnik, A. Berg, Tamara L. Berg
- Computer Science
- IEEE International Conference on Computer Vision…
- 7 December 2015
Three different methods for Exact Street to Shop retrieval are developed, including two deep learning baseline methods, and a method to learn a similarity measure between the street and shop domains.