Peng Wang

Many computer vision problems can be formulated as binary quadratic programs (BQPs). Two classic relaxation methods are widely used for solving BQPs, namely, spectral methods and semidefinite programming (SDP), each with its own advantages and disadvantages. Spectral relaxation is simple and easy to implement, but its bound is loose. Semidefinite ...
We describe a method for visual question answering which is capable of reasoning about the contents of an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can also provide an explanation of the reasoning by which it developed its ...
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this ...
Conditional random fields (CRFs) have been one of the most successful approaches to semantic pixel labelling, which they solve as a maximum a posteriori (MAP) estimation problem. Standard CRFs typically contain unary potentials defined on local features and edge potentials defined on 4- or 8-neighbouring pixels. Although these CRF models have achieved ...
Visual Question Answering (VQA) has attracted a lot of attention in both the Computer Vision and Natural Language Processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of ...
In computer vision, many problems can be formulated as binary quadratic programs (BQPs), which are in general NP-hard. Finding a solution when the problem is large enough to be of practical interest typically requires relaxation. Semidefinite relaxation usually yields tight bounds, but its computational complexity is high. In this work, we present a new ...
Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to depict the generation process of local features. However, the representative power of the GMM could ...
In this work, we study the challenging problem of identifying the irregular status of objects from images in an "open world" setting, that is, distinguishing the irregular status of an object category from its regular status as well as objects from other categories in the absence of "irregular object" training data. To address this problem, we propose a ...
Compared to other applications in computer vision, convolutional neural networks have under-performed on pedestrian detection. A breakthrough was made very recently by using sophisticated deep CNN models, with a number of hand-crafted features [1], or an explicit occlusion handling mechanism [2]. In this work, we show that by re-using the convolutional ...
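
Two of the abstracts above describe spectral relaxation of BQPs as simple to implement but loose. The sketch below illustrates that trade-off on a toy problem. It is not taken from any of the listed papers: it assumes the common textbook form min x^T A x subject to x in {-1, +1}^n, and the function names `spectral_relaxation` and `brute_force` are hypothetical. Relaxing the binary constraint to ||x||^2 = n turns the problem into an eigenvalue problem, so the lower bound is n times the smallest eigenvalue of A, and taking signs of the corresponding eigenvector gives a feasible (not necessarily optimal) binary solution.

```python
import itertools

import numpy as np


def spectral_relaxation(A):
    """Spectral relaxation of  min x^T A x  s.t.  x in {-1, +1}^n.

    Assumed textbook formulation, not the method of any listed paper.
    Returns (lower_bound, rounded_x, rounded_value).
    """
    A = 0.5 * (A + A.T)                    # x^T A x only depends on the symmetric part
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    lower_bound = n * eigvals[0]           # optimum over the sphere ||x||^2 = n
    x = np.where(eigvecs[:, 0] >= 0, 1.0, -1.0)  # round the eigenvector to +/-1
    return lower_bound, x, float(x @ A @ x)


def brute_force(A):
    """Exact optimum by enumeration; only feasible for very small n."""
    n = A.shape[0]
    return min(
        float(np.array(s) @ A @ np.array(s))
        for s in itertools.product((-1.0, 1.0), repeat=n)
    )


rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = 0.5 * (M + M.T)

bound, x, value = spectral_relaxation(A)
optimum = brute_force(A)
print(f"spectral bound: {bound:.3f}  exact optimum: {optimum:.3f}  rounded value: {value:.3f}")
```

By construction the three numbers satisfy bound <= optimum <= rounded value; the gap between the first two is the looseness the abstracts refer to, which tighter relaxations such as SDP aim to shrink at higher computational cost.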