• Publications
  • Influence
Person Transfer GAN to Bridge Domain Gap for Person Re-identification
TLDR
We present a new dataset called MSMT17 with many important features, e.g., 1) the raw videos are taken by a 15-camera network deployed in both indoor and outdoor scenes, 2) the videos cover a long period of time and present complex lighting variations, 3) it currently contains the largest number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes.
  • 419
  • 133
Pose-Driven Deep Convolutional Model for Person Re-identification
TLDR
We propose a Pose-driven Deep Convolutional (PDC) model to learn improved feature extraction and matching models from end to end.
  • 398
  • 33
Deep Attributes Driven Multi-Camera Person Re-identification
TLDR
The visual appearance of a person is easily affected by many factors like pose variations, viewpoint changes and camera parameter differences.
  • 298
  • 19
RAM: A Region-Aware Deep Model for Vehicle Re-Identification
Previous works on vehicle Re-ID mainly focus on extracting global features and learning distance metrics. Because some vehicles commonly share the same model and maker, it is hard to distinguish them…
  • 71
  • 19
GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval
TLDR
The huge variance of human pose and the misalignment of detected human images significantly increase the difficulty of person Re-Identification (Re-ID).
  • 208
  • 17
The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models
TLDR
In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation.
  • 63
  • 16
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency
TLDR
We propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback.
  • 50
  • 12
Descriptive visual words and visual phrases for image applications
TLDR
The Bag-of-visual Words (BoW) image representation has been applied for various problems in the fields of multimedia and computer vision.
  • 244
  • 10
Building contextual visual vocabulary for large-scale image applications
TLDR
We propose an effective visual vocabulary generation framework containing three novel contributions: 1) we propose an effective unsupervised local feature refinement strategy; 2) we consider local features in groups to model their spatial contexts; 3) we further learn a discriminant distance metric between local feature groups, which we call discriminant group distance.
  • 133
  • 8
Deep-FSMN for Large Vocabulary Continuous Speech Recognition
TLDR
We present an improved feedforward sequential memory network (FSMN) architecture, namely Deep-FSMN (DFSMN), by introducing skip connections between memory blocks in adjacent layers.
  • 42
  • 8