Xue-Zhi Xiang

For speech recognition systems, there are three kinds of result representation: one-best, N-best, and lattice. Because a lattice retains multiple paths, which can reduce the effect of recognition errors, it is widely applied nowadays. However, a lattice contains a large amount of redundancy, which increases the complexity of subsequent algorithms based on it.
High-dimensional feature representations have recently been widely used for image classification; they not only incur large storage requirements and high computational complexity, but also tend to lack discrimination because of redundant and noisy features. In this paper, we propose a novel algorithm named supervised locality analysis (SLA) for…
In this paper, we propose a new framework for human action representation, which leverages the strengths of convolutional neural networks (CNNs) and the linear dynamical system (LDS) to represent both spatial and temporal structures of actions in videos. We make two principal contributions: first, we incorporate image-trained CNNs to detect action clip…
In this paper, we propose to learn semantic kernels for scene classification. We first decompose the Object Bank representation into subspaces associated with each object; Anchor Objects are then created by clustering for each scene class separately. The Anchor Distances are computed to measure the distance from objects to scene classes. In order to take…