The API design and the system implementation of MXNet are described, and it is explained how embedding of both symbolic expression and tensor operation is handled in a unified fashion.
The experiments suggest that incorporating a non-zero slope for negative part in rectified activation units could consistently improve the results, and are negative on the common belief that sparsity is the key of good performance in ReLU.
The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
A hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix is proposed, which can significantly advance the state of the art.
Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost with real-time performance when the MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).
This paper proposes a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN) to increase the generalization ability of a DNN, and demonstrates that the method is complementary with other existing methods and may further improve model performance.
A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularizations on factors, and solving the optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method.
This paper proposes a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions and is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale- Consistent camera trajectories over a long video sequence.
This paper proposes a novel knowledge transfer method by treating it as a distribution matching problem, which matches the distributions of neuron selectivity patterns between teacher and student networks and can significantly improve the performance of student networks.
A novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power is proposed and a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields is constructed.