Visual features are of vital importance for human action understanding in videos. This paper presents a new video representation, called trajectory-pooled deep-convolutional descriptor (TDD), which shares the merits of both hand-crafted features [31] and deep-learned features [24]. Specifically, we utilize deep architectures to learn discriminative …
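The abstract is cut off before it explains how the descriptor is built. As a rough illustration of the idea the name suggests, the NumPy sketch below sum-pools convolutional feature-map activations at the points of a trajectory. Several details are assumptions for illustration and are not the paper's stated procedure: the (T, H, W, C) array layout, the hypothetical `ratio` parameter that maps frame pixels to feature-map cells, and the per-channel max normalization.

```python
import numpy as np

def spatiotemporal_normalize(fmap):
    # fmap: (T, H, W, C) conv feature maps for one video (assumed layout).
    # Divide each channel by its maximum over the whole video (illustrative choice).
    max_per_channel = fmap.reshape(-1, fmap.shape[-1]).max(axis=0)
    return fmap / np.maximum(max_per_channel, 1e-12)

def trajectory_pool(fmap, trajectory, ratio):
    # trajectory: list of (x, y, t) points; x, y in frame pixels, t a feature-map time index.
    # ratio: hypothetical feature-map-size / frame-size scale factor.
    T, H, W, C = fmap.shape
    desc = np.zeros(C)
    for x, y, t in trajectory:
        # Map the pixel position to the nearest feature-map cell, clamped to the map.
        i = min(int(round(y * ratio)), H - 1)
        j = min(int(round(x * ratio)), W - 1)
        # Sum pooling: accumulate the C-dimensional activation at this point.
        desc += fmap[t, i, j]
    return desc
```

Sum pooling keeps the descriptor length fixed at C regardless of trajectory length, which makes it easy to feed into a standard encoding pipeline.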
Reprogramming sensor networks is an important and challenging problem, as it is often desirable to reprogram the sensors in place. In this paper, we propose a multihop reprogramming service designed for Mica-2 motes. One problem in reprogramming is message collision. To reduce collisions, we propose a sender selection …
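The abstract stops before it describes the sender selection mechanism. Purely as a hypothetical illustration of why choosing a single sender per code segment can limit collisions, the sketch below applies a greedy rule: among nodes advertising the same segment, only the one with the most pending requesters transmits, with ties broken by the smaller node ID. The `Advertisement` record, the requester count, and the tie-break rule are assumptions rather than the paper's protocol. The sketch also ignores radio neighborhoods, which a real protocol would have to respect.

```python
from dataclasses import dataclass

@dataclass
class Advertisement:
    node_id: int
    segment: int      # code segment the node offers to send
    requesters: int   # hypothetical: neighbors that asked for this segment

def select_senders(ads):
    # Hypothetical greedy rule: one sender per segment, the advertiser with the
    # most requesters wins; on a tie, the smaller node ID wins.
    best = {}
    for ad in ads:
        cur = best.get(ad.segment)
        if cur is None or (ad.requesters, -ad.node_id) > (cur.requesters, -cur.node_id):
            best[ad.segment] = ad
    return {seg: ad.node_id for seg, ad in best.items()}
```

For example, if nodes 1 and 2 both offer segment 0 with 3 and 5 requesters, only node 2 would transmit it. Node 1 stays silent instead of contending for the channel.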
The Bag of Visual Words (BoVW) model with local features has become the most popular method in action recognition and has achieved state-of-the-art performance on several realistic datasets, such as HMDB51, UCF50, and UCF101. BoVW provides a general pipeline for constructing a global representation from a set of local features; this pipeline is mainly composed of five …
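To make the pipeline concrete, below is a minimal NumPy sketch of a generic BoVW representation. It assumes a k-means codebook, hard-assignment encoding, sum pooling into a histogram, and L2 normalization. These choices are illustrative and are not necessarily the five steps the paper enumerates. Local-feature extraction is assumed to have happened already.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd's k-means used here to learn the visual codebook.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers

def encode_video(local_feats, codebook):
    # Hard assignment: each local descriptor votes for its nearest codeword.
    d = ((local_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    # Sum pooling (the vote histogram) followed by L2 normalization.
    return hist / max(np.linalg.norm(hist), 1e-12)
```

Each video, whatever its number of local features, maps to a vector of length k. Richer encodings (for example soft assignment or Fisher vectors) would replace `encode_video` while keeping the rest of the pipeline.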
We view congestion control as a distributed primal-dual algorithm carried out by sources and links over a network to solve a global optimization problem. We describe a multi-link multi-source model of the TCP Vegas congestion control mechanism. The model provides a fundamental understanding of delay, fairness and loss properties of TCP Vegas. It implies …
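For reference, the sketch below shows the per-source rule that such a model abstracts: the classic TCP Vegas congestion-avoidance step, applied once per round-trip time. The function name is hypothetical, and alpha = 2 and beta = 4 packets are illustrative thresholds, not values taken from the paper.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=2.0, beta=4.0):
    """One per-RTT TCP Vegas congestion-avoidance step (window in packets).

    base_rtt: smallest RTT observed, used as the propagation-delay estimate.
    rtt: current RTT sample. alpha/beta: illustrative backlog thresholds.
    """
    expected = cwnd / base_rtt                 # rate if there were no queueing
    actual = cwnd / rtt                        # rate actually achieved
    backlog = (expected - actual) * base_rtt   # estimated packets queued in the network
    if backlog < alpha:
        return cwnd + 1   # too little queued: probe for more bandwidth
    if backlog > beta:
        return cwnd - 1   # too much queued: back off
    return cwnd           # backlog within [alpha, beta]: hold steady
```

The estimated backlog equals `actual * (rtt - base_rtt)`, the source rate times its queueing delay. This is why a delay-based rule like this can be read as a source reacting to a congestion price (queueing delay) set by the links.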
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, their advantage over traditional methods is less evident. This paper aims to discover principles for designing effective ConvNet architectures for action recognition in videos and to learn these models given limited …
Deep convolutional networks have achieved great success for object recognition in still images. However, for action recognition in videos, the improvement brought by deep convolutional networks is less evident. We argue that two reasons may explain this result. First, the current network architectures (e.g., Two-stream ConvNets [12]) are …
Project ExScal (for "extreme scale") fielded a 1000+ node wireless sensor network and a 200+ node peer-to-peer ad hoc network of 802.11 devices in a 13 km by 300 m remote area in Florida, USA, during December 2004. Compared with previous deployments, the ExScal application is relatively complex, and its networks are the largest of either type fielded to …
Images and videos are often characterized by multiple types of local descriptors, such as SIFT, HOG and HOF, each of which describes certain aspects of object features. Recognition systems benefit from fusing multiple types of these descriptors. Two widely applied fusion pipelines are descriptor concatenation and kernel average. The first is effective …
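To contrast the two fusion pipelines, the NumPy sketch below makes several assumptions. Each descriptor type is assumed to be already encoded as a fixed-length vector per sample. A Gaussian (RBF) kernel with a shared `gamma` is used throughout. The function names are hypothetical. Descriptor concatenation builds one kernel on the stacked vectors, while kernel average builds one kernel per descriptor type and takes their mean.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise Gaussian kernel between rows of A (n x d) and rows of B (m x d).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def concat_kernel(feats_a, feats_b, gamma):
    # feats_*: list of (n x d_i) arrays, one per descriptor type (e.g. HOG, HOF).
    # Descriptor concatenation: stack all types, then compute a single kernel.
    return rbf_kernel(np.hstack(feats_a), np.hstack(feats_b), gamma)

def average_kernel(feats_a, feats_b, gamma):
    # Kernel average: one kernel per descriptor type, then the arithmetic mean.
    kernels = [rbf_kernel(Xa, Xb, gamma) for Xa, Xb in zip(feats_a, feats_b)]
    return sum(kernels) / len(kernels)
```

With a shared `gamma`, the concatenated RBF kernel equals the element-wise product of the per-type kernels, because the squared distances add inside the exponent. Kernel average instead takes their arithmetic mean. This is one way to see why the two pipelines can behave differently when one descriptor type is noisy.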
This paper proposes motionlet, a mid-level spatiotemporal part, for human motion recognition. A motionlet can be seen as a tight cluster in motion and appearance space, corresponding to the moving process of different body parts. We postulate three key properties of motionlets for action recognition: high motion saliency, multiple scale representation, …