Action classification in still images is an important task in computer vision. It is challenging because the appearance of an action may vary depending on its context (e.g. associated objects). Manual labeling of context information would be time-consuming and difficult to scale up. To address this challenge, we propose a method to automatically discover and…
Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important segments (e.g. human actions) from untrimmed videos is a key step in large-scale video analysis. We propose a novel Temporal Unit Regression Network (TURN) model. There are two salient aspects of TURN: (1) TURN jointly…
To improve the positioning accuracy of implants in Total Hip Replacement (THR) surgeries, this paper proposes a visual-aided wireless monitoring system for THR surgery. The system aims to measure and display the contact distribution and relative pose between the femoral head and the acetabulum prosthesis during surgery to help surgeons obtain accurate…
Malposition of the acetabular and femoral components has long been recognized as an important cause of dislocation after total hip replacement (THR) surgeries. To help surgeons improve the positioning accuracy of the components, a visual-aided system for THR surgeries that estimates the orientation and depth of the femoral component is proposed. The…
This paper focuses on temporal localization of actions in untrimmed videos. Existing methods typically train classifiers for a pre-defined list of actions and apply them in a sliding-window fashion. However, activities in the wild consist of a wide combination of actors, actions and objects; it is difficult to design a proper activity list that meets users’…
Numerous factors influence the rate of dislocation after total hip replacement (THR) surgeries, and malposition of the acetabular and femoral components has long been recognized as an important cause. To help surgeons improve the accuracy of the positioning of the components, a computer-assisted system for THR surgeries that estimates and displays the…
Given an image and a natural language query phrase, a grounding system localizes the mentioned objects in the image according to the query's specifications. State-of-the-art methods address the problem by ranking a set of proposal bounding boxes according to the query's semantics, which makes them dependent on the performance of proposal generation systems. …
Action classification in still images has been a popular research topic in computer vision. Labelling large-scale datasets for action classification requires tremendous manual work, which is hard to scale up. Moreover, the action categories in such datasets are pre-defined and their vocabularies are fixed. However, humans may describe the same action with different…