Serena Yeung

In this work we introduce a fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions. Our intuition is that the process of detecting actions is naturally one of observation and refinement: observing moments in video, and refining hypotheses about when an action is occurring. Based on this…
Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained…
In our project, we present a method for offline classification of transportation modes from an iPhone accelerometer. Our research may be useful for accurately predicting travel times, where automatically detecting transport mode can be used as an input into a particle filter, or for automatically classifying and logging physical activity over the course of…
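The abstract above does not specify which features the classifier uses, but accelerometer-based transport-mode classification typically starts by slicing the signal into fixed-length windows and computing simple time- and frequency-domain statistics per window. The sketch below illustrates that preprocessing step only; the window length, sampling rate, and feature choices are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def window_features(accel, window=128):
    """Split a 1-D accelerometer magnitude signal into non-overlapping windows
    and compute per-window features: mean, std, dominant FFT bin, and its power.
    These feature vectors would then be fed to a downstream classifier."""
    feats = []
    for start in range(0, len(accel) - window + 1, window):
        w = accel[start:start + window]
        # Remove the DC component (gravity offset) before the spectrum.
        spectrum = np.abs(np.fft.rfft(w - w.mean()))
        feats.append([w.mean(), w.std(), spectrum.argmax(), spectrum.max()])
    return np.array(feats)

# Synthetic example: 4 s of walking-like oscillation at 2 Hz, sampled at 64 Hz.
t = np.arange(256) / 64.0
signal = 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t)
feats = window_features(signal, window=128)
print(feats.shape)  # (2, 4): two windows, four features each
```

With 128 samples at 64 Hz, FFT bin k corresponds to k × 0.5 Hz, so the 2 Hz oscillation shows up as a dominant-bin value of 4 in each window's third feature.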
We propose a viewpoint invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our…
Fine-grained recognition refers to the task in computer vision of automatically differentiating similar object categories from one another, e.g. species of birds, types of cars, breeds of dogs, or varieties of aircraft. Since this is a task that the majority of humans are untrained in, any progress has the promise of augmenting human vision. Applications…
Variational autoencoders (VAEs) are directed generative models that learn factorial latent variables. As noted by Burda et al. (2015), these models exhibit the problem of factor over-pruning, where a significant number of stochastic factors fail to learn anything and become inactive. This can limit their modeling power and their ability to learn diverse and…
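The over-pruning described above can be diagnosed numerically: a latent unit that has collapsed to the prior contributes essentially zero KL divergence and carries no information about the input. A minimal NumPy sketch, assuming diagonal Gaussian posteriors; the 0.01 cutoff is an illustrative activity threshold (Burda et al. use a related covariance-based statistic), and the toy posterior values are invented.

```python
import numpy as np

def kl_per_dim(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) per latent dimension, averaged over the batch.
    return 0.5 * np.mean(mu**2 + np.exp(log_var) - log_var - 1.0, axis=0)

def inactive_units(mu, log_var, threshold=0.01):
    # A unit whose average KL stays near zero has collapsed to the prior:
    # it is "over-pruned" and encodes nothing about the data.
    return kl_per_dim(mu, log_var) < threshold

# Toy posterior statistics for a batch of 4 examples and 3 latent units.
# Unit 0 varies across inputs; units 1-2 match the prior (mu=0, log_var=0).
mu = np.array([[ 1.0, 0.0, 0.0],
               [-1.0, 0.0, 0.0],
               [ 0.5, 0.0, 0.0],
               [-0.5, 0.0, 0.0]])
log_var = np.zeros_like(mu)

print(inactive_units(mu, log_var))  # [False  True  True]
```

Unit 0 has mean KL 0.3125 nats and counts as active; units 1 and 2 sit exactly at the prior, so their KL is zero and they are flagged as pruned.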
Recent progress in developing cost-effective depth sensors has enabled new AI-assisted solutions such as assisted driving vehicles and smart spaces. Machine learning techniques have been successfully applied on these depth signals to perceive meaningful information about human behavior. In this work, we propose to deploy depth sensors in hospital settings…