Guided weak supervision for action recognition with scarce data to assess skills of children with autism

Authors: Prashant Pandey, Prathosh A. P., Manu Kohli, and Joshua K. Pritchard
Diagnostic and intervention methodologies for skill assessment of autism typically require a clinician to repeatedly initiate several stimuli and record the child's response. In this paper, we propose to automate the response measurement by video-recording the scene and then applying deep neural models for human action recognition from videos. However, supervised learning of neural networks demands large amounts of annotated data that are hard to come by. This issue is addressed…
Unsupervised Domain Adaptation for Semantic Segmentation of NIR Images Through Generative Latent Search
The existence of 'nearest-clone' is proved and a method to find it through an optimization algorithm over the latent space of a Deep generative model based on variational inference is proposed.
Analysis of Facial Information for Healthcare Applications: A Survey on Computer Vision-Based Approaches
An overview of the cutting-edge approaches that perform facial cue analysis in the healthcare area is given, and a research taxonomy is introduced by dividing the face into its main features: eyes, mouth, muscles, skin, and shape.
AI-Augmented Behavior Analysis for Children With Developmental Disabilities: Building Toward Precision Treatment
An AI-augmented learning and applied behavior analytics (AI-ABA) platform is presented that provides personalized treatment and learning plans to AUIDD individuals and can promote self-regulative behavior using reinforcement-based augmented or virtual reality and other mobile platforms.
Improving the Movement Synchrony Estimation with Action Quality Assessment in Children Play Therapy
The experiments indicated that the framework can accurately quantify movement synchronization and assess the quality of children's actions in play therapy. The uncertainty-preserved annotation approach produced an outcome comparable to standard methods at a far lower cost, demonstrating its efficacy in mitigating biases.
Skin Segmentation from NIR Images using Unsupervised Domain Adaptation through Generative Latent Search
The existence of 'nearest-clone' is proved and a method to find it through an optimization algorithm over the latent space of a Deep generative model based on variational inference is proposed.


Learning Visual Attention to Identify People with Autism Spectrum Disorder
  • M. Jiang, Qi Zhao
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
This work differentiates itself with three unique features: first, the proposed approach is data-driven and free of assumptions, important for new discoveries in understanding ASD as well as other neurodevelopmental disorders.
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident.
Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition
The primary empirical finding is that pre-training at a very large scale (over 65 million videos), despite relying on noisy social-media videos and hashtags, substantially improves the state-of-the-art on three challenging public action recognition datasets.
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
A new Two-Stream Inflated 3D ConvNet (I3D) based on 2D ConvNet inflation is introduced; I3D models considerably improve upon the state-of-the-art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics.
One-Shot Action Localization by Learning Sequence Matching Network
This work conceptualizes a new example-based action detection problem where only a few examples are provided, and the goal is to find the occurrences of these examples in an untrimmed video sequence and introduces a novel one-shot action localization method that alleviates the need for large amounts of training samples.
Two-Stream Convolutional Networks for Action Recognition in Videos
This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.
3D Human Sensing, Action and Emotion Recognition in Robot Assisted Therapy of Children with Autism
How state-of-the-art 3D human pose reconstruction methods perform on the newly introduced action and emotion recognition tasks, defined on non-staged videos recorded during robot-assisted therapy sessions of children with autism, is investigated.
Decoding Children's Social Behavior
A new publicly-available dataset containing over 160 sessions of a 3-5 minute child-adult interaction designed to elicit a broad range of social behaviors based on video and audio data is introduced.
Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition
This paper is the first to propose a GZSL framework for action recognition in videos based on an out-of-distribution detector, which determines whether the video features belong to a seen or unseen action category; the framework outperforms the baseline.
Objects2action: Classifying and Localizing Actions without Any Video Example
Objects2action uses a semantic word embedding spanned by a skip-gram model of thousands of object categories, proposes a mechanism to exploit multiple-word descriptions of actions and objects, and demonstrates how to extend the zero-shot approach to the spatio-temporal localization of actions in video.