Towards scalable activity recognition: adapting zero-effort crowdsourced acoustic models

Long-Van Nguyen-Dinh, Ulf Blanke, and Gerhard Tröster. In Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia.
Human activity recognition systems traditionally require manual annotation of massive amounts of training data, which is laborious and non-scalable. An alternative approach is mining existing online crowd-sourced repositories for open-ended, freely annotated training data. However, differences across data sources or in observed contexts prevent a crowd-sourced model from reaching user-dependent recognition rates. To enhance the use of crowd-sourced data in activity recognition, we take an essential step… 


Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
A framework for audio-based activity recognition that can make use of millions of embedding features from public online video sound clips is proposed; based on a combination of oversampling and deep learning, it requires no further feature processing or outlier filtering.
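The oversampling step mentioned in this summary can be illustrated with a minimal sketch: naive random oversampling that duplicates minority-class samples until classes are balanced. The class labels and feature values below are hypothetical, not taken from the paper.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the
    largest class size (naive random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_s, out_l = list(samples), list(labels)
    for cls, n in counts.items():
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        for _ in range(target - n):
            i = rng.choice(idx)       # resample an existing minority example
            out_s.append(samples[i])
            out_l.append(labels[i])
    return out_s, out_l

# Hypothetical imbalanced embedding set: 4 "vacuum" clips vs 1 "shower" clip
X = [[0.1], [0.2], [0.3], [0.4], [9.0]]
y = ["vacuum", "vacuum", "vacuum", "vacuum", "shower"]
Xb, yb = random_oversample(X, y)
# both classes now contribute 4 samples each
```

More elaborate schemes (e.g. synthetic interpolation) exist, but duplication already prevents a classifier from collapsing onto the majority class.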
A crowdsourcing approach for personalization in human activities recognition
This work proposes a “crowdsourcing” method for building personalized HAR models by combining the advantages of user-dependent and general models and finding class similarities between the target user and community users; the personalized models outperformed both user-dependent and user-independent models when labeled data is scarce.
Robust online gesture recognition with crowdsourced annotations
SegmentedLCSS and WarpingLCSS are presented: two template-matching methods that remain robust when trained with noisy crowdsourced annotations to spot gestures from wearable motion sensors, and that can also filter the noise out of crowdsourced annotations before training a traditional classifier.
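The LCSS similarity underlying these template-matching methods can be sketched as a standard dynamic program over quantized sensor symbols. This toy version shows the core recurrence only; the tolerance-based matching and boundary handling of the actual WarpingLCSS method are not reproduced here, and the template/stream strings are made up.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of two symbol strings.
    Template-matching spotters compare this length (normalized by the
    template length) against a rejection threshold."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # symbols match: extend
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip one symbol
    return dp[m][n]

template = "aabcc"   # quantized gesture template (hypothetical)
stream   = "xaabyc"  # noisy sensor window
score = lcss_length(template, stream) / len(template)  # 4/5 = 0.8
```

Unlike exact matching, insertions from annotation noise ("x", "y" above) only skip cells in the table instead of destroying the match, which is the source of the robustness the summary describes.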
Towards a unified system for multimodal activity spotting: challenges and a proposal
A unified system is proposed that spots activities using whatever wearable sensors are available on the user's body, remaining compatible across modalities and body-worn positions.
Using unlabeled acoustic data with locality-constrained linear coding for energy-related activity recognition in buildings
The proposed method applies locality-constrained linear coding to the labeled and unlabeled samples, achieving acceptable classification accuracy compared with traditional supervised learning approaches that rely purely on large numbers of expensive annotations.
Supporting One-Time Point Annotations for Gesture Recognition
A new annotation technique that significantly reduces the time needed to annotate training data for gesture recognition, together with a novel BoundarySearch algorithm that automatically finds the correct temporal boundaries of gestures by discovering data patterns around their given one-time point annotations.
Wearable Activity Recognition with Crowdsourced Annotation


Combining crowd-generated media and personal data: semi-supervised learning for context recognition
This work uses a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data to train a personalized model for context recognition of users' mobile phones.
Exploring semi-supervised and active learning for activity recognition
Two different techniques to significantly reduce the required amount of labeled training data, semi-supervised learning and active learning, are explored and systematically analyzed using self-training and co-training.
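A minimal self-training loop of the kind analyzed above can be sketched as follows. The nearest-centroid classifier, the margin-based confidence test, and all data points are illustrative choices for the sketch, not the paper's actual setup.

```python
def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def self_train(labeled, unlabeled, threshold=0.25, rounds=5):
    """Iteratively pseudo-label unlabeled points whose margin between the
    two nearest class centroids exceeds `threshold`, then retrain."""
    labeled = dict(labeled)          # {point (tuple): class label}
    pool = set(unlabeled)
    for _ in range(rounds):
        classes = set(labeled.values())
        cents = {c: centroid([p for p, lab in labeled.items() if lab == c])
                 for c in classes}
        added = []
        for p in pool:
            d = sorted((dist(p, m), c) for c, m in cents.items())
            if d[1][0] - d[0][0] > threshold:   # confident margin only
                labeled[p] = d[0][1]
                added.append(p)
        if not added:                # nothing confident left: stop early
            break
        pool -= set(added)
    return labeled

seed = {(0.0, 0.0): "quiet", (5.0, 5.0): "vacuum"}
unl = [(0.5, 0.2), (4.8, 5.1), (2.5, 2.5)]
model = self_train(seed, unl)
# the two near-cluster points get pseudo-labels; the ambiguous midpoint
# (2.5, 2.5) never clears the margin and stays unlabeled
```

The confidence threshold is what keeps self-training from reinforcing its own mistakes: ambiguous points are simply left in the pool.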
Recognizing Daily Life Context Using Web-Collected Audio Data
Crowd-sourced textual descriptions attached to individual sound samples were used in a configurable recognition system to model 23 sound context categories, recognizing daily life contexts from web-collected audio data.
Semi-supervised learning helps in sound event classification
Zixing Zhang and Björn Schuller. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.
Adding unlabelled sound event data to the training set after automatic labelling, subject to a sufficient classifier confidence level, can significantly enhance classification performance; combined with optimal re-sampling of the originally labelled instances and iterative semi-supervised learning, the performance gain reaches approximately half of that achieved with the originally manually labelled data.
Activity Recognition from User-Annotated Acceleration Data
This is the first work to investigate the performance of recognition algorithms with multiple, wire-free accelerometers on 20 activities using datasets annotated by the subjects themselves; the results suggest that multiple accelerometers aid recognition.
SoundSense: scalable sound sensing for people-centric applications on mobile phones
This paper proposes SoundSense, a scalable framework for modeling sound events on mobile phones that represents the first general purpose sound sensing system specifically designed to work on resource limited phones and demonstrates that SoundSense is capable of recognizing meaningful sound events that occur in users' everyday lives.
Active Learning Literature Survey
This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.
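One widely used query strategy covered by such surveys, least-confidence uncertainty sampling, reduces to picking the instance whose top posterior probability is lowest. The class posteriors below are made-up numbers for illustration.

```python
def least_confident(posteriors):
    """Return the index of the unlabeled instance whose highest class
    probability is smallest, i.e. the one the model is least sure about."""
    return min(range(len(posteriors)), key=lambda i: max(posteriors[i]))

# Hypothetical class posteriors for three unlabeled sound clips
P = [
    [0.95, 0.03, 0.02],  # confident prediction
    [0.40, 0.35, 0.25],  # uncertain -> worth querying a human annotator
    [0.70, 0.20, 0.10],
]
query = least_confident(P)  # -> 1
```

Margin sampling and entropy-based sampling follow the same pattern with a different per-instance score.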
Audio-based context recognition
This paper investigates the feasibility of an audio-based context recognition system, developed and compared to the accuracy of human listeners on the same task, with particular emphasis on the computational complexity of the methods.
Mining models of human activities from the web
A new class of sensors, based on Radio Frequency Identification (RFID) tags, can directly yield semantic terms that describe the state of the physical world; the paper shows how to mine definitions of activities from the web in an unsupervised manner.
Daily Routine Recognition through Activity Spotting
The number of required low-level activities is surprisingly low, enabling efficient daily routine recognition through low-level activity spotting with the JointBoosting framework.