Audio-Visual Evaluation of Oratory Skills
@article{Michelson2021AudioVisualEO,
  title   = {Audio-Visual Evaluation of Oratory Skills},
  author  = {Tzvi Michelson and Shmuel Peleg},
  journal = {2021 Third International Conference on Transdisciplinary AI (TransAI)},
  year    = {2021},
  pages   = {103-106}
}
What makes a talk successful? Is it the content or the presentation? We try to estimate the contribution of the speaker’s oratory skills to the talk’s success, while ignoring the content of the talk. By oratory skills we refer to facial expressions, motions, and gestures, as well as vocal features. We use TED Talks as our dataset, and measure the success of each talk by its view count. Using this dataset we train a neural network to assess the oratory skills in a talk through three factors…
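The abstract describes measuring a talk's success by its view count. A minimal illustrative sketch of one plausible way to turn raw view counts into a training label is shown below; the `success_labels` helper is hypothetical, and the paper's actual labeling scheme is not specified in the excerpt above. A log transform is assumed here because view counts are typically heavy-tailed.

```python
# Illustrative sketch only: one plausible way to derive a binary success
# label from TED Talk view counts. The actual scheme used by the paper
# may differ; the log transform is an assumption to tame the heavy tail.
import math

def success_labels(view_counts):
    """Binarize talks into successful (1) / not (0), split at the median log-views."""
    logs = [math.log10(v) for v in view_counts]
    median = sorted(logs)[len(logs) // 2]
    return [1 if lv >= median else 0 for lv in logs]

labels = success_labels([1_200_000, 85_000, 4_300_000, 60_000, 900_000])
# labels → [1, 0, 1, 0, 1]
```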
One Citation
A Peek at Peak Emotion Recognition
- Computer Science, ArXiv
- 2022
It is found that despite using very small datasets, features extracted from deep learning models can achieve results significantly better than humans in this task.
References
Showing 1-10 of 27 references
Looking to listen at the cocktail party
- Computer Science, ACM Trans. Graph.
- 2018
A deep network-based model that incorporates both visual and auditory signals to isolate a single speech signal from a mixture of sounds such as other speakers and background noise, showing a clear advantage over state-of-the-art audio-only speech separation in cases of mixed speech.
Online feedback system for public speakers
- Computer Science, 2012 IEEE Symposium on E-Learning, E-Management and E-Services
- 2012
An online feedback system for public speakers, in which emotion recognised from the speakers' body language is the primary component for analysis; a posture and gesture representation method based on Laban Movement Analysis is adopted.
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce…
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
- Computer Science, INTERSPEECH
- 2020
The proposed ECAPA-TDNN architecture significantly outperforms state-of-the-art TDNN-based systems on the VoxCeleb test sets and the 2019 VoxCeleb Speaker Recognition Challenge.
VGGFace2: A Dataset for Recognising Faces across Pose and Age
- Computer Science, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)
- 2018
A new large-scale face dataset named VGGFace2 is introduced, containing 3.31 million images of 9131 subjects, with an average of 362.6 images per subject; the automated and manual filtering stages used to ensure high accuracy for the images of each identity are also described.
Adam: A Method for Stochastic Optimization
- Computer Science, ICLR
- 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
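The summary above describes Adam's core idea: adaptive per-parameter steps built from exponentially decaying estimates of the first and second moments of the gradient. A minimal standalone sketch of the update rule follows; this is an illustrative implementation using the paper's default hyperparameters, not the author's code.

```python
# Minimal sketch of the Adam update rule (illustrative; hyperparameter
# defaults follow the original paper: lr=1e-3, beta1=0.9, beta2=0.999).
import math

def adam_step(param, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate (mean)
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Because the bias-corrected moment ratio is roughly ±1 for a consistent gradient direction, each step moves the parameter by about the learning rate, which is why Adam is relatively insensitive to gradient scale.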
AutoManner: An Automated Interface for Making Public Speakers Aware of Their Mannerisms
- Computer Science, IUI
- 2016
An intelligent interface that uses Microsoft Kinect to make speakers aware of their mannerisms is presented; a sparsity-based algorithm, Shift-Invariant Sparse Coding, automatically extracts recurring patterns of body movement.
Presentation Trainer, your Public Speaking Multimodal Coach
- Computer Science, ICMI
- 2015
The user experience evaluation of participants who used the Presentation Trainer to practice an elevator pitch is presented, showing that the feedback provided by the Presentation Trainer has a significant influence on learning.
Rhema: A Real-Time In-Situ Intelligent Interface to Help People with Public Speaking
- Computer Science, IUI
- 2015
Rhema, an intelligent user interface for Google Glass that helps people with public speaking, is presented; it automatically detects the speaker's volume and speaking rate in real time and provides feedback during the actual delivery of the speech.
Augmenting Social Interactions: Realtime Behavioural Feedback using Social Signal Processing Techniques
- Psychology, CHI
- 2015
Logue, a system that provides real-time feedback on a presenter's openness, body energy, and speech rate during public speaking, is presented; it analyses the user's nonverbal behaviour using social signal processing techniques and gives visual feedback on a head-mounted display.