• Corpus ID: 231592554

Piano Skills Assessment

Paritosh Parmar, Jaiden Reddy, Brendan Tran Morris
Can a computer determine a piano player’s skill level? Is it preferable to base this assessment on visual analysis of the player’s performance, or should we trust our ears over our eyes? Since current CNNs have difficulty processing long videos, how can shorter clips be sampled to best reflect the player’s skill level? In this work, we collect and release a first-of-its-kind dataset for multimodal skill assessment focusing on assessing a piano player’s skill level, answer the asked questions… 
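
The clip-sampling question raised in the abstract can be made concrete with a small sketch. The code below shows evenly spaced sampling of fixed-length clips from a long recording. It is an illustrative baseline only, not the sampling scheme proposed in the paper; the function names `sample_clip_starts` and `clip_frame_ranges` and all parameters are hypothetical.

```python
def sample_clip_starts(num_frames, clip_len, num_clips):
    """Return start frame indices for `num_clips` clips of `clip_len` frames,
    spread evenly over a video of `num_frames` frames.

    Hypothetical helper for illustration; not the paper's method.
    """
    if clip_len <= 0 or num_clips <= 0:
        raise ValueError("clip_len and num_clips must be positive")
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")

    last_start = num_frames - clip_len  # latest frame a clip may start at
    if num_clips == 1:
        # A single clip is taken from the middle of the video.
        return [last_start // 2]
    # Integer arithmetic keeps the first clip at frame 0 and the last clip
    # ending exactly at the final frame.
    return [(i * last_start) // (num_clips - 1) for i in range(num_clips)]


def clip_frame_ranges(num_frames, clip_len, num_clips):
    """Return (start, end) frame ranges, end exclusive, for each sampled clip."""
    starts = sample_clip_starts(num_frames, clip_len, num_clips)
    return [(s, s + clip_len) for s in starts]
```

For example, `sample_clip_starts(1000, 16, 5)` spreads five 16-frame clips across a 1000-frame video. Even spacing is only one option; whether it reflects skill level better than other sampling schemes is exactly the kind of question the abstract poses.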


Win-Fail Action Recognition
This work introduces a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: “General Stunts,” “Internet Wins-Fails,” “Trick Shots,” and “Party Games,” and systematically analyzes the characteristics of the win-fail task/dataset with prototypical action recognition networks and a novel video retrieval task.
Assessing Physical Rehabilitation Exercises using Graph Convolutional Network with Self-supervised regularization
This work establishes a supervised learning method that uses computer vision to automatically assess physical rehabilitation exercises in the home environment, based on a graph convolutional network (GCN) with self-supervised regularization, with state-of-the-art performance and prediction accuracy.
Learning Through Play; a Study Investigating How Effective Video Games Can Be Regarding Keyboard Education at a Beginner Level
A set of video games is described that is designed to reduce the high drop-off rates associated with learning to play the keyboard by gamifying rote tasks that require monotonous practice.


Observing Pianist Accuracy and Form with Computer Vision
A novel two-stream convolutional neural network that takes video and audio inputs together for detecting pressed notes and finger presses and introduces a novel finger identification solution based on pressed piano note information.
Assessing the Quality of Actions
A learning-based framework that takes steps towards assessing how well people perform actions in videos by training a regression model from spatiotemporal pose features to scores obtained from expert judges and can provide interpretable feedback on how people can improve their action.
FALCONS: FAst Learner-grader for CONtorted poses in Sports
A virtual refereeing network to evaluate the execution of a diving performance and introduces a simple yet effective module to assess the difficulty of the performance based on the extracted joints sequence.
Manipulation-Skill Assessment from Videos with Spatial Attention Network
A novel RNN-based spatial attention model is proposed that considers accumulated attention state from previous frames as well as high-level information about the progress of an undergoing task in automatic skill assessment.
S3D: Stacking Segmental P3D for Action Quality Assessment
This paper proposes the segment-based P3D-fused network S3D, built upon ED-TCN, which improves performance on the UNLV-Dive dataset by a significant margin, and shows that temporal segmentation can be embedded with little effort.
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
This paper presents a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough, and proposes a novel loss function that learns discriminative features when a pair of videos exhibit variance in skill.
The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos
A new model to determine relative skill from long videos, through learnable temporal attention modules, using a novel rank-aware loss function that outperforms previous approaches and classic softmax attention on both datasets by over 4% pairwise accuracy, and as much as 12% on individual tasks.
Learning to Score Figure Skating Sport Videos
A deep architecture that includes two complementary components, i.e., Self-Attentive LSTM and Multi-scale Convolutional Skip LSTM, which can efficiently learn the local and global sequential information in each video.
ScoringNet: Learning Key Fragment for Action Quality Assessment with Ranking Loss in Skilled Sports
ScoringNet is introduced, a novel network consisting of key fragment segmentation (KFS) and score prediction (SP), to address the two problems of extracting effective features and predicting reasonable scores for a long skilled sport video.
Action Quality Assessment Using Siamese Network-Based Deep Metric Learning
This work proposes a new action scoring system termed Reference Guided Regression (RGR), which comprises a Deep Metric Learning Module that learns the similarity between any two action videos based on their ground truth scores given by the judges, and a Score Estimation Module that uses the resemblance of a video to a reference video to give the assessment score.