The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs). Despite this progress, building a new ASR system remains a challenging task, requiring various resources, multiple training stages and significant expertise. This paper presents our Eesen framework which drastically…
Human action recognition from videos is a challenging machine vision task with multiple important application domains, such as human-robot/machine interaction, interactive entertainment, multimedia information retrieval, and surveillance. In this paper, we present a novel approach to human action recognition from 3D skeleton sequences extracted from depth…
Creating descriptors for trajectories has many applications in robotics/human motion analysis and video copy detection. Here, we propose a novel descriptor for 2D trajectories: Histogram of Oriented Displacements (HOD). Each displacement in the trajectory votes with its length in a histogram of orientation angles. 3D trajectories are described by the HOD…
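The voting rule described in the abstract above (each displacement votes with its length in a histogram of orientation angles) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hod`, the default of 8 bins, the equal-width binning over [0, 2π), and the absence of any normalisation are all assumptions made for illustration.

```python
import math

def hod(points, n_bins=8):
    """Sketch of a Histogram of Oriented Displacements for a 2D trajectory.

    `points` is a sequence of (x, y) pairs. The bin count and the
    equal-width binning are assumptions, not taken from the paper.
    """
    hist = [0.0] * n_bins
    # Walk over consecutive point pairs, i.e. the displacements.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # a zero-length displacement has no orientation
        # Orientation angle mapped into [0, 2*pi).
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Equal-width bin index; the final modulo guards the upper edge.
        b = int(angle / (2 * math.pi) * n_bins) % n_bins
        # The displacement votes with its length.
        hist[b] += length
    return hist
```

For example, `hod([(0, 0), (1, 0), (1, 2)], n_bins=4)` places a vote of length 1 in the bin starting at angle 0 and a vote of length 2 in the bin starting at π/2.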
The connectionist temporal classification (CTC) loss function has several interesting properties relevant for automatic speech recognition (ASR): applied on top of deep recurrent neural networks (RNNs), CTC learns the alignments between speech frames and label sequences automatically, which removes the need for pre-generated frame-level labels. CTC systems…
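As background on why frame-level labels are not needed, the standard CTC formulation maps a per-frame output path to a label sequence by merging consecutive repeats and then dropping a special blank symbol, so many frame-level paths correspond to the same label sequence. The sketch below shows only that mapping; it is not taken from the paper, and the function name `ctc_collapse` and the `"_"` blank symbol are hypothetical choices for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch

def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label path to a label sequence (CTC collapse).

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]
```

A blank between two identical labels keeps them distinct: the path `a _ a` collapses to `a a`, while `a a` collapses to a single `a`.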