Corpus ID: 37109233

CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS

@inproceedings{Eghbalzadeh2016CPJKUSF,
  title={CP-JKU SUBMISSIONS FOR DCASE-2016: A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS},
  author={Hamid Eghbal-zadeh and Bernhard Lehner and Matthias Dorfer and Gerhard Widmer},
  year={2016}
}
This report describes the CP-JKU team's four submissions for Task 1 (Audio Scene Classification) of the DCASE-2016 challenge. We propose four different approaches for Audio Scene Classification (ASC). First, we propose a novel i-vector extraction scheme for ASC that uses both the left and the right audio channel. Second, we propose a Deep Convolutional Neural Network (DCNN) architecture trained on spectrograms of audio excerpts in an end-to-end fashion. Third, we use a calibration transformation to improve… 
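The abstract's two learned components can be illustrated compactly. Below is a minimal, hypothetical Python sketch, not the authors' actual pipeline: the layer sizes, STFT parameters, and the 15-class output (DCASE-2016 Task 1 has 15 scene classes) are illustrative assumptions. It extracts one log-mel spectrogram per binaural channel with librosa and classifies the two-plane input end-to-end with a small PyTorch CNN.

import librosa
import numpy as np
import torch.nn as nn

def binaural_logmels(path, sr=22050, n_mels=128):
    # Keep both channels; for stereo input, y has shape (2, n_samples).
    y, _ = librosa.load(path, sr=sr, mono=False)
    mels = [librosa.power_to_db(
                librosa.feature.melspectrogram(y=ch, sr=sr, n_fft=2048,
                                               hop_length=512, n_mels=n_mels),
                ref=np.max)
            for ch in y]
    return np.stack(mels)  # (2, n_mels, n_frames): one input plane per channel

class SceneCNN(nn.Module):
    # Illustrative two-block CNN over (channel=2, mel, time) inputs.
    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))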

Citations of this paper

FOR DCASE 2017: ACOUSTIC SCENE CLASSIFICATION USING DEEP RESIDUAL CONVOLUTIONAL NEURAL NETWORKS
TLDR
A modified deep residual architecture trained on log-mel spectrogram patches in an end-to-end fashion is proposed for acoustic scene classification, and it is suggested that the Task 1 dataset is relatively small for deep networks to significantly outperform shallower ones.
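To make the residual idea concrete, here is a basic identity-shortcut block in PyTorch; this is a generic sketch, not the submission's actual block layout, channel counts, or regularization.

import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions whose output is added back onto the block input.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut

Stacking such blocks lets gradients flow through the shortcuts, which is what makes deeper networks trainable; the summary's caveat is that the Task 1 dataset may be too small to reward the extra depth.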
CP-JKU SUBMISSIONS TO DCASE '19: ACOUSTIC SCENE CLASSIFICATION AND AUDIO TAGGING WITH RECEPTIVE-FIELD-REGULARIZED CNNS Technical Report
TLDR
The focus of this year's CP-JKU submissions is on providing the best-performing single-model submission, using the proposed approaches to cope with the complexities of each task.
CLASSIFYING SHORT ACOUSTIC SCENES WITH I-VECTORS AND CNNS: CHALLENGES AND OPTIMISATIONS FOR THE 2017 DCASE ASC TASK
TLDR
The result of the CP-JKU team’s experiments is a classification system that achieves classification accuracies of around 90% on the provided development data, as estimated via the prescribed four-fold cross-validation scheme.
Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge
TLDR
The Brno University of Technology (BUT) team submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge are described; the proposed approach is a fusion of two different Convolutional Neural Network topologies.
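The most common form of such a fusion is a weighted average of class posteriors. The sketch below shows only that generic recipe; the BUT submission's exact fusion method is not described in the summary, and probs_cnn1/probs_cnn2 are assumed arrays of per-clip posteriors.

import numpy as np

def fuse_posteriors(p_a, p_b, w=0.5):
    # Weighted late fusion: rows are clips, columns are classes,
    # and each row of the result is renormalized to sum to 1.
    p = w * p_a + (1.0 - w) * p_b
    return p / p.sum(axis=1, keepdims=True)

# probs_cnn1, probs_cnn2: (n_clips, n_classes) softmax outputs (assumed).
pred = fuse_posteriors(probs_cnn1, probs_cnn2).argmax(axis=1)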
THE SEIE-SCUT SYSTEMS FOR CHALLENGE ON DCASE 2018: DEEP LEARNING TECHNIQUES FOR AUDIO REPRESENTATION AND CLASSIFICATION
TLDR
Evaluated on the development datasets of DCASE 2018, the systems presented are superior to the corresponding baselines for Tasks 1a and 1b.
ACOUSTIC SCENE CLASSIFICATION WITH FULLY CONVOLUTIONAL NEURAL NETWORKS AND I-VECTORS Technical Report
TLDR
This technical report describes the CP-JKU team's submissions for Task 1 Subtask A (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge; the submitted systems achieve classification accuracies of around 80% on the public Kaggle leaderboard.
THE SEIE-SCUT SYSTEMS FOR IEEE AASP CHALLENGE ON DCASE 2017: DEEP LEARNING TECHNIQUES FOR AUDIO REPRESENTATION AND CLASSIFICATION
TLDR
Evaluated on the development datasets of DCASE 2017, the systems are superior to the corresponding baselines for Tasks 1 and 2, and the system for Task 3 performs as well as the baseline in terms of the predominant metrics.
CIAIC-ASC SYSTEM FOR DCASE 2019 CHALLENGE TASK1 Technical Report
TLDR
This report presents the systems for Subtask A and Subtask B of DCASE 2019 Task 1, i.e. acoustic scene classification; it introduces a Domain Adaptation Neural Network (DANN) to extract domain-unrelated features and further aggregates the DANN with the CNN models for better performance.
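The standard building block behind such domain-adversarial training is a gradient reversal layer: identity on the forward pass, negated and scaled gradient on the backward pass, so a domain classifier trained through it drives the shared features toward domain invariance. A minimal PyTorch sketch of that mechanism follows; the CIAIC system's concrete architecture is not shown in the summary.

from torch.autograd import Function

class GradReverse(Function):
    # Identity in the forward pass; reversed, scaled gradient in backward.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient w.r.t. lam

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Feed grad_reverse(features) into a domain classifier: minimizing its loss
# then maximizes domain confusion in the shared feature extractor.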
CIAIC-MODA SYSTEM FOR DCASE 2018 CHALLENGE TASK 5 Technical Report
TLDR
Several systems for Task 5 of the Detection and Classification of Acoustic Scenes and Events 2018 (DCASE2018) challenge are presented, and a fusion of posteriors from three subsystems is performed to further improve performance.
DCASE 2018 Challenge baseline with convolutional neural networks
TLDR
A Python implementation of the DCASE 2018 Challenge baseline is described. The challenge comprises five tasks: 1) acoustic scene classification, 2) general-purpose audio tagging, 3) bird audio detection, 4) weakly-labeled semi-supervised sound event detection, and 5) multi-channel audio tagging; the baseline source code contains implementations of convolutional neural networks, including AlexNetish and VGGish networks originating from computer vision.
...

References

Showing 1-10 of 11 references
AN I-VECTOR BASED APPROACH FOR AUDIO SCENE DETECTION
TLDR
The i-vector system is state-of-the-art in speaker verification and outperforms conventional Gaussian Mixture Model (GMM)-based approaches to scene detection; it compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on user-generated content (UGC).
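For context, the conventional GMM approach that i-vectors are compared against fits naturally in a few lines of scikit-learn: one Gaussian mixture per scene class over frame-level features, with clips classified by total frame log-likelihood. This is a generic baseline sketch, not the paper's exact system; frames_by_class is an assumed dict mapping class labels to lists of (n_frames, n_features) arrays.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(frames_by_class, n_components=16):
    # One diagonal-covariance GMM per scene class, fit on stacked frames.
    return {c: GaussianMixture(n_components, covariance_type='diag').fit(
                np.vstack(frames))
            for c, frames in frames_by_class.items()}

def classify_clip(gmms, clip_frames):
    # Score a clip by the summed per-frame log-likelihood under each GMM.
    scores = {c: g.score_samples(clip_frames).sum() for c, g in gmms.items()}
    return max(scores, key=scores.get)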
I-Vectors for Timbre-Based Music Similarity and Music Artist Classification
TLDR
A novel approach to extracting song-level descriptors from frame-level timbral features such as Mel-frequency cepstral coefficients (MFCCs): identity vectors, or i-vectors, which are the result of a factor analysis procedure applied to the frame-level features.
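To make the frame-level versus song-level distinction concrete, here is a loose Python stand-in: MFCC frames via librosa, with scikit-learn's generic FactorAnalysis in place of a real i-vector extractor. True i-vector extraction is a factor analysis over GMM supervector statistics, which this sketch deliberately simplifies; paths is an assumed list of audio files.

import librosa
import numpy as np
from sklearn.decomposition import FactorAnalysis

def mfcc_frames(path, sr=22050, n_mfcc=20):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

# Fit the factor model on pooled frames, then summarize each song by the
# mean of its latent factors: a fixed-length, song-level descriptor.
fa = FactorAnalysis(n_components=10).fit(
    np.vstack([mfcc_frames(p) for p in paths]))
descriptor = fa.transform(mfcc_frames(paths[0])).mean(axis=0)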
Front-End Factor Analysis For Speaker Verification
  • Florin Curelaru • 2018 International Conference on Communications (COMM) • 2018
TLDR
This paper investigates which configurations and parameters lead to the best performance of an i-vectors/PLDA based speaker verification system, and presents some preliminary experiments in which the utterances comprised in the CSTR VCTK corpus were used, besides utterances from MIT-MDSVC, for training the total variability covariance matrix and the underlying PLDA matrices.
Within-class covariance normalization for SVM-based speaker recognition
TLDR
A practical procedure is presented for applying WCCN to an SVM-based speaker recognition system in which the input feature vectors reside in a high-dimensional space; the approach achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over the previous baseline.
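WCCN is short enough to state directly: estimate the average within-class covariance W of the training vectors and map every vector through a matrix A satisfying A A^T = W^{-1} (the Cholesky factor of the inverse). A numpy sketch under exactly those definitions:

import numpy as np

def wccn_transform(X, y):
    # Average the per-class covariances, then return A with
    # A @ A.T == inv(W); apply to row vectors as X @ A.
    dim = X.shape[1]
    W = np.zeros((dim, dim))
    classes = np.unique(y)
    for c in classes:
        Xc = X[y == c] - X[y == c].mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))

# X_normalized = X @ wccn_transform(X_train, y_train)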
Towards Light-Weight, Real-Time-Capable Singing Voice Detection
TLDR
It is shown that singing voice detection, the problem of identifying those parts of a polyphonic audio recording where one or several persons sing, can be realised with substantially fewer features than are used in current state-of-the-art methods.
Cosine Similarity Scoring without Score Normalization Techniques
TLDR
This paper introduces a modification to the cosine similarity that does not require explicit score normalization, relying instead on simple mean and covariance statistics from a collection of impostor speaker i-vectors; this enables the application of a new unsupervised speaker adaptation technique to models defined in the i-vector space.
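The recipe can be approximated directly: center and whiten every i-vector with the mean and covariance of an impostor set, then take a plain cosine score, with no t-norm or z-norm pass. In the numpy sketch below, imp is an assumed matrix of impostor i-vectors, one per row, and the whitening choice is one plausible reading of the method rather than the paper's exact formulation.

import numpy as np

def normalized_cosine(w_enroll, w_test, imp):
    # Center/whiten with impostor statistics, then plain cosine scoring.
    mu = imp.mean(axis=0)
    L = np.linalg.cholesky(np.linalg.inv(np.cov(imp, rowvar=False)))
    a = L.T @ (w_enroll - mu)
    b = L.T @ (w_test - mu)
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))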
Language Recognition via i-vectors and Dimensionality Reduction
TLDR
A new language identification system is presented, based on the total variability approach previously developed in the field of speaker identification; it achieves excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques.
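A simplified version of that recipe uses scikit-learn's LDA as the dimensionality reduction and cosine scoring against per-language mean i-vectors; the paper's exact reduction and scoring details are not reproduced here, and ivecs_train/langs_train are assumed training arrays.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# ivecs_train: (n, d) i-vectors; langs_train: (n,) language labels (assumed).
lda = LinearDiscriminantAnalysis(n_components=5).fit(ivecs_train, langs_train)
Z = lda.transform(ivecs_train)
lang_means = {l: Z[langs_train == l].mean(axis=0)
              for l in np.unique(langs_train)}

def identify(ivec):
    # Cosine similarity between the reduced i-vector and each language mean.
    z = lda.transform(ivec.reshape(1, -1))[0]
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(lang_means, key=lambda l: cos(z, lang_means[l]))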
Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms
We give a full account of the algorithms needed to carry out a joint factor analysis of speaker and session variability in a training set in which each speaker is recorded over many different…
Fisher discriminant analysis with kernels
TLDR
A non-linear classification technique based on Fisher's discriminant is presented which allows the efficient computation of the Fisher discriminant in feature space; large-scale simulations demonstrate the competitiveness of this approach.
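For a binary problem, the kernel Fisher discriminant reduces to solving N alpha = m_1 - m_0 in the span of the training points (Mika et al.'s formulation), where N is a regularized within-class matrix built from the kernel matrix. Below is a compact numpy sketch with an RBF kernel; gamma and the regularizer are arbitrary illustrative choices.

import numpy as np

def rbf(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kfda(X, y, gamma=0.5, reg=1e-3):
    # Solve the kernel Fisher discriminant for binary labels y in {0, 1}.
    K = rbf(X, X, gamma)
    N = reg * np.eye(len(X))  # regularization keeps N invertible
    m = []
    for c in (0, 1):
        Kc = K[:, y == c]
        n_c = Kc.shape[1]
        m.append(Kc.mean(axis=1))
        H = np.eye(n_c) - np.ones((n_c, n_c)) / n_c  # within-class centering
        N += Kc @ H @ Kc.T
    return np.linalg.solve(N, m[1] - m[0])

# Project new points onto the discriminant: rbf(X_new, X_train, gamma) @ alpha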
Voicebox: Speech Processing Toolbox for Matlab
  • Website, 1999; available online at http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html (visited November 1, 2013).
...