Audio Future Block Prediction with Conditional Generative Adversarial Network

Md. Rahat-uz-Zaman, Shadmaan Hye and Mahmudul Hasan. 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE).
Signal processing is a vast subfield of electrical engineering and computer science, within which audio signal processing has secured a remarkable position for restoring corrupted or missing audio blocks. However, generating a possible future audio block from the previous block is still a new idea, one that can help reduce both audio noise and partially missing audio segments. In this paper, a generative adversarial network (GAN), along with a pipeline, is proposed for the prediction of possible audio after an…
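The paper's exact objective is not shown in this truncated abstract. As a sketch, assuming the standard non-saturating conditional-GAN formulation (the function name `cgan_losses` and its inputs are illustrative, not from the paper), the losses given the discriminator's probability outputs on real and generated next-block pairs would look like:

```python
import math

def cgan_losses(d_real, d_fake, eps=1e-8):
    """Non-saturating GAN losses from discriminator probabilities.

    d_real: D's outputs on (previous block, true next block) pairs
    d_fake: D's outputs on (previous block, generated next block) pairs
    """
    # discriminator: push d_real toward 1 and d_fake toward 0
    d_loss = -sum(math.log(p + eps) + math.log(1.0 - q + eps)
                  for p, q in zip(d_real, d_fake)) / len(d_real)
    # generator (non-saturating form): push d_fake toward 1
    g_loss = -sum(math.log(q + eps) for q in d_fake) / len(d_fake)
    return d_loss, g_loss
```

A well-trained discriminator yields a low `d_loss`, while a fooled discriminator yields a low `g_loss`; the conditioning on the previous block enters through what is fed to D, not through the loss formula itself.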

References


SDC-Net: Video Prediction Using Spatially-Displaced Convolution
The SDC module for video frame prediction, built on spatially-displaced convolution, inherits the merits of both vector-based and kernel-based approaches while ameliorating their respective disadvantages.
Image-to-Image Translation with Conditional Adversarial Networks
Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Generative Adversarial Nets
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
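The adversarial process described in this summary corresponds to a two-player minimax game; in the standard formulation, the value function is

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\left(1 - D(G(z))\right)\right]
```

where D maximizes its ability to tell real samples from generated ones, and G minimizes the same quantity by making D(G(z)) large.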
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
Predicting the perception of performed dynamics in music audio with ensemble learning.
Feature extraction methods were developed to capture attributes of spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux, which highlighted the importance of source separation in the feature extraction.
Empirical Evaluation of Rectified Activations in Convolutional Network
The experiments suggest that incorporating a non-zero slope for the negative part of rectified activation units can consistently improve results, and they challenge the common belief that sparsity is the key to good performance with ReLU.
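The "non-zero slope for the negative part" evaluated in this work is the leaky ReLU; a minimal scalar sketch (the 0.01 default slope is a common convention, not necessarily the paper's setting):

```python
def leaky_relu(x, alpha=0.01):
    """Rectified activation with a small non-zero slope for negative inputs.

    alpha = 0 recovers the ordinary ReLU; alpha > 0 keeps a small
    gradient flowing for negative pre-activations.
    """
    return x if x > 0 else alpha * x
```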
YouTube-8M: A Large-Scale Video Classification Benchmark
YouTube-8M is introduced, the largest multi-label video classification dataset, composed of ~8 million videos (500K hours of video), annotated with a vocabulary of 4800 visual entities, and various (modest) classification models are trained on the dataset.
Audio Set: An ontology and human-labeled dataset for audio events
The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.
Learning Deconvolution Network for Semantic Segmentation
A novel semantic segmentation algorithm that learns a deep deconvolution network on top of convolutional layers adopted from the VGG 16-layer net, demonstrating outstanding performance on the PASCAL VOC 2012 dataset.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.