Scattering Wavelet Hash Fingerprints for Musical Audio Recognition

  • Evren Kanalici, Gokhan Bilgin
  • Published 2019
  • Computer Science
  • International Journal of Innovative Technology and Exploring Engineering
Fingerprint design is the cornerstone of audio recognition systems, which aim for robustness and fast retrieval. Short-term Fourier transform and Mel-spectral representations are common for this task; however, these extraction methods suffer from instability and limited spectral-spatial resolution. The scattering wavelet transform (SWT) addresses these limitations by recovering the lost information while ensuring translation invariance and stability. We propose a… 
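
As a rough illustration of the translation-invariance property mentioned in the abstract, the sketch below filters a signal with a few complex band-pass filters, takes the modulus, averages over time, and binarizes the result into hash bits. This is not the authors' method, which the truncated abstract does not describe. The Gabor-style filter, the band centres, the global (rather than local) averaging, the circular boundary handling, and the sign-of-difference binarization are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
import cmath
import math


def gabor_filter(center_freq, half_width=16, sigma=5.0):
    """Complex band-pass filter: Gaussian window times a complex exponential.

    center_freq is in cycles per sample; all parameters are illustrative.
    """
    return [
        math.exp(-(n * n) / (2.0 * sigma * sigma))
        * cmath.exp(2j * math.pi * center_freq * n)
        for n in range(-half_width, half_width + 1)
    ]


def wavelet_modulus(signal, filt):
    """Circular convolution of the signal with the filter, then modulus."""
    size = len(signal)
    half_width = len(filt) // 2
    out = []
    for t in range(size):
        acc = 0j
        for k, h in enumerate(filt):
            acc += signal[(t - (k - half_width)) % size] * h
        out.append(abs(acc))
    return out


def averaged_modulus(signal, center_freqs):
    """One coefficient per band: the time average of the wavelet modulus.

    Assumption: a global average stands in for the local low-pass filter
    used by an actual scattering transform.
    """
    return [
        sum(wavelet_modulus(signal, gabor_filter(f))) / len(signal)
        for f in center_freqs
    ]


def hash_bits(coeffs):
    """Assumed binarization: 1 where the next band's coefficient is larger."""
    return [1 if b > a else 0 for a, b in zip(coeffs, coeffs[1:])]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


center_freqs = [0.05 * k for k in range(1, 9)]
signal = [
    math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.2 * n)
    for n in range(128)
]
shifted = signal[17:] + signal[:17]  # circular time shift of the same signal

coeffs_a = averaged_modulus(signal, center_freqs)
coeffs_b = averaged_modulus(shifted, center_freqs)
bits_a = hash_bits(coeffs_a)
bits_b = hash_bits(coeffs_b)
print("Hamming distance between original and shifted hashes:",
      hamming(bits_a, bits_b))
```

Because the modulus discards phase and the average discards position, a time shift leaves the coefficients, and therefore the bits, unchanged in this toy setting. A practical system would use local averaging windows and a real wavelet filter bank, so its invariance would hold only up to the window scale.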

Computer vision for music identification
This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. It treats the spectrogram of each music clip as a 2D image and transforms music identification into a corrupted sub-image retrieval problem.
Simultaneous feature learning and hash coding with deep neural networks
Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.
Audio Fingerprinting: Combining Computer Vision & Data Stream Processing
  • S. Baluja, Michele Covell
  • Computer Science
  • 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
  • 2007
The waveprint system, a novel system for audio identification that uses a combination of computer-vision techniques and large-scale-data-stream processing algorithms to create compact fingerprints of audio data that can be efficiently matched, is presented.
An Industrial Strength Audio Search Algorithm
The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, out of a database of over a million tracks.
Musical genre classification of audio signals
The automatic classification of audio signals into a hierarchy of musical genres is explored, and three feature sets for representing timbral texture, rhythmic content and pitch content are proposed.
FaceNet: A unified embedding for face recognition and clustering
A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, and achieves state-of-the-art face recognition performance using only 128 bytes per face.
A Highly Robust Audio Fingerprinting System
An audio fingerprinting system is presented that uses the fingerprint of an unknown audio clip as a query on a fingerprint database containing the fingerprints of a large library of songs, so that the audio clip can be identified.
Deep Scattering Spectrum
A scattering transform defines a locally translation invariant representation which is stable to time-warping deformation. It extends MFCC representations by computing modulation spectrum
Wavelet-based image indexing techniques with partial sketch retrieval capability
This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases that performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms.
Now Playing: Continuous low-power music recognition
A low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction is presented, which respects user privacy by running entirely on-device and can passively recognize a wide range of music.