Digital Assistant for Sound Classification Using Spectral Fingerprinting

  • Ria Sinha
  • Published 31 August 2021
  • Computer Science
  • International Journal for Research in Applied Science and Engineering Technology
Abstract: This paper describes a digital assistant designed to help hearing-impaired people sense ambient sounds. The assistant relies on obtaining audio signals from the ambient environment of a hearing-impaired person. The audio signals are analysed by a machine learning model that uses spectral signatures as features to classify audio signals into audio categories (e.g., emergency, animal sounds, etc.) and specific audio types within the categories (e.g., ambulance siren, dog barking, etc… 
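The pipeline the abstract outlines (capture audio, derive spectral features, assign a category and a specific type) can be illustrated with a small sketch. It is not the paper's model: the truncated abstract does not specify the features or the classifier, so the band-energy fingerprint, the nearest-centroid rule, the sample rate, the frame length, and the synthetic sine tones standing in for real recordings are all assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this sketch
FRAME_LEN = 256     # samples per analysis frame; assumed value


def tone(freq_hz, n=FRAME_LEN, sr=SAMPLE_RATE):
    """Synthetic sine tone, used here as a stand-in for a recorded sound."""
    return [math.sin(2 * math.pi * freq_hz * t / sr) for t in range(n)]


def spectral_fingerprint(samples, n_bands=8):
    """Normalized energy per frequency band, computed with a naive DFT."""
    n = len(samples)
    n_bins = n // 2 - 1  # positive-frequency bins, DC excluded
    bands = [0.0] * n_bands
    for i in range(n_bins):
        k = i + 1  # DFT bin index
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        bands[i * n_bands // n_bins] += abs(coeff) ** 2
    total = sum(bands) or 1.0
    return [b / total for b in bands]


def train_centroids(labelled_signals):
    """Average the fingerprints of each label's example signals."""
    centroids = {}
    for label, signals in labelled_signals.items():
        prints = [spectral_fingerprint(s) for s in signals]
        centroids[label] = [sum(col) / len(prints) for col in zip(*prints)]
    return centroids


def classify(samples, centroids):
    """Return the (category, type) label whose centroid is nearest."""
    fp = spectral_fingerprint(samples)
    return min(centroids, key=lambda label: math.dist(fp, centroids[label]))


# Hypothetical training data: labels are (category, specific type) pairs,
# and the tone frequencies are arbitrary placeholders, not measured values.
TRAINING = {
    ("emergency", "ambulance siren"): [tone(750), tone(850)],
    ("animal", "dog barking"): [tone(250), tone(320)],
}
CENTROIDS = train_centroids(TRAINING)

if __name__ == "__main__":
    print(classify(tone(700), CENTROIDS))
    print(classify(tone(350), CENTROIDS))
```

Labelling each example with a (category, type) pair lets a single lookup return both levels the abstract mentions. A practical system would likely replace the naive DFT with an FFT-based feature extractor and use a trained model rather than centroids.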

Audio-based multimedia event detection using deep recurrent neural networks

This paper introduces longer-range temporal information with deep recurrent neural networks (RNNs) for both stages of multimedia event detection, and observes improvements in both frame-level and clip-level performance compared to SVM and feed-forward neural network baselines.

Audio Set: An ontology and human-labeled dataset for audio events

The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

Environmental sound classification with convolutional neural networks

  • Karol J. Piczak
  • Computer Science
  • 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP)
  • 2015
The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.

Urban Sound Classification using Long Short-Term Memory Neural Network

It is shown that the LSTM model outperforms a set of existing solutions and is more accurate and confident than the baseline CNN.

librosa: Audio and Music Signal Analysis in Python

A brief overview of the librosa library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Results demonstrate that the proposed method is highly effective in the classification tasks by employing multi-temporal resolution and multi-level features, and it outperforms the previous methods, which only account for single-level features.

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

It is shown that the improved performance stems from the combination of a deep, high-capacity model and an augmented training set: this combination outperforms both the proposed CNN without augmentation and a “shallow” dictionary learning model with augmentation.


This paper proposes a neural network architecture for exploiting sequential information. The architecture is composed of two separate lower networks and one upper network, which the paper refers to as LSTM layers, CNN layers, and connected layers, respectively.

Speech recognition with deep recurrent neural networks

This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.