Feature Pyramid Attention based Residual Neural Network for Environmental Sound Classification

@article{Zhou2022FeaturePA,
  title={Feature Pyramid Attention based Residual Neural Network for Environmental Sound Classification},
  author={Liguang Zhou and Yuhongze Zhou and Xiaonan Qi and Junjie Hu and Tin Lun Lam and Yangsheng Xu},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.14411}
}
Environmental sound classification (ESC) is a challenging problem due to the unstructured spatial-temporal relations that exist in the sound signals. Recently, many studies have focused on abstracting features from convolutional neural networks while the learning of semantically relevant frames of sound signals has been overlooked. To this end, we present an end-to-end framework, namely feature pyramid attention network (FPAM), focusing on abstracting the semantically relevant features for ESC… 

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

An end-to-end framework, namely attentional graph convolutional network (AGCN), is proposed for structure-aware audio-visual scene representation; extensive experiments show that the AGCN methods achieve promising results.

References

Environmental Sound Classification with Parallel Temporal-Spectral Attention

A novel parallel temporal-spectral attention mechanism for CNNs is proposed to learn discriminative sound representations; it enhances the temporal and spectral features by capturing the importance of different time frames and frequency bands.
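The summary above describes two attention branches computed in parallel, one weighting time frames and one weighting frequency bands. The following is a minimal pure-Python sketch of that idea under stated assumptions; it is not the referenced paper's implementation. The scoring rule (mean pooling instead of learned layers), the rescaling of each branch's weights to average 1, the averaging fusion, and the function name `parallel_temporal_spectral_attention` are all hypothetical simplifications.

```python
import math
from typing import List, Tuple

Spectrogram = List[List[float]]  # x[f][t]: frequency band f, time frame t


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_temporal_spectral_attention(
    x: Spectrogram,
) -> Tuple[Spectrogram, List[float], List[float]]:
    n_freq, n_time = len(x), len(x[0])

    # Temporal branch: score each time frame by its mean value across bands
    # (hypothetical stand-in for a learned scoring layer).
    t_scores = [sum(x[f][t] for f in range(n_freq)) / n_freq for t in range(n_time)]
    # Spectral branch, computed independently: score each band by its mean over frames.
    f_scores = [sum(row) / n_time for row in x]

    # Softmax weights, rescaled so the average weight in each branch is 1.
    a_t = [n_time * a for a in softmax(t_scores)]
    a_f = [n_freq * a for a in softmax(f_scores)]

    # Fuse the two branches by averaging the temporally and spectrally reweighted maps.
    y = [
        [0.5 * (x[f][t] * a_t[t] + x[f][t] * a_f[f]) for t in range(n_time)]
        for f in range(n_freq)
    ]
    return y, a_t, a_f


if __name__ == "__main__":
    # Toy input: 2 bands x 3 frames, where only the last frame is loud.
    spec = [[0.0, 0.0, 3.0],
            [0.0, 0.0, 3.0]]
    _, frame_w, band_w = parallel_temporal_spectral_attention(spec)
    print("frame weights:", frame_w)  # the last frame receives the largest weight
    print("band weights:", band_w)    # both bands are identical, so both weights are 1.0
```

Because the branches are computed side by side from the same input rather than one after the other, a frame that is loud across all bands and a band that is active across all frames are emphasized independently.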

Intelligent household surveillance robot

This paper presents a household surveillance robot that can detect abnormal events by utilizing video and audio information and achieves robust tracking by employing a particle filter algorithm.

A learning model for automated construction site monitoring using ambient sounds

Long-Range Hand Gesture Recognition via Attention-based SSD Network

A novel attention-based single shot multibox detector (SSD) model that incorporates both spatial and channel attention is proposed for hand gesture recognition; it extends the recognition distance from 1 meter to 7 meters without sacrificing speed.

Environment sound classification using an attention-based residual neural network

SoHAM: A Sound-Based Human Activity Monitoring Framework for Home Service Robots

This article proposes a sound-based human activity monitoring (SoHAM) framework that recognizes sound events in a home environment through a context-aware sound event recognition (CoSER) method, which uses contextual information to disambiguate sound events.

AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing

This paper proposes a new model, called Attention-Augmented Network (AttaNet), to capture both global context and multi-level semantics while keeping the efficiency high, and achieves different levels of speed/accuracy trade-offs on Cityscapes.

A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification

An effective convolutional neural network structure is proposed with a multichannel temporal attention (MCTA) block, which applies a temporal attention mechanism within each channel of the embedded features to extract channel-wise relevant temporal information.
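The channel-wise temporal attention in the MCTA summary can be illustrated with a small sketch. It assumes embedded features laid out as channels x frames x feature dimensions and scores each frame by its mean value, whereas the actual MCTA block would use learned parameters; the function name `multichannel_temporal_attention` and the scoring rule are hypothetical.

```python
import math
from typing import List, Tuple

# x[c][t] is the feature vector of time frame t in channel c (C x T x D).
Features = List[List[List[float]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multichannel_temporal_attention(
    x: Features,
) -> Tuple[List[List[float]], List[List[float]]]:
    weights, pooled = [], []
    for channel in x:
        # Hypothetical frame score: mean of that frame's features in this channel.
        scores = [sum(frame) / len(frame) for frame in channel]
        # Attention over time is computed separately inside each channel.
        alpha = softmax(scores)
        dim = len(channel[0])
        # Attention-weighted sum of this channel's frames.
        pooled.append(
            [sum(alpha[t] * channel[t][j] for t in range(len(channel))) for j in range(dim)]
        )
        weights.append(alpha)
    return weights, pooled


if __name__ == "__main__":
    # Channel 0 is active early, channel 1 is active late.
    feats = [
        [[4.0, 4.0], [0.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0], [4.0, 4.0]],
    ]
    w, _ = multichannel_temporal_attention(feats)
    print([row.index(max(row)) for row in w])  # [0, 2]: each channel attends to its own active frame
```

Computing the softmax over time within each channel lets different channels focus on different frames, which a single frame weighting shared by all channels cannot do.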

Environmental Sound Classification Using Local Binary Pattern and Audio Features Collaboration

This paper presents a new approach to classifying environmental sounds through the collaboration of a texture feature, the local binary pattern (LBP), with audio features; the approach outperforms methods that use handcrafted features with classical machine learning algorithms and performs comparably to some convolutional neural network-based methods.