Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery

@inproceedings{Huang2019AudiblePA,
  title={Audible Panorama: Automatic Spatial Audio Generation for Panorama Imagery},
  author={Haikun Huang and Michael Solah and Dingzeyu Li and Lap-Fai Yu},
  booktitle={Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems},
  year={2019}
}
As 360° cameras and virtual reality headsets become more popular, panorama images have become increasingly ubiquitous. While sound is essential in delivering immersive and interactive user experiences, most panorama images do not come with native audio. In this paper, we propose an automatic algorithm to augment static panorama images through realistic audio assignment. We accomplish this goal through object detection, scene classification, object depth estimation, and audio…
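
The abstract outlines a pipeline of object detection, scene classification, object depth estimation, and audio assignment. The following is a minimal, self-contained Python sketch of that flow under stated assumptions: the label set, the (scene, object) to clip mapping, and the equal-power pan with inverse-distance gain model are illustrative stand-ins, not the authors' implementation; only the equirectangular x-to-azimuth mapping is standard panorama geometry.

from dataclasses import dataclass
import math

@dataclass
class DetectedObject:
    label: str      # class name from an off-the-shelf object detector
    x_norm: float   # horizontal center in [0, 1] on the equirectangular image
    depth_m: float  # estimated distance in meters

# Hypothetical (scene, object) -> clip mapping into a sound-effect library.
AUDIO_LIBRARY = {
    ("street", "car"): "car_engine_loop.wav",
    ("street", "person"): "footsteps_pavement.wav",
    ("beach", "person"): "footsteps_sand.wav",
}

def assign_audio(scene: str, obj: DetectedObject) -> str | None:
    """Pick a plausible clip for a detected object, conditioned on the scene class."""
    return AUDIO_LIBRARY.get((scene, obj.label))

def spatialize(obj: DetectedObject) -> tuple[float, float, float]:
    """Map panorama position and depth to (azimuth, left gain, right gain).

    On an equirectangular panorama the horizontal coordinate maps linearly
    to azimuth; an equal-power stereo pan plus inverse-distance attenuation
    stands in for a full spatial-audio renderer.
    """
    azimuth = obj.x_norm * 2.0 * math.pi - math.pi   # [-pi, pi)
    pan = 0.5 * (1.0 + math.sin(azimuth))            # 0 = hard left, 1 = hard right
    gain = 1.0 / max(obj.depth_m, 1.0)               # clamp to avoid blow-up up close
    left = math.cos(pan * math.pi / 2.0) * gain
    right = math.sin(pan * math.pi / 2.0) * gain
    return azimuth, left, right

if __name__ == "__main__":
    scene = "street"  # would come from a scene classifier
    obj = DetectedObject(label="car", x_norm=0.75, depth_m=8.0)
    print(assign_audio(scene, obj), spatialize(obj))

Running this prints the assigned clip with an azimuth of +pi/2 (directly to the right) and correspondingly unbalanced channel gains, mirroring how a detected object's panorama position drives its perceived direction.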

Citations

A TCN-based Primary Ambient Extraction in Generating Ambisonics Audio from Panorama Video
TLDR
An end-to-end Ambisonics generator for panorama video is proposed, together with a Temporal Convolutional Network (TCN) based Primary Ambient Extractor (PAE) that separates the primary and ambient components of the sound field.
Localize to Binauralize: Audio Spatialization from Visual Sound Source Localization
TLDR
This work designs a two-stage architecture called the Localize-to-Binauralize Network (L2BNet) and uses sound source localization as a proxy task for weak supervision, reducing the reliance on explicit supervision for binaural audio generation.
SoundsRide: Affordance-Synchronized Music Mixing for In-Car Audio Augmented Reality
TLDR
SoundsRide is proposed: an in-car audio augmented reality system that mixes music in real time, synchronized with sound affordances along the ride. It can create captivating music experiences and can positively as well as negatively influence subjectively perceived driving safety, depending on the mix and the user.
Toward Automatic Audio Description Generation for Accessible Videos
TLDR
A system is presented that analyzes the audiovisual contents of a video and generates audio descriptions, along with recommendations for the development of future audio description generation technologies.
Sound Synthesis, Propagation, and Rendering: A Survey
TLDR
This paper gives a broad overview of research on sound simulation in virtual reality, games, multimedia, and computer-aided design, and points to some future directions for the field.
Scene-Aware Background Music Synthesis
TLDR
This paper introduces an interactive background music synthesis algorithm guided by visual content, which can synthesize dynamic background music for different types of scenarios; quantitative and qualitative analyses of the synthesized results validate the efficacy of the approach.
Comparison of Tracking Techniques on 360-Degree Videos
TLDR
This work thoroughly evaluates the performance of eight modern trackers in terms of accuracy and speed on 360-degree videos, and provides a dataset containing nine 360-degree videos with ground-truth target positions as a benchmark for future research.
Dynamic Field of View Restriction in 360° Video: Aligning Optical Flow and Visual SLAM to Mitigate VIMS
TLDR
This work presents a technique for standard 360° video that shrinks the FoV only during VIMS-inducing scenes, and discusses the user experience of dynamic FoVs along with recommendations for how they can help make VR comfortable and immersive for all.
Staying on Track: a Comparative Study on the Use of Optical Flow in 360° Video to Mitigate VIMS
TLDR
A novel technique dynamically controlled by a video’s precomputed optical flow and participants’ runtime head direction is described and evaluated in a within-subjects study on a 360° video of a roller coaster.
360-Degree Video Streaming: A Survey of the State of the Art
TLDR
Different projection, compression, and streaming techniques that incorporate either the visual features or the spherical characteristics of 360-degree video are presented, as well as the latest ongoing standardization efforts toward an enhanced degree-of-freedom immersive experience.
...

References

SHOWING 1-10 OF 35 REFERENCES
Self-Supervised Generation of Spatial Audio for 360 Video
TLDR
This work introduces an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere, and shows that it is possible to infer the spatial localization of sounds based only on a synchronized 360° video and the mono audio track.
Scene-aware audio for 360° videos
TLDR
This work proposes a method that synthesizes the directional impulse response between any source and listening locations by combining a synthesized early reverberation part and a measured late reverberation tail, and demonstrates the strength of the method in several applications.
HindSight: Enhancing Spatial Awareness by Sonifying Detected Objects in Real-Time 360-Degree Video
TLDR
HindSight is introduced, a wearable system that increases spatial awareness by detecting relevant objects in live 360-degree video and sonifying their position and class through bone conduction headphones.
Dynamic Stereoscopic 3D Parameter Adjustment for Enhanced Depth Discrimination
TLDR
Two stereoscopic rendering techniques that actively vary the stereo parameters based on the scene content are developed; results indicate that variable stereo parameters provide enhanced depth discrimination compared to static parameters and were preferred by participants over the traditional fixed-parameter approach.
Shot Orientation Controls for Interactive Cinematography with 360 Video
TLDR
This work presents new interactive shot orientation techniques that are designed to help viewers see all of the important content in 360-degree video stories, and provides an automatic method for determining important content in existing 360-degree videos.
Compensating for Distance Compression in Audiovisual Virtual Environments Using Incongruence
TLDR
Applying the proposed method for reducing crossmodal distance compression in VEs resulted in more accurate distance perception at longer range; a modification is suggested that could adaptively compensate for distance compression at both shorter and longer ranges.
Watching 360° Videos Together
TLDR
The findings indicate that while participants enjoyed the ability to view the scene independently, this caused challenges in establishing joint references, leading to breakdowns in conversation.
Experimenting with Sound Immersion in an Arts and Crafts Museum
TLDR
An immersive sound system is presented that emits audio content taking into account the position of museum visitors as well as their orientation and viewing direction, executed locally and in real time on the visitor's device.
Tell Me Where to Look: Investigating Ways for Assisting Focus in 360° Video
TLDR
Two Focus Assistance techniques were developed: Auto Pilot (directly bringing viewers to the target) and Visual Guidance (indicating the direction of the target); a study showed that Focus Assistance improved ease of focus.
Hierarchical and Spatio-Temporal Sparse Representation for Human Action Recognition
TLDR
A novel two-layer video representation for human action recognition is presented, employing a hierarchical group sparse encoding technique and spatio-temporal structure, and the superiority of the hierarchical framework is demonstrated on several challenging datasets.
...