Dmitry N. Zotkin

It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a modality complementary to video data; in comparison to vision, audio can provide faster localization over a wider field of view. We present a particle-filter-based tracking framework for performing multimodal sensor fusion…
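As a rough illustration of how a particle filter can fuse two such modalities, the sketch below runs one predict/update/resample cycle in which audio and video position estimates are combined through independent Gaussian likelihoods. The random-walk motion model, likelihood widths, and measurement format are assumptions made for illustration, not the framework described above.

    import numpy as np

    def particle_filter_step(particles, weights, video_meas, audio_meas,
                             motion_std=0.05, video_std=0.1, audio_std=0.3):
        """One predict/update/resample cycle of a bootstrap particle filter.

        particles  : (N, 3) array of candidate 3-D object positions.
        video_meas, audio_meas : 3-D position estimates from each modality
            (illustrative stand-ins for real image- and TDOA-based observations).
        """
        n = len(particles)
        # Predict: random-walk motion model (an assumption for this sketch).
        particles = particles + np.random.normal(0.0, motion_std, particles.shape)
        # Update: weights are the product of per-modality Gaussian likelihoods,
        # which is one simple way to fuse independent audio and video cues.
        d_video = np.linalg.norm(particles - video_meas, axis=1)
        d_audio = np.linalg.norm(particles - audio_meas, axis=1)
        weights = (np.exp(-0.5 * (d_video / video_std) ** 2) *
                   np.exp(-0.5 * (d_audio / audio_std) ** 2))
        weights /= weights.sum()
        # Resample (multinomial; systematic resampling would reduce variance).
        idx = np.random.choice(n, size=n, p=weights)
        particles = particles[idx]
        weights = np.full(n, 1.0 / n)
        estimate = particles.mean(axis=0)
        return particles, weights, estimate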
Determining the occurrence of events is fundamental to developing systems that can observe and react to them. Often, this determination is made by collecting video and/or audio data and estimating the state or location of a tracked object. We use Bayesian inference and the particle filter for tracking moving objects, using both video data obtained from…
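The particle filter in this setting approximates the standard recursive Bayesian update. Assuming the video and audio observations are conditionally independent given the object state x_t, one plausible way to write that update (generic notation, not quoted from the paper) is

    p(x_t \mid z_{1:t}) \;\propto\; p(z^{\mathrm{vid}}_t \mid x_t)\, p(z^{\mathrm{aud}}_t \mid x_t)
        \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1},

where the integral over the previous state is approximated by propagating a set of weighted samples (particles) through the motion model.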
Accurate and fast localization of multiple speech sound sources is a problem of significant interest in applications such as conferencing systems. Recently, approaches based on searching for local peaks of the steered response power have become popular, despite their known computational expense. Based on the observation that the wavelengths…
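For reference, a conventional steered response power search with phase transform (SRP-PHAT) over a fixed grid of candidate points looks roughly like the sketch below; this is the computationally expensive baseline the abstract refers to, not the faster method it goes on to develop. The array geometry, candidate grid, and frame format are illustrative.

    import numpy as np

    def srp_phat(frames, mic_pos, grid, fs, c=343.0):
        """Steered response power with phase transform over candidate points.

        frames  : (M, L) array, one windowed signal frame per microphone.
        mic_pos : (M, 3) microphone coordinates in metres.
        grid    : (G, 3) candidate source positions.
        Returns the grid point with the largest steered response power.
        """
        M, L = frames.shape
        spectra = np.fft.rfft(frames, axis=1)
        # Phase transform: keep phase only, discard magnitude.
        spectra = spectra / (np.abs(spectra) + 1e-12)
        freqs = np.fft.rfftfreq(L, d=1.0 / fs)
        power = np.zeros(len(grid))
        for g, pt in enumerate(grid):
            delays = np.linalg.norm(mic_pos - pt, axis=1) / c        # propagation delays
            steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
            aligned = spectra * steering                             # undo the delays
            power[g] = np.sum(np.abs(aligned.sum(axis=0)) ** 2)      # coherent-sum power
        return grid[np.argmax(power)]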
By the Helmholtz reciprocity principle, the head-related transfer function (HRTF) is equivalent to an acoustic field created by a transmitter placed at the ear location. Therefore, it can be represented as a spherical harmonics spectrum - a weighted sum of spherical harmonics. Such representations are useful in theoretical and computational analysis. Many…
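In generic notation (not quoted from the paper), such a spherical harmonics spectrum takes the form

    H(k, \theta, \varphi) \;\approx\; \sum_{n=0}^{N} \sum_{m=-n}^{n} a_{nm}(k)\, Y_n^m(\theta, \varphi),

where Y_n^m are the spherical harmonics, a_{nm}(k) are frequency-dependent expansion coefficients, and N is the truncation order.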
High-quality virtual audio scene rendering is required for emerging virtual and augmented reality applications, perceptual user interfaces, and sonification of data. We describe algorithms for creating virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects. We use a novel way of…
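A minimal sketch of the anatomical-scattering part of such rendering is the convolution of a source signal with a left/right pair of head-related impulse responses. The environmental-scattering and dynamic (head-tracking) components of the system described above are omitted here, and the simple 1/distance gain is an assumption made only for illustration.

    import numpy as np

    def render_binaural(mono, hrir_left, hrir_right, distance=1.0):
        """Render a mono source to two ears by HRIR convolution.

        mono       : 1-D source signal.
        hrir_left  : head-related impulse response for the left ear at the
                     desired direction (hrir_right likewise); illustrative inputs.
        A 1/distance gain stands in for environmental modelling.
        """
        gain = 1.0 / max(distance, 0.1)
        left = gain * np.convolve(mono, hrir_left)
        right = gain * np.convolve(mono, hrir_right)
        return np.stack([left, right], axis=0)   # (2, samples) binaural output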
Individualized head related transfer functions (HRTFs) are needed for accurate rendering of spatial audio, which is important in many applications. Since individualized HRTFs are relatively tedious to acquire, such measurements may not be acceptable for some applications. A number of studies have sought to perform simple customization of the HRTF. We propose and test a strategy for…
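One common form of simple customization, sketched below purely as an illustration (the abstract is cut off before the proposed strategy is described), is to pick the best-matching HRTF set from a measured database by comparing anthropometric measurements. All names and the distance measure here are assumptions, not necessarily the strategy proposed in the paper.

    import numpy as np

    def pick_closest_hrtf(listener_anthro, database_anthro, database_hrtfs):
        """Select an HRTF set by nearest-neighbour anthropometric matching.

        listener_anthro : (D,) vector of the listener's ear/head measurements.
        database_anthro : (S, D) measurements for S database subjects.
        database_hrtfs  : length-S list of measured HRTF sets.
        Measurements are z-scored so no single dimension dominates the distance.
        """
        mu = database_anthro.mean(axis=0)
        sigma = database_anthro.std(axis=0) + 1e-9
        d = np.linalg.norm((database_anthro - mu) / sigma -
                           (listener_anthro - mu) / sigma, axis=1)
        return database_hrtfs[int(np.argmin(d))]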
A theory and a system for capturing an audio scene and then rendering it remotely are developed and presented. The sound capture is performed with a spherical microphone array. The sound field at the location of the array is deduced from the captured sound and is represented using either spherical wave-functions or plane-wave expansions. The sound field…
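A minimal sketch of one ingredient of such a representation, assuming the microphone signals have already been transformed to a single frequency bin, is a least-squares fit of spherical-harmonic coefficients to the pressures sampled on the array. Compensation for scattering off a rigid spherical array (the radial mode-strength terms) is deliberately omitted, so this is not the full capture pipeline of the system above.

    import numpy as np
    from scipy.special import sph_harm

    def fit_sh_coefficients(pressures, azimuth, colatitude, order):
        """Least-squares spherical-harmonic fit to one frequency bin.

        pressures  : (M,) complex pressures at the M microphones.
        azimuth    : (M,) azimuth angles in radians.
        colatitude : (M,) colatitude angles in radians.
        order      : truncation order N of the expansion.
        Returns the (N+1)^2 expansion coefficients.
        """
        cols = []
        for n in range(order + 1):
            for m in range(-n, n + 1):
                cols.append(sph_harm(m, n, azimuth, colatitude))
        Y = np.stack(cols, axis=1)                   # (M, (N+1)^2) basis matrix
        coeffs, *_ = np.linalg.lstsq(Y, pressures, rcond=None)
        return coeffs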
The head related transfer function (HRTF) characterizes the scattering properties of a person's anatomy (especially the pinnae, head and torso), and exhibits considerable person-to-person variability. It is usually measured as a part of a tedious experiment, and this leads to the function being sampled at a few angular locations. When the HRTF is needed at…
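Obtaining the HRTF at directions that were not measured is an interpolation problem. One standard remedy, shown here only as a hedged illustration rather than as the paper's method, is to fit a regularized spherical-harmonic expansion to the sparse measurements (per frequency bin) and evaluate it at the new direction; the order and regularization weight are arbitrary choices for the sketch.

    import numpy as np
    from scipy.special import sph_harm

    def sh_basis(azimuth, colatitude, order):
        """Spherical-harmonic basis matrix for a set of directions."""
        return np.stack([sph_harm(m, n, azimuth, colatitude)
                         for n in range(order + 1)
                         for m in range(-n, n + 1)], axis=1)

    def interpolate_hrtf(h_measured, az_meas, col_meas, az_new, col_new,
                         order=8, reg=1e-3):
        """Fit a regularized SH expansion to HRTF samples (one frequency bin)
        and evaluate it at a new, unmeasured direction."""
        Y = sh_basis(az_meas, col_meas, order)
        # Ridge (Tikhonov) regularized least squares keeps the fit stable
        # when the measurement grid is sparse or uneven.
        A = Y.conj().T @ Y + reg * np.eye(Y.shape[1])
        coeffs = np.linalg.solve(A, Y.conj().T @ h_measured)
        return sh_basis(np.atleast_1d(az_new), np.atleast_1d(col_new), order) @ coeffs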