Improved phase vocoder time-scale modification of audio

  • Jean Laroche, Mark Dolson
    IEEE Trans. Speech Audio Process.
The phase vocoder is a well-established tool for time scaling and pitch shifting speech and audio signals via modification of their short-time Fourier transforms (STFTs). [...] Moreover, the modified phase vocoder is shown to provide a factor-of-two decrease in computational cost.
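
As a rough illustration of what "time scaling via modification of the STFT" means, the following is a minimal sketch of a classic phase vocoder in Python with NumPy. It is not the improved method of the paper: it applies no phase locking, and the function name `phase_vocoder_stretch`, its parameters, and the default frame sizes are assumptions made for this example only.

```python
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=1024, hop=256):
    """Basic phase vocoder sketch: rate < 1 lengthens the signal, rate > 1 shortens it."""
    window = np.hanning(n_fft)

    # Analysis: windowed frames at a fixed hop, then one FFT per frame.
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    stft = np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

    # Phase advance per hop expected at each bin's centre frequency.
    omega = 2 * np.pi * hop * np.arange(stft.shape[1]) / n_fft

    # Fractional analysis-frame positions read for successive synthesis frames.
    steps = np.arange(0, n_frames - 1, rate)
    phase = np.angle(stft[0])
    out = np.zeros(len(steps) * hop + n_fft)
    norm = np.zeros_like(out)

    for k, t in enumerate(steps):
        i = min(int(t), n_frames - 2)  # guard against float rounding at the end
        frac = t - i

        # Interpolate magnitudes between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(stft[i]) + frac * np.abs(stft[i + 1])

        # Synthesis: inverse FFT with the accumulated phase, then overlap-add.
        frame = np.fft.irfft(mag * np.exp(1j * phase), n=n_fft) * window
        out[k * hop:k * hop + n_fft] += frame
        norm[k * hop:k * hop + n_fft] += window ** 2

        # Deviation of the measured phase advance from omega, wrapped to [-pi, pi).
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
        dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
        phase = phase + omega + dphi

    # Normalise by the summed squared window; the floor avoids dividing by ~0 at the edges.
    return out / np.maximum(norm, 1e-3)
```

Calling `phase_vocoder_stretch(x, rate=0.5)` on a mono signal `x` returns a signal roughly twice as long with the pitch unchanged. Because each bin's phase is propagated independently, this sketch is prone to the reverberant "phasy" artifact that several of the papers listed below address.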

New phase-vocoder techniques for real-time pitch shifting

The phase-vocoder is a well-established tool for time-scaling and pitch shifting speech and audio signals. Its theory is now well understood and improvements have been proposed to reduce artifacts

Phase-vocoder: about this phasiness business

  • J. Laroche, M. Dolson
  • Physics
    Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics
  • 1997
This paper examines the problem of phasiness in the context of time-scale modification of signals and presents two new phase synchronization schemes, which are shown both to significantly improve sound quality and to reduce the computational cost of such modifications.

Mel-scale sub-band modelling for perceptually improved time-scale modification of speech and audio signals

This work proposes application of time-varying sinusoidal modeling for TSM, without any quasi-stationary assumption, which gives improved quality in comparison to waveform synchronous OLA, phase vocoder with identity phase locking, and the recently proposed harmonic-percussive separation (HPS) based TSM methods.

An Efficient Phasiness Reduction Technique for Moderate Audio Time-scale Modification

Phase vocoder approaches to time-scale modification of audio introduce a reverberant/phasy artifact into the time-scaled output due to a loss in phase coherence between short-time Fourier transform


A new time-scaling algorithm for polyphonic audio signals is described, which uses a multi-scale Gabor analysis for low-frequency content and a vocoder with phase-locking on transients for the residual signal and for high-frequency content.


This paper presents a new adaptive tiling technique of the time-frequency plane that is suitable for a wide range of audio transformations. The proposed algorithm separates components of the audio

Application of the phase vocoder to pitch-preserving synchronization of an audio stream to an external clock

  • R. Sussman, J. Laroche
  • Physics
    Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452)
  • 1999
This paper addresses theoretical and practical issues related to pitch-preserving synchronization of an audio track and techniques to allow freezing time in the phase-vocoder and avoid problems associated with very large factor modifications.

Time-Scale Modification of Audio Signals Using Enhanced WSOLA With Management of Transients

The study shows that the accurate detection of PSTs within the WSOLA framework makes it possible to achieve a higher quality of time-scaled music, as confirmed by subjective listening tests.

Time-scale Modification using the Phase Vocoder

The phase vocoder has been used as a time-scale-modification tool for several decades. Applying large positive modification factors to different kinds of sounds (time-stretching), the result will

Time/pitch modification using narrowband AM-FM signals

A new phase vocoder is proposed, which adopts an FIR narrowband filter bank for analysis, and an AM-FM signal for the synthesis of each band for time-scale modification and pitch modification.

Non-parametric techniques for pitch-scale and time-scale modification of speech

Time-scale modification of speech using an incremental time-frequency approach with waveform structure compensation

  • Benoit Sylvestre, P. Kabal
  • Physics
    [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1992
A simpler version of this TSM algorithm, based on the short-time Fourier transform, is proposed for processing speech, where incremental estimators eliminate the need for explicit linear time-scaling operations.

Phase-locked vocoder

  • M. Puckette
  • Computer Science
    Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Acoustics
  • 1995
An improved formulation of the phase vocoder is proposed for which the first difficulty does not arise; and a means of phase-locking adjacent channels of the resynthesis is proposed which alleviates the second one.

An odd-DFT based approach to time-scale expansion of audio signals

A new time-scale expansion algorithm based on a frequency-scale modification approach combined with time interpolation is presented, the most critical of which concern phase and frequency estimation beyond the frequency resolution of the filterbank.

The Phase Vocoder: A Tutorial

This article attempts to explain the operation of the phase vocoder in terms accessible to musicians, relying heavily on the familiar concepts of sine waves, filters, and additive synthesis, and employing a minimum of mathematics.

Shape invariant time-scale and pitch modification of speech

A time-scale modification system that preserves shape-invariant joint time-scale and pitch modification during voicing is developed using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation.

High quality time-scale modification for speech

  • Salim Roukos, A. Wilgus
  • Computer Science, Physics
    ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing
  • 1985
A new and simple method for speech rate modification that yields high-quality rate-modified speech is presented, along with both objective and informal subjective results for the new and previous TSM methods.

A subband approach to time-scale expansion of complex acoustic signals

A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled

Speech analysis/Synthesis based on a sinusoidal representation

A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves, which forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.

Time-scale modification of speech based on short-time Fourier analysis

This paper develops the theoretical basis for time-scale modification of speech based on short-time Fourier analysis. The goal is the development of a high-quality system for changing the apparent