ALLARD JONGMAN
teristics of speech. Speech consists of variations in air pressure that result from physical disturbances of air molecules caused by the flow of air out of the lungs. This airflow makes the air molecules alternately crowd together and move apart (oscillate), creating increases and decreases, respectively, in air pressure. The resulting sound wave transmits these changes in pressure from speaker to hearer. Sound waves can be described in terms of physical properties such as cycle, period, frequency, and amplitude. These concepts are most easily illustrated with a simple wave corresponding to a pure tone. A cycle is one increase followed by one decrease in air pressure. A period is the time (expressed in seconds or milliseconds) that one cycle takes. Frequency is the number of cycles per second, expressed in hertz (Hz). It is the reciprocal of the period, so a wave with a period of 10 ms has a frequency of 100 Hz. An increase in frequency usually results in an increase in perceived pitch. Amplitude refers to the magnitude of the vibrations: larger vibrations produce greater peaks of pressure (greater amplitude), which usually result in an increase in perceived loudness.

Unlike pure tones, which rarely occur in the environment, speech sounds are complex waves that combine different frequencies and amplitudes. However, as first stated by the French mathematician Fourier (1768–1830), any complex periodic wave can be described as a combination of simple waves. A complex periodic wave repeats at a regular rate, known as its fundamental frequency (F0). Changes in F0 give rise to differences in perceived pitch, whereas changes in the number of constituent simple waves and their amplitude relations result in perceived differences in timbre or quality. Fourier’s theorem thus enables us to describe speech sounds in terms of the frequency and amplitude of each of their constituent simple waves. Such a description is known as the spectrum of a sound. A spectrum is visually displayed as a plot of frequency vs.
amplitude, with frequency represented from low to high along the horizontal axis and amplitude from low to high along the vertical axis.

The usual energy source for speech is the airstream generated by the lungs. The vibrating vocal folds, two muscular folds housed in the larynx, convert this steady flow of air into brief puffs of air. The dominant way of conceptualizing speech production is the source-filter theory. According to this theory, the acoustic characteristics of speech can be understood as the result of a source component and a filter component. The source component is determined by the rate of vocal fold vibration, which in turn depends on a number of factors, including the rate of airflow and the mass and stiffness of the vocal folds. The rate of vocal fold vibration directly determines the F0 of the waveform. The mean F0 is approximately 220 Hz for adult women and approximately 130 Hz for adult men. In addition to their role as properties of individual speech sounds, F0 and amplitude also signal emphasis, stress, and intonation. For speech, the source component itself has a complex waveform, and its spectrum typically shows the highest energy at the lowest frequencies and a number of higher-frequency components that