Materials Part II

M4: Vowels and Fundamental Frequency

Independence of formants and fundamental frequency - “Undersampling” the formants I: formants at middle and high fundamental frequencies - “Oversinging” the first formant - “Grade” of vowels - “Undersampling” the formants II: resonances and formants

Independence of formants and fundamental frequency

“Obviously, formant frequency is independent from the fundamental frequency […] Changes in formant frequency are due to changes in the shape of the vocal tract cavity or cavities; changes in pitch frequency to stretching of the vocal cords. If the two physiological events are independent, so are the acoustic results of each event […].” (Delattre, 1958/1980)

“[…] when a complex wave consists of a damped waveform repeated at regular intervals, the component frequencies will always have the same relative amplitudes as the corresponding components in the continuous spectrum representing the isolated occurrence of the damped wave. Consequently, altering the rate at which the vocal folds produce pulses will affect the fundamental frequency of the complex wave; but it will not alter the formants (the peaks in the spectrum), which correspond to the basic frequencies of the damped vibrations of the air in the vocal tract. It is in this sense that we may say that the formants of a sound are properties of the corresponding mouth shape. […] the formants which characterize a given vowel irrespective of the rate at which pulses are produced by the vocal cords […]
We saw in Chapter 6 that the pitch of a sound depends mainly on the fundamental frequency. Accordingly, when there is a variation in the rate at which pulses are produced by the vocal cords, there will be a change in the pitch of the sound (although there will be no change in the formants, and hence no change in the characteristic vowel quality). It is usually possible to alter the pitch of a vowel sound without altering its characteristic quality, because each of these factors is controlled by a separate physiological mechanism. As we have seen, the pitch depends on the action of the vocal cords, and the characteristic quality depends largely on the formants, which have certain fixed values for each particular shape of the vocal tract.” (Ladefoged, 1996, pp. 98–99)
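
The central claim of this passage can be illustrated numerically. What follows is a minimal sketch, not Ladefoged's own formulation. The function `envelope` is a hypothetical stand-in for the continuous spectrum of the vocal tract, built from two arbitrary resonance peaks (500 and 1500 Hz, with bandwidths chosen for illustration only). Harmonics of two different fundamentals that land on the same frequency read the same value from it.

```python
def envelope(freq_hz, peaks=((500.0, 80.0), (1500.0, 120.0))):
    """Hypothetical spectral envelope: a sum of simple resonance curves.

    Each peak is (centre frequency in Hz, bandwidth in Hz). The values are
    illustrative assumptions, not measurements.
    """
    total = 0.0
    for centre, bandwidth in peaks:
        half = bandwidth / 2.0
        total += half**2 / ((freq_hz - centre) ** 2 + half**2)
    return total


def harmonic_amplitudes(f0_hz, max_hz=2000.0):
    """Sample the envelope at every harmonic of f0_hz up to max_hz."""
    amplitudes = {}
    n = 1
    while n * f0_hz <= max_hz:
        amplitudes[n * f0_hz] = envelope(n * f0_hz)
        n += 1
    return amplitudes


low = harmonic_amplitudes(100.0)
high = harmonic_amplitudes(200.0)

# Every harmonic of 200 Hz is also a harmonic of 100 Hz, and both series
# read the same envelope value there: changing F0 changes which points are
# sampled, not the envelope itself.
shared = sorted(set(low) & set(high))
for freq in shared:
    print(f"{freq:6.0f} Hz  F0=100: {low[freq]:.3f}  F0=200: {high[freq]:.3f}")
```

In this sketch the higher fundamental simply visits fewer points of the same curve, which is the sense in which the formants are said to belong to the mouth shape rather than to the pitch.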

See also the citation of Hillenbrand (n.d.) in Chapter M6.

“Undersampling” the formants I: formants at middle and high fundamental frequencies

“According to the undersampling account of the effects of f0 on vowel identifiability, the sparser distribution of harmonics at high f0s yields poorer definition of the peaks and valleys in the spectral envelope, creating a more ambiguous stimulus.” (Diehl, Lindblom, Hoemeke, & Fahey, 1996)

“However, in this range of frequency (500 to 1000 Hertz), you could not tell apart different vowels anyway, because the harmonics of the voice are so far apart that they are not ‘sampling’ the locations of the formants enough for you to tell where the formants lie. Therefore operatic writers only put words intended to be intelligible in the lower part of a soprano’s range.” (Moore, 2006, p. 11)
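
A rough sense of how sparse the harmonic sampling becomes can be had by counting the harmonics that fall below a chosen upper limit. The sketch below is only illustrative: the 3000 Hz limit is an assumption made here to cover roughly the region of the first two formants and does not come from Moore.

```python
def harmonics_below(f0_hz, limit_hz=3000.0):
    """Return the harmonic frequencies of f0_hz that do not exceed limit_hz."""
    count = int(limit_hz // f0_hz)
    return [n * f0_hz for n in range(1, count + 1)]


for f0 in (125, 250, 500, 1000):
    found = harmonics_below(f0)
    print(f"F0 = {f0:4d} Hz: {len(found):2d} harmonics up to 3000 Hz, "
          f"spaced {f0} Hz apart")
```

Under these assumptions, a fundamental of 125 Hz places 24 harmonics below the limit, whereas a fundamental of 1000 Hz places only 3.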

“Oversinging” the first formant

“For the U it is also by no means easy to find the pitch of the resonance by a fork, as the smallness of the opening makes the resonance weak. Another phenomenon has guided me in this case. If I sing the scale from c upwards, uttering the vowel U for each note, and taking care to keep the quality of the vowel correct, and not allowing it to pass into O, I feel the agitation of the air in the mouth, and even on the drums of both ears, where it excites a tickling sensation, most powerfully when the voice reaches f. As soon as f is passed the quality changes, the strong agitation of the air in the mouth and the tickling in the ear cease. […] The resonance of the mouth for U is thus fixed at f with more certainty than by means of tuning forks. But we often meet with a U of higher resonance, more resembling O, which I will represent by the French Ou. Its proper tone may rise as high as f’.” (von Helmholtz, 1885/1954, p. 110; c = 131 Hz, f = 175 Hz, f’ = 349 Hz)

“Above f’, the characterization of U becomes imperfect even if it is closely assimilated to O. But so long as it remains the only vowel of indeterminate sound, and the remainder allow of sensible reinforcement of their upper partials in certain regions, this negative character will distinguish U. On the other hand a soprano voice in the neighbourhood of f’’ should not be able to clearly distinguish U, O, A; and this agrees with my own experience.” (von Helmholtz, 1885/1954, p. 114; f’’ = 698 Hz)

“It is reasonable to assume […] that it is impossible to produce recognizable vowels at musical pitches very much higher than their first formants. […]
The following table is offered as a practical guide: Vowels start seriously losing intelligibility when the fundamental reaches these frequencies:
(i u y) 350 cps (roughly middle F)
(e o ø) 450 cps (roughly middle A)
(ɛ ɔ œ) 600 cps (roughly high D)
(æ ɑ a) 750 cps (roughly high G)”
(Howie & Delattre, 1962)

“[…] only very few correct identifications of isolated vowels can be expected when fundamental frequency reaches or exceeds the usual first formant of a vowel.” (Hollien, Mendes-Schwartz, & Nielsen, 2000)

“[…] vowel identifiability is inevitably compromised once f0 exceeds R1 […]” (Joliveau, Smith, & Wolfe, 2004)
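
The practical guide of Howie and Delattre quoted above lends itself to a simple lookup. The following is a minimal sketch under stated assumptions: the names `LOSS_THRESHOLD_HZ`, `threshold_for` and `at_risk` are hypothetical, the thresholds are taken from the quoted table with cps read as Hz, and the last symbol of the third group is read as œ.

```python
# Thresholds from the Howie & Delattre (1962) table as quoted (cps read as Hz).
# Assumption: the last symbol of the third group is read as "œ".
LOSS_THRESHOLD_HZ = {
    ("i", "u", "y"): 350,
    ("e", "o", "ø"): 450,
    ("ɛ", "ɔ", "œ"): 600,
    ("æ", "ɑ", "a"): 750,
}


def threshold_for(vowel):
    """Return the F0 (Hz) at which the table says the vowel starts
    seriously losing intelligibility."""
    for group, limit in LOSS_THRESHOLD_HZ.items():
        if vowel in group:
            return limit
    raise KeyError(f"vowel {vowel!r} is not in the table")


def at_risk(vowel, f0_hz):
    """True if f0_hz has reached the table's threshold for this vowel."""
    return f0_hz >= threshold_for(vowel)


for vowel in ("i", "e", "ɛ", "a"):
    print(vowel, threshold_for(vowel), at_risk(vowel, 440))
```

At a fundamental of 440 Hz, for example, the sketch flags /i/ but not /e/, /ɛ/ or /a/. It only restates the table and says nothing about how steeply intelligibility falls beyond each threshold.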

“We have seen that female singers gain considerably in sound level by abandoning the formant frequencies typical of normal speech when they sing at high pitches. At the same time, F1 and F2 are decisive to vowel quality. This leads to the question of how it is possible to understand the lyrics of a song when it is performed with the ‘wrong’ F1 and F2 values. Both vowel intelligibility and syllable/text intelligibility can be expected to be disturbed. This aspect of singing has been studied in several investigations.
As a thought-provoking reminder of the difficulties in arranging well-controlled experimental conditions in the past, an experiment carried out by the German phonetician Carl Stumpf (1926) may be mentioned. He used three singer subjects: a professional opera singer and two amateur singers. Each singer sang various vowels at different pitches, with their backs turned away from a group of listeners who tried to identify the vowels. The vowels that were sung by the professional singer were easier to identify. Also, overall, the percentages of correct identifications dropped as low as 50% for several vowels sung at the pitch of G5 (784 Hz).
Since then, many investigations have been devoted to intelligibility of sung vowels and syllables (see, e.g. Benolken & Swanson, 1990; Gregg & Scherer, 2006; Morozov, 1965). Figure 12 gives an overview of the results in terms of the highest percentage of correct identifications observed in various investigations for the indicated vowels at the indicated pitches. The graph shows that vowel intelligibility is reasonably accurate up to about C5 and then quickly drops with pitch to about 15% correct identification at the pitch of F5. The only vowel that has been observed to be correctly identified more frequently above this pitch is /a/. Apart from pitch and register, larynx position also seems to affect vowel intelligibility (Gottfried and Chew, 1986; Scotto di Carlo and Germain, 1985).
Smith and Scott (1980) strikingly demonstrated the significance of consonants preceding and following a vowel. This is illustrated in the same graph. Above the pitch of F5, syllable intelligibility is clearly better than vowel intelligibility. Thus, vowels are easier to identify when the acoustic signal contains some transitions (Andreas, 2006). Incidentally, this seems to be a perceptual universal: changing stimuli are easier to process than are quasi-stationary stimuli.
The difficulties in identifying vowels and syllables sung at high pitches would result both from singers’ deviations from the formant frequency patterns of normal speech and from the fact that high-pitched vowels contain few partials that are widely distributed over the frequency scale, producing a lack of spectral information.
In addition, a third effect may contribute. Depending on phonation type, the F0 varies in amplitude. At a high pitch, F1 may lie between the first and the second partial. Sundberg and Gauffin (1982) presented synthesized, sustained vowel sounds in the soprano range and asked subjects to identify the vowel. The results showed that an increased amplitude of the F0 was generally interpreted as a drop in F1.” (Sundberg, 2013, pp. 86–88)
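
The pitch names used in this passage (C5, F5, G5) can be converted to frequencies for comparison with the formant values discussed elsewhere. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation. Both are assumptions made here, and the function name `note_to_hz` is hypothetical.

```python
# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name, a4_hz=440.0):
    """Convert a name such as 'G5' or 'F#4' to Hz.

    Assumes equal temperament and a single-digit octave number.
    """
    letter = name[0].upper()
    octave = int(name[-1])
    accidental = name[1:-1]
    semitone = NOTE_OFFSETS[letter] + accidental.count("#") - accidental.count("b")
    # MIDI-style note number, in which C4 = 60 and A4 = 69.
    midi = 12 * (octave + 1) + semitone
    return a4_hz * 2 ** ((midi - 69) / 12)


for name in ("C5", "F5", "G5"):
    print(f"{name}: {note_to_hz(name):.1f} Hz")
```

With these assumptions G5 comes out at about 784 Hz, in agreement with the value given in the passage.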

“Grade” of vowels

As discussed in Sections 4.1 and 4.2, prevailing theory gives reason to assume a general but discontinuous relationship between the intelligibility of vowel sounds and their fundamental frequency. On the one hand, vowel sounds at lower fundamental frequencies would, as a rule, be more intelligible than vowel sounds at higher fundamental frequencies. On the other hand, vowel intelligibility would also depend on the respective relationships between fundamental frequency, harmonic spectrum and the vowel-specific formant pattern (as given in formant statistics).

Concerning the former, consider the following model cases:

Comparison of two sounds of /ɛ/ produced by a woman at F0 of 200 and 400 Hz, related to a common formant pattern F1–F2 = 600–2000 Hz (compare Section 2.2, the formant statistics for Standard German); F1 will be “undersampled” for the sound at higher F0, i.e. F1 lies between the first and the second harmonics, whereas for the first sound, the third harmonic matches F1, indicating a “sampled” formant pattern F1–F2 as a better condition for vowel perception.
Comparison of two sounds of /ɔ/ produced by a woman at F0 of 285 and 340 Hz, related to a common formant pattern F1–F2 = 570–1140 Hz (compare Section 2.2, the formant statistics for Standard German); F1 and F2 will be “undersampled” for the sound at higher F0, i.e. F1 lies between the first and the second, and F2 between the third and the fourth harmonics, while for the first sound, the second and the fourth harmonics match F1 and F2.
And so on.

Concerning the latter, consider the following model cases:

Comparison of two sounds of /i/ produced by a woman at F0 of 200 and 300 Hz, related to a common formant pattern F1–F2 = 300–2700 Hz (compare Section 2.1, the formant statistics of Peterson and Barney, 1952); F1 and F2 will be “undersampled” for the sound at lower F0, i.e. F1 lies between the first and the second, and F2 between the thirteenth and the fourteenth harmonics (2600 and 2800 Hz), while for the second sound, the first and the ninth harmonics match F1 and F2, indicating a “sampled” formant pattern F1–F2 as a better condition for vowel perception.
Comparison of two sounds of /ɑ/ produced by a woman at F0 of 270 and 330 Hz, related to a common formant pattern F1–F2 = 660–990 Hz (compare Section 2.1, the formant statistics of Fant, 1959); F1 and F2 will be “undersampled” for the sound at lower F0, i.e. F1 lies between the second and the third, and F2 between the third and the fourth harmonics, while for the second sound, the second and the third harmonics match F1 and F2.
Comparison of two sounds of /u/ produced by a woman at F0 of 200 and 300 Hz, related to a common formant pattern F1–F2 = 300–900 Hz; F1 and F2 will be “undersampled” for the sound at lower F0, i.e. F1 lies between the first and the second, and F2 between the fourth and the fifth harmonics, while for the second sound, the first and the third harmonics match F1 and F2.
And so on.
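
The arithmetic behind these model cases can be checked with a short script. This is a minimal sketch, assuming that a formant counts as “sampled” only when a harmonic coincides with it exactly (an optional tolerance in Hz is provided, defaulting to zero). The function name `classify_formant` and the data layout are hypothetical; the F0 and formant values are those of the model cases above.

```python
def classify_formant(formant_hz, f0_hz, tolerance_hz=0):
    """Describe where a formant lies relative to the harmonics of f0_hz.

    Returns ("sampled", n) if harmonic n coincides with the formant (within
    tolerance_hz), otherwise ("undersampled", (n, n + 1)) for the two
    neighbouring harmonics. A lower neighbour of 0 would mean that the
    formant lies below the fundamental.
    """
    lower = int(formant_hz // f0_hz)
    for n in (lower, lower + 1):
        if n >= 1 and abs(n * f0_hz - formant_hz) <= tolerance_hz:
            return ("sampled", n)
    return ("undersampled", (lower, lower + 1))


MODEL_CASES = [
    # (vowel, (F1, F2), (lower F0, higher F0)); all values in Hz
    ("ɛ", (600, 2000), (200, 400)),
    ("ɔ", (570, 1140), (285, 340)),
    ("i", (300, 2700), (200, 300)),
    ("ɑ", (660, 990), (270, 330)),
    ("u", (300, 900), (200, 300)),
]

for vowel, formants, f0_pair in MODEL_CASES:
    for f0 in f0_pair:
        results = [classify_formant(f, f0) for f in formants]
        print(f"/{vowel}/ at F0 = {f0} Hz: F1 {results[0]}, F2 {results[1]}")
```

The exact-match criterion is deliberately strict and serves only to reproduce the model cases. For measured sounds, a nonzero `tolerance_hz` would be the more plausible setting, and its value would itself be an assumption.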

“Undersampling” the formants II: resonances and formants

If a basic distinction is made between the resonances of the vocal tract and the formants of the vowel sound produced, then, strictly speaking, only resonances can be undersampled, in the sense that the harmonics are widely spaced and no harmonic matches an existing resonance frequency. Formants, in turn, are always the result of a method of measurement.