the Vowel
Preliminaries
“Obviously, formant frequency is independent from the fundamental frequency […] Changes in formant frequency are due to changes
in the shape of the vocal tract cavity or cavities; changes in pitch frequency to stretching of the vocal cords. If the two
physiological events are independent, so are the acoustic results of each event […].” (Delattre, 1958/1980)
“[…] when a complex wave consists of a damped waveform repeated at regular intervals, the component frequencies will always
have the same relative amplitudes as the corresponding components in the continuous spectrum representing the isolated occurrence
of the damped wave. Consequently, altering the rate at which the vocal folds produce pulses will affect the fundamental frequency
of the complex wave; but it will not alter the formants (the peaks in the spectrum), which correspond to the basic frequencies
of the damped vibrations of the air in the vocal tract. It is in this sense that we may say that the formants of a sound are
properties of the corresponding mouth shape. […] the formants which characterize a given vowel irrespective of the rate at
which pulses are produced by the vocal cords […]
We saw in Chapter 6 that the pitch of a sound depends mainly on the fundamental frequency. Accordingly, when there is a variation
in the rate at which pulses are produced by the vocal cords, there will be a change in the pitch of the sound (although there
will be no change in the formants, and hence no change in the characteristic vowel quality). It is usually possible to alter
the pitch of a vowel sound without altering its characteristic quality, because each of these factors is controlled by a separate
physiological mechanism. As we have seen, the pitch depends on the action of the vocal cords, and the characteristic quality
depends largely on the formants, which have certain fixed values for each particular shape of the vocal tract.” (Ladefoged,
1996, pp. 98–99)
See also the citation of Hillenbrand (n.d.) in Chapter M6.
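Ladefoged's statement about repeated damped waves can be checked numerically. The following is a minimal sketch, not taken from any of the cited sources: it assumes a single hypothetical resonance at 500 Hz with a hypothetical decay rate, repeats the damped wave at two different rates, and compares each harmonic amplitude with the continuous spectrum of the isolated wave at the same frequency. All names and parameter values are illustrative assumptions.

```python
import cmath
import math

RESONANCE_HZ = 500.0  # hypothetical resonance of the damped wave (assumption)
DECAY_PER_S = 200.0   # hypothetical decay rate a in exp(-a * t) (assumption)


def damped_wave(t):
    """One isolated damped oscillation, starting at t = 0."""
    if t < 0.0:
        return 0.0
    return math.exp(-DECAY_PER_S * t) * math.sin(2.0 * math.pi * RESONANCE_HZ * t)


def continuous_spectrum(freq_hz):
    """Magnitude of the Fourier transform of the isolated damped wave."""
    w = 2.0 * math.pi * freq_hz
    w_r = 2.0 * math.pi * RESONANCE_HZ
    return abs(w_r / ((DECAY_PER_S + 1j * w) ** 2 + w_r ** 2))


def harmonic_amplitude(f0_hz, n, samples=1000):
    """Amplitude of the n-th harmonic when the wave is repeated every 1/f0 s."""
    period = 1.0 / f0_hz
    # Earlier pulses are included until their tails have decayed to about exp(-30).
    pulses = int(30.0 / (DECAY_PER_S * period)) + 1
    total = 0j
    for i in range(samples):
        t = i * period / samples
        x = sum(damped_wave(t + k * period) for k in range(pulses))
        total += x * cmath.exp(-2j * math.pi * n * i / samples)
    return abs(total) / samples


def envelope_ratio(f0_hz, n):
    """Harmonic amplitude divided by f0 times the continuous spectrum at n * f0."""
    return harmonic_amplitude(f0_hz, n) / (f0_hz * continuous_spectrum(n * f0_hz))


if __name__ == "__main__":
    for f0 in (100.0, 200.0):
        for n in (1, 2, 3, 5):
            print(f0, n, round(envelope_ratio(f0, n), 4))
```

In this sketch the ratio stays close to 1 for every harmonic at both repetition rates: changing the rate changes which frequencies are present, but each component keeps the relative amplitude given by the continuous spectrum of the isolated damped wave.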
“According to the undersampling account of the effects of f0 on vowel identifiability, the sparser distribution of harmonics at high f0s yields poorer definition of the peaks and valleys in the spectral envelope, creating a more ambiguous stimulus.” (Diehl, Lindblom, Hoemeke, & Fahey, 1996)
“However, in this range of frequency (500 to 1000 Hertz), you could not tell apart different vowels anyway, because the harmonics
of the voice are so far apart that they are not ‘sampling’ the locations of the formants enough for you to tell where the
formants lie. Therefore operatic writers only put words intended to be intelligible in the lower part of a soprano’s range.”
(Moore, 2006, p. 11)
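The arithmetic behind the undersampling account can be sketched in a few lines. The example below is an illustration only: the 3000 Hz upper limit and the function name are assumptions chosen for the example, not values given by Diehl et al. or Moore.

```python
def harmonics_up_to(f0_hz, limit_hz):
    """Return the harmonic frequencies n * f0 that do not exceed limit_hz."""
    count = int(limit_hz // f0_hz)
    return [n * f0_hz for n in range(1, count + 1)]


if __name__ == "__main__":
    LIMIT_HZ = 3000  # hypothetical upper limit of the region of interest
    for f0 in (100, 250, 500, 1000):
        harmonics = harmonics_up_to(f0, LIMIT_HZ)
        print(f0, "Hz:", len(harmonics), "harmonics, spaced", f0, "Hz apart")
```

At a fundamental of 100 Hz, 30 harmonics fall below the assumed limit, whereas at 1000 Hz only 3 do, so the spectral envelope is described by far fewer points.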
“For the U it is also by no means easy to find the pitch of the resonance by a fork, as the smallness of the opening makes
the resonance weak. Another phenomenon has guided me in this case. If I sing the scale from c upwards, uttering the vowel
U for each note, and taking care to keep the quality of the vowel correct, and not allowing it to pass into O, I feel the
agitation of the air in the mouth, and even on the drums of both ears, where it excites a tickling sensation, most powerfully
when the voice reaches f. As soon as f is passed the quality changes, the strong agitation of the air in the mouth and the
tickling in the ear cease. […] The resonance of the mouth for U is thus fixed at f with more certainty than by means of tuning
forks. But we often meet with a U of higher resonance, more resembling O, which I will represent by the French Ou. Its proper
tone may rise as high as f’.” (von Helmholtz, 1885/1954, p. 110; c = 131 Hz, f = 175 Hz, f’ = 349 Hz)
“Above f’, the characterization of U becomes imperfect even if it is closely assimilated to O. But so long as it remains the
only vowel of indeterminate sound, and the remainder allow of sensible reinforcement of their upper partials in certain regions,
this negative character will distinguish U. On the other hand a soprano voice in the neighbourhood of f’’ should not be able
to clearly distinguish U, O, A; and this agrees with my own experience.” (von Helmholtz, 1885/1954, p. 114; f’’ = 699 Hz)
“It is reasonable to assume […] that it is impossible to produce recognizable vowels at musical pitches very much higher than
their first formants. […]
The following table is offered as a practical guide: Vowels start seriously losing intelligibility when the fundamental reaches
these frequencies:
(i u y) 350 cps (roughly middle F)
(e o ø) 450 cps (roughly middle A)
(ɛ ɔ œ) 600 cps (roughly high D)
(æ ɑ a) 750 cps (roughly high G)”
(Howie & Delattre, 1962)
“[…] only very few correct identifications of isolated vowels can be expected when fundamental frequency reaches or exceeds
the usual first formant of a vowel.” (Hollien, Mendes-Schwartz, & Nielsen, 2000)
“[…] vowel identifiability is inevitably compromised once f0 exceeds R1 […]” (Joliveau, Smith, & Wolfe, 2004)
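The practical guide of Howie and Delattre (1962) quoted above can be read as a simple lookup. The sketch below encodes only the four threshold values from their table; the data structure, the function name and the reading of "reaches" as "greater than or equal to" are assumptions made for illustration.

```python
# Fundamental frequency (Hz) at which each vowel group is said to start
# seriously losing intelligibility, after Howie & Delattre (1962).
LOSS_ONSET_HZ = {
    ("i", "u", "y"): 350,
    ("e", "o", "ø"): 450,
    ("ɛ", "ɔ", "œ"): 600,  # third vowel: open-mid front rounded vowel
    ("æ", "ɑ", "a"): 750,
}


def vowels_at_risk(f0_hz):
    """List the vowels whose threshold is reached or exceeded at f0_hz."""
    at_risk = []
    for vowels, onset_hz in LOSS_ONSET_HZ.items():
        if f0_hz >= onset_hz:  # assumption: "reaches" means >=
            at_risk.extend(vowels)
    return at_risk


if __name__ == "__main__":
    for f0 in (300, 440, 784):
        print(f0, "Hz:", vowels_at_risk(f0))
```

According to this reading of the table, a fundamental of 440 Hz affects only the close vowels, while at 784 Hz (G5) all four groups are affected.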
“We have seen that female singers gain considerably in sound level by abandoning the formant frequencies typical of normal
speech when they sing at high pitches. At the same time, F1 and F2 are decisive to vowel quality. This leads to the question
of how it is possible to understand the lyrics of a song when it is performed with the ‘wrong’ F1 and F2 values. Both vowel
intelligibility and syllable/text intelligibility can be expected to be disturbed. This aspect of singing has been studied
in several investigations.
As a thought-provoking reminder of the difficulties in arranging well-controlled experimental conditions in the past, an
experiment carried out by the German phonetician Carl Stumpf (1926) may be mentioned. He used three singer subjects: a professional
opera singer and two amateur singers. Each singer sang various vowels at different pitches, with their backs turned away from
a group of listeners who tried to identify the vowels. The vowels that were sung by the professional singer were easier to
identify. Also, overall, the percentages of correct identifications dropped as low as 50% for several vowels sung at the pitch
of G5 (784 Hz).
Since then, many investigations have been devoted to intelligibility of sung vowels and syllables (see, e.g. Benolken & Swanson,
1990; Gregg & Scherer, 2006; Morozov, 1965). Figure 12 gives an overview of the results in terms of the highest percentage
of correct identifications observed in various investigations for the indicated vowels at the indicated pitches. The graph
shows that vowel intelligibility is reasonably accurate up to about C5 and then quickly drops with pitch to about 15% correct
identification at the pitch of F5. The only vowel that has been observed to be correctly identified more frequently above
this pitch is /a/. Apart from pitch and register, larynx position also seems to affect vowel intelligibility (Gottfried and
Chew, 1986; Scotto di Carlo and Germain, 1985).
Smith and Scott (1980) strikingly demonstrated the significance of consonants preceding and following a vowel. This is illustrated
in the same graph. Above the pitch of F5, syllable intelligibility is clearly better than vowel intelligibility. Thus, vowels
are easier to identify when the acoustic signal contains some transitions (Andreas, 2006). Incidentally, this seems to be
a perceptual universal: changing stimuli are easier to process than are quasi-stationary stimuli.
The difficulties in identifying vowels and syllables sung at high pitches would result both from singers’ deviations from
the formant frequency patterns of normal speech and from the fact that high-pitched vowels contain few partials that are widely
distributed over the frequency scale, producing a lack of spectral information.
In addition, a third effect may contribute. Depending on phonation type, the F0 varies in amplitude. At a high pitch, F1
may lie between the first and the second partial. Sundberg and Gauffin (1982) presented synthesized, sustained vowel sounds
in the soprano range and asked subjects to identify the vowel. The results showed that an increased amplitude of the F0 was
generally interpreted as a drop in F1.” (Sundberg, 2013, pp. 86–88)
As discussed in Sections 4.1 and 4.2, prevailing theory gives reason to assume a general but discontinuous relationship between the intelligibility of vowel sounds and their fundamental frequency. On the one hand, vowel sounds at lower fundamental frequencies would, as a rule, be more intelligible than vowel sounds at higher fundamental frequencies. On the other hand, vowel intelligibility would also depend on the respective relationships between fundamental frequency, harmonic spectrum and the vowel-specific formant pattern (as given in formant statistics).
Concerning the former, consider the following model cases:
Concerning the latter, consider the following model cases:
If a basic distinction is made between the resonances of the vocal tract and the formants of the vowel sound produced, then, strictly speaking, only resonances can be undersampled, in the sense that the harmonics lie far apart in frequency and none of them matches an existing resonance frequency. Formants, in turn, are always the result of a method of measurement.
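The notion of a resonance that no harmonic matches can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions: the two resonance frequencies (500 Hz and 1500 Hz) are hypothetical values, not measurements, and the function names are invented for the example.

```python
def nearest_harmonic(f0_hz, resonance_hz):
    """Return the harmonic of f0_hz that lies closest to resonance_hz."""
    n = max(1, round(resonance_hz / f0_hz))  # harmonic numbers start at 1
    return n * f0_hz


def harmonic_offset(f0_hz, resonance_hz):
    """Absolute distance in Hz between a resonance and its nearest harmonic."""
    return abs(nearest_harmonic(f0_hz, resonance_hz) - resonance_hz)


if __name__ == "__main__":
    RESONANCES_HZ = (500, 1500)  # hypothetical vocal tract resonances
    for f0 in (100, 250, 700):
        offsets = [harmonic_offset(f0, r) for r in RESONANCES_HZ]
        print(f0, "Hz:", offsets)
```

With these assumed values, a fundamental of 100 Hz places a harmonic exactly on both resonances, whereas at 700 Hz the nearest harmonics lie 200 Hz and 100 Hz away. When the fundamental exceeds a resonance frequency, the first harmonic is the nearest one, and the distance grows with every further rise in fundamental frequency.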