the Vowel
Preliminaries
“If you know you are analyzing a low back vowel, don’t be surprised to find one thick bar on the spectrogram that really corresponds
to two formants close together below 1,000 Hz.” (Ladefoged, 2003, p. 114)
Referring to vocalisations of /ɔ/ as in caught: “When the formants are close together […] neither the wide- nor the narrowband
spectrum gives a good indication of the formant frequencies. […] The first two formants appear as a single peak below 1,000
Hz. Their frequencies cannot be determined from these spectra.” (Ladefoged, 2003, pp. 119–120)
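The merging of two close formants into one peak can be illustrated with a toy model. The sketch below is not a spectrographic analysis: it assumes a spectral envelope made of two equal-height Gaussian bumps, and the formant values (600 and 850 Hz) and the widths are hypothetical. It only shows that two peaks stop being separable once their spacing is small relative to their width.

```python
import math

def envelope(freq_hz, centers_hz, width_hz):
    """Toy spectral envelope: a sum of equal-height Gaussian bumps (an assumption)."""
    return sum(math.exp(-0.5 * ((freq_hz - c) / width_hz) ** 2) for c in centers_hz)

def count_peaks(centers_hz, width_hz, f_max_hz=2000, step_hz=1):
    """Count strict local maxima of the toy envelope on a regular frequency grid."""
    values = [envelope(f, centers_hz, width_hz)
              for f in range(0, f_max_hz + 1, step_hz)]
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )

formants_hz = (600, 850)  # hypothetical F1 and F2 of a low back vowel

narrow = count_peaks(formants_hz, width_hz=50)   # spacing large relative to width
broad = count_peaks(formants_hz, width_hz=150)   # spacing small relative to width

print("peaks with narrow bumps:", narrow)
print("peaks with broad bumps:", broad)
```

In this toy setting the two bumps remain distinct at the smaller width and fuse into a single maximum at the larger one, and the location of that single maximum no longer coincides with either assumed formant.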
“Sometimes it is not immediately obvious whether a particularly wide band represents one formant or two. Figure 5.8 is a spectrogram
of the word bud, spoken by a female speaker of Californian English. There is a wide band below 1,000 Hz, but is this one formant
or two formants close together as in Figure 5.7? Noting that there is a clear formant at about 1,500 Hz in Figure 5.8, and
additional formants higher, we must take it that there is only a single formant below 1,000 Hz. It seems that there is some
kind of extra formant near the first formant, making this dark bar wider. From the evidence of this one vowel it is impossible
to say whether the additional energy is above or below the first formant. Further analysis of this speaker’s voice showed
that there was often energy around the 1,000 Hz region, irrespective of the vowel. This spurious formant is not connected
with the vowel quality, but is simply a characteristic of the particular speaker’s voice. This is a good example of the necessity
of looking at a representative sample of a speaker’s voice before making any measurements of the formants.” (Ladefoged, 2003,
pp. 114–115)
“Flat-spectrum stimuli, consisting of many equal‐amplitude harmonics, produce timbre sensations that can depend strongly on the phase angles of the individual harmonics. For fundamental frequencies in the human pitch range, many realizable timbres have vowel-like perceptual qualities. This observation suggests the possibility of constructing intelligible voiced speech signals that have flat-amplitude spectra.” (Schroeder & Strube, 1986)
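A minimal sketch of such flat-spectrum stimuli follows. It builds one period of two signals from the same equal-amplitude harmonics, differing only in their phase angles, and checks that their harmonic amplitudes are identical while their waveforms are not. The number of harmonics, the sampling grid, and the quadratic phase rule are assumptions chosen for illustration and are not taken from the quoted paper. The sketch says nothing about perception; it only compares the waveforms through their crest factor (peak divided by RMS).

```python
import cmath
import math

N_HARMONICS = 16          # hypothetical number of equal-amplitude harmonics
SAMPLES_PER_PERIOD = 512  # samples in one fundamental period

def one_period(phases):
    """One period of a sum of unit-amplitude harmonics with the given phases."""
    return [
        sum(math.cos(2 * math.pi * k * n / SAMPLES_PER_PERIOD + phases[k - 1])
            for k in range(1, len(phases) + 1))
        for n in range(SAMPLES_PER_PERIOD)
    ]

def harmonic_amplitudes(signal, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics from a direct DFT over one period."""
    size = len(signal)
    return [
        2 * abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                    for n, x in enumerate(signal))) / size
        for k in range(1, n_harmonics + 1)
    ]

def crest_factor(signal):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return max(abs(x) for x in signal) / rms

# All harmonics in cosine phase: they align once per period (pulse-like waveform).
zero_phases = [0.0] * N_HARMONICS
# Quadratic phase rule (an assumption for this sketch): spreads the energy in time.
quadratic_phases = [-math.pi * k * (k - 1) / N_HARMONICS
                    for k in range(1, N_HARMONICS + 1)]

pulse_like = one_period(zero_phases)
spread_out = one_period(quadratic_phases)

amps_pulse = harmonic_amplitudes(pulse_like, N_HARMONICS)
amps_spread = harmonic_amplitudes(spread_out, N_HARMONICS)

print("crest factor, zero phases:     ", crest_factor(pulse_like))
print("crest factor, quadratic phases:", crest_factor(spread_out))
```

Both signals have the same flat amplitude spectrum by construction, so any difference between them, in waveform shape here and in timbre according to the quotation, is carried by the phase angles alone.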