The Vowel
Preliminaries
“Vowel […]. 1. (also vocoid) In phonetics, a segment whose articulation involves no significant obstruction of the airstream, such as [a], [ i ] or [u]. Strictly speaking, a glide such as [ j ] or [w] may also be regarded as a (brief) vowel in this sense. 2. In phonology, a segment which forms the nucleus of a syllable. 3. Any letter of the alphabet which, generally or in a particular case, represents a vowel in sense 2.” (Trask, 1996, p. 382)
“Vocoid […]. 1. A synonym for vowel in the phonetic sense of that term (sense 1), introduced in an effort to remove the ambiguity
               between the phonetic and the phonological sense of ‘vowel’. While possibly useful, the term has never become established.
               Pike (1943). 2. More narrowly, a vocoid in sense 1 which is also syllabic: a true vowel, as opposed to a glide or approximant.
               Sense 2: Laver (1994).” (Trask, 1996, p. 378) 
 
            
“Vowels and Consonants. Phonetics has traditionally classified the segments of speech into two basic varieties which are called
               vowels and consonants. Once again, there has never been a straightforward definition of these terms. Early linguists in India
               also grappled with the concepts of vowel, consonant, and syllable around 800 BC, and they recognized that the three notions
               are hopelessly intertwined […]. The definitions used here will be similar to those of the ancient Sanskrit scholars, and in
               fact, the development of modern phonetics in the West owes much to the transmission of knowledge in translation from the Sanskrit
               sources. 
 A vowel is defined as a ‘vowel-like segment’ (what Pike […] termed a vocoid) that occupies the nucleus of a syllable. A segment
               is considered to be a vocoid when its articulation permits the relatively free passage of air through the center of the mouth.
               This definition is also rather loose, but in roughly familiar terms, most segments that are at least as open as an English
               w or y-sound (the latter is transcribed [ j ] in IPA) are vocoids, all others being non-vocoids. A consonant is then defined
               simply as a non-vocoid, no matter what syllable position it occupies. This imperfect dichotomy leaves room for a middle category,
               that of the semivowel, which is defined as a vocoid located outside the nucleus of a syllable. Semivowels, in spite of being
               vocoids, are usually regarded as a special sort of consonant (often called a ‘glide’) in the interests of preserving the consonant-vowel
               dichotomy. The interplay […] slightly different (more acoustic) view by Orlikoff and Kahane: ‘Consonants differ from vowels primarily
               by the amount of vocal tract constriction employed in their production […] Speech can be considered to be an overlay of consonants
               on the vocal signal. The dispersion of consonants results in an amplitude modulation of the acoustic energy that, for the
               most part, gives rise to our perception of syllables.’” (Fulop, 2011, pp. 8–9) 
 
            
“The speech wave is the response of the vocal tract filter systems to one or more sound sources. This simple rule, expressed
               in the terminology of acoustic and electrical engineering, implies that the speech wave may be uniquely specified in terms
               of source and filter characteristics. In spite of the technical phrasing it is apparent that this statement also covers essentials
               of the phonetician’s concept of speech production.” (Fant, 1960, p. 15) See also Chapter M4. 
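The source-filter statement can be illustrated with a minimal sketch in which each harmonic of the radiated spectrum is the product of a source amplitude and a filter gain. Everything numeric here is an assumption made for illustration and is not taken from Fant: the source is given a fixed spectral slope of -12 dB per octave, the filter is a single resonance at 500 Hz with a bandwidth of 80 Hz, and the function names are hypothetical.

```python
import math


def source_amplitude(f, f0, slope_db_per_octave=-12.0):
    """Assumed voice source: amplitude falls by a fixed number of dB per octave above f0."""
    octaves_above_f0 = math.log2(f / f0)
    return 10 ** (slope_db_per_octave * octaves_above_f0 / 20)


def filter_gain(f, formant_hz=500.0, bandwidth_hz=80.0):
    """Magnitude of a single resonance (conjugate pole pair), equal to 1 at 0 Hz.

    The gain depends only on the resonance parameters, never on f0.
    """
    half_bw = bandwidth_hz / 2
    numerator = formant_hz ** 2 + half_bw ** 2
    denominator = math.sqrt(
        ((f - formant_hz) ** 2 + half_bw ** 2)
        * ((f + formant_hz) ** 2 + half_bw ** 2)
    )
    return numerator / denominator


def radiated_harmonics(f0, max_hz=3000.0):
    """Return (frequency, amplitude) pairs, with amplitude = source * filter."""
    harmonics = []
    n = 1
    while n * f0 <= max_hz:
        f = n * f0
        harmonics.append((f, source_amplitude(f, f0) * filter_gain(f)))
        n += 1
    return harmonics
```

Calling `radiated_harmonics(100)` and `radiated_harmonics(200)` samples the same filter curve at different harmonic frequencies: only the source term changes with the fundamental, which is the sense in which the speech wave is specified by source and filter characteristics separately.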
 
            
“The spectral peaks of the sound spectrum | P( f ) | are called formants. Referring to Fig. 1.1-2, it may be seen that one
               such resonance has its counterpart in a frequency region of relatively effective transmission through the vocal tract. This
               selective property of | T( f ) | is independent of the source. The frequency location of a maximum in | T( f ) |, i.e. the
               resonance frequency, is very close to the corresponding maximum in spectrum P( f ) of the complete sound. Conceptually these
               should be held apart, but in most instances resonance frequency and formant frequency may be used synonymously. Thus, for
               technical applications dealing with voiced sounds it is profitable to define formant frequency as a property of T( f ).
 The basic principle of the theory of voiced sounds is that, to a first order of approximation, the filter function is independent
               of the source. The formant peak will thus only accidentally coincide with the frequency of a harmonic. The formant frequencies
               can change only as a result of an articulatory change affecting the dimensions of the various parts of the vocal tract cavity
               system and thus the filter function. Conversely, but with the limitations implied by the concept of compensatory forms of
               articulation, the formant frequencies provide information about the position of the speaker’s articulatory organs. If these
               formant frequencies are held constant and the fundamental frequency is raised one octave, the result is ideally that twice
               as many pulses per second are emitted from the voice organs. The distance between adjacent harmonics in the spectrum will
               be doubled, and the number of harmonics up to a certain fixed frequency limit will thus be halved. If a specific formant,
               for instance the first, comes close to the 6th harmonic at the lower pitch, it will be the 3rd harmonic that comes closest
               to the same formant in the case of the higher pitch. The concepts of formant frequency and harmonic number should not be confused.”
               (Fant, 1960, p. 20)
            
See also Chapters M4 and M6.
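Fant's octave example can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, a first formant at 600 Hz and fundamental frequencies of 100 Hz and 200 Hz; these values and the function names are hypothetical and do not come from the source.

```python
def nearest_harmonic(formant_hz, f0_hz):
    """Number of the harmonic of f0_hz that lies closest to formant_hz (at least 1).

    Exact ties between two harmonics are resolved by Python's round().
    """
    return max(1, round(formant_hz / f0_hz))


def harmonics_up_to(limit_hz, f0_hz):
    """Count the harmonics of f0_hz at or below limit_hz."""
    return int(limit_hz // f0_hz)


# Assumed illustrative values: F1 = 600 Hz, pitch raised one octave from 100 to 200 Hz.
low = nearest_harmonic(600, 100)   # 6: the 6th harmonic sits on the formant
high = nearest_harmonic(600, 200)  # 3: after the octave shift it is the 3rd
```

The formant frequency stays at 600 Hz in both cases; only the harmonic number that happens to fall nearest to it changes. Likewise, `harmonics_up_to(3000, 100)` gives 30 and `harmonics_up_to(3000, 200)` gives 15, the halving described in the quotation.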
“Usually vowels can be quite well characterized in terms of the frequencies of just the first and second formants, but the
               third formant should also be measured for high front vowels and for r-colored vowels.” (Ladefoged, 2003, p. 105) 
 
            
“The length of the pharyngeal-oral tract depends on the physical size of the speaker. The length affects the frequency locations
               of all of the vowel formants; this fact helps us to predict where the formant peaks in the spectrum will appear for men, women,
               and children. A very simple rule relates the frequencies of the formants to the overall length of the tract from glottis through
               lips. The rule for this relation is: 
 Length Rule. The average frequencies of the vowel formants are inversely proportional to the length of the pharyngeal-oral
               tract. In other words, the longer the tract, the lower are its average formant frequencies.
 The neutral vowel formants for the average man, with an oral tract 17.5 cm in length, are at 500, 1500, 2500 Hz, and so on,
               with the lowest formant at 500 Hz and frequency spacing of 1000 Hz between all formants.
 An easy way to remember the neutral formant frequencies is to think of the odd numbers 1, 3, 5, 7, 9, and so on, because
               the formant frequencies of a uniform tube that is closed at one end and open at the other, like the pharyngeal-oral tract,
               are always odd multiples of the frequency of the lowest formant. For example, begin with the basic formant frequency, 500
               Hz, as the unit or 1; then the formant frequencies above that are 500 × 3 = 1500 Hz, 500 × 5 = 2500 Hz, and so on. This method,
               calculating the formants above F1 as multiples of F1, applies only as a model of a neutral tract shape.
 The pharyngeal-oral tract length of an infant is approximately half the length of that of a man. Therefore, following our
               Length Rule about formant frequency locations, the formants of a neutral-shaped infant tract in relation to a man’s would
               be at frequency locations that are a factor of the reciprocal of ½, or twice those of the man. On this basis the infant formant
               locations for a neutral vowel would be as follows: F1 is 500 × 2 = 1000 Hz, F2 is 1500 × 2 = 3000 Hz, F3 is 2500 × 2 = 5000
               Hz, and so on.
 Following the same procedure, a woman’s vocal tract, on the average, is about 15% shorter than that of a man. The ratio corresponding
               to this amount of shortening is approximately 5/6. The reciprocal of 5/6 is 6/5, which is equal to a factor of 1.20, which,
               when multiplied by the man’s neutral formant frequencies, gives the woman’s values of 20% higher: F1 is 500 × 1.2 = 600 Hz,
               F2 is 1500 × 1.2 = 1800 Hz, F3 is 2500 × 1.2 = 3000 Hz, and so on. […]
 The Length Rule tells us approximately where we may find the formants for the very young as well as for older, larger persons.
               However, the neutral locations of F1 and F2 for an individual are also affected by the length proportions of the vocal tract
               between the oral and pharyngeal cavities (Fant, 1973, Chapter 4). In general, the location and spacing of formants F3 and
               above are more closely correlated with length of vocal tract than for F1 and F2. The average locations of F1 and F2 for an
               individual are also affected somewhat by language environment and training.” (Pickett, 1999, pp. 38–40)
            
See also Chapter M5.
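The Length Rule and the odd-multiple pattern both follow from treating the neutral tract as a uniform tube closed at one end and open at the other, whose resonances lie at odd multiples of c / 4L. The sketch below assumes a speed of sound of 35,000 cm/s, a value not stated in the quotation but one that reproduces Pickett's figure of 500 Hz for a 17.5 cm tract; the function names are hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value, not given in the source


def neutral_formants(tract_length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * tract_length_cm)
        for n in range(1, count + 1)
    ]


def scale_by_length_ratio(formants_hz, length_ratio):
    """Length Rule: formant frequencies are inversely proportional to tract length."""
    return [f / length_ratio for f in formants_hz]


man = neutral_formants(17.5)               # [500.0, 1500.0, 2500.0]
infant = scale_by_length_ratio(man, 1 / 2)  # half the length, so twice the frequencies
woman = scale_by_length_ratio(man, 5 / 6)   # Pickett's rounded ratio, about 20% higher
```

Note that the ratio 5/6 is the rounded value used in the quotation. Taking "15% shorter" literally gives a ratio of 0.85 and a neutral F1 of roughly 588 Hz rather than 600 Hz, so the 20% figure should be read as an approximation.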