
Intelligibility of high-pitched vowel sounds in the singing and speaking
of a female Cantonese Opera singer


The acoustics of stage voices show special qualities with regard to efficiency, intelligibility, dynamic range and projection. However, with rare exceptions, the existing studies published in the literature concern "Western" speaking and singing styles. Thus, the question arises whether existing descriptions of the acoustic characteristics of trained stage voices have to be understood solely as style-specific descriptions, in which case a comparison of utterances of very different styles will reveal substantial differences, or whether there are characteristics that can be generalized for trained stage voices as such, in which case such a comparison will provide evidence for substantial similarities.

The present study addresses this question with regard to the intelligibility of speech in high-pitched vocal productions of female opera singers, comparing the styles of "classical" Opera and Cantonese Opera.


Two different styles: "Classical" singing (so-called "legit" style) is characterized (1) by a pronounced difference between speaking and singing, and (2) by a particular coloring and harmonization of the vowels (the phenomenon of “vowel modification”) together with a specific voice quality and timbre that allow the voice to be heard over a large orchestra. In this vocal strategy, sound projection and timbre are often favored over text or vowel intelligibility. In contrast, singing in the style of Cantonese Opera (1) does not strictly separate speaking and singing, and (2) does not place sound projection and timbre above text intelligibility.

Vowel intelligibility in singing: According to Sundberg (2012), vowels can be perceived and discriminated only up to a fundamental frequency of c. 525 Hz (C5), a frequency limit which, roughly speaking, covers the first vowel-specific spectral characteristic (the first "formant") for a substantial part of the vowels. This explains why many authors in the literature consider singing at higher pitches unintelligible: "However, in this range of frequency (500 to 1000 Hertz), you could not tell apart different vowels anyway, because the harmonics of the voice are so far apart that they are not "sampling" the locations of the formants enough for you to tell where the formants lie. Therefore operatic writers only put words intended to be intelligible in the lower part of a soprano's range." [1]
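The "sampling" argument in the quotation can be illustrated with a short calculation. The following Python sketch is not part of the study. It lists the harmonics of a voiced sound at several fundamental frequencies and measures how far the nearest harmonic lies from a first formant. The function names and the first-formant values are assumptions chosen for illustration only and are not measurements from this project.

```python
def harmonics(f0_hz, max_hz=4000.0):
    """Return the harmonic frequencies k * f0 (k = 1, 2, ...) up to max_hz."""
    count = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def nearest_harmonic_offset(f0_hz, target_hz):
    """Return the distance in Hz from target_hz to the closest harmonic of f0_hz."""
    # The closest harmonic number is target / f0 rounded, but never below 1.
    k = max(1, round(target_hz / f0_hz))
    return abs(k * f0_hz - target_hz)


# Assumed, illustrative first-formant values in Hz (not data from the study).
ASSUMED_F1_HZ = {"i": 300.0, "a": 850.0}

if __name__ == "__main__":
    for f0 in (220.0, 525.0, 880.0):
        n = len(harmonics(f0))
        print(f"f0 = {f0:.0f} Hz: {n} harmonics below 4000 Hz")
        for vowel, f1 in ASSUMED_F1_HZ.items():
            offset = nearest_harmonic_offset(f0, f1)
            print(f"  nearest harmonic to assumed F1 of /{vowel}/ is {offset:.0f} Hz away")
```

As the fundamental frequency rises, fewer harmonics fall within the range where formants lie, and a harmonic may land far from a given formant. That is the reasoning behind the claimed loss of vowel intelligibility at high pitches.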

The question: As mentioned, although such statements are made in general terms, they are based on investigations of style-specific "Western" singing, above all "classical" singing. But what about other singing styles?

Project approach I: Given the characteristics of Cantonese Opera style mentioned above, we investigate text intelligibility in the singing and speaking of the famous actress MUI, SHET SZE. We extracted single syllables and isolated vowel sounds produced at high pitches and presented them to listeners (native speakers of Cantonese) for vowel identification.

Project approach II: We plan to extend the forms of the perceptual tests.

Project approach III: We plan to record actresses and to extend the investigation with regard to other aspects of Cantonese Opera style.

First results: A paper presenting and discussing first results (see project approach I) has been submitted to Interspeech 2014. The study provides evidence confirming earlier indications in the literature that text intelligibility in singing and speaking can be maintained up to fundamental frequencies of 800–900 Hz.

[1] Moore, G.D. (2006): The Physics and Psychophysics of Music. Course page for Physics 224, Lecture 28. Retrieved April 30, 2014, from

Project Duration

Part 1: 01/02/2014 to 10/04/2014
Part 2: 01/05/2014 to 30/09/2014

Team and Affiliations

Dieter Maurer (1), Peggy Mok (2), Daniel Friedrichs (3), Volker Dellwo (3)

(1) Zurich University of the Arts, Institute for the Performing Arts and Film, Switzerland
(2) The Chinese University of Hong Kong, Dept. of Linguistics and Modern Languages, Hong Kong
(3) University of Zurich, Phonetics Laboratory, Switzerland


We would like to thank Christian d'Heureuse for adapting the M.A.T. software for this presentation, Casper Hui for labeling and screening the sound files, and Heidy Suter for editorial assistance.