Sine-wave speech recognition in Mandarin

FEB 15, 2012
Some languages are tonal, and that can make interpretation difficult for both native and non-native speakers. Maybe physics can help?

DOI: 10.1063/PT.5.010162

Last October my friend Karl invited my wife and me to celebrate China’s mid-autumn festival with some of his Chinese friends. The venue was a Chinese restaurant in Arlington, Virginia. After we’d eaten eight delicious courses and drunk (or tentatively sampled) sorghum vodka from Taiwan, Karl challenged his Chinese friends to recite Chao Yuen Ren’s poem “The Lion-Eating Poet in the Stone Den”:

《施氏食獅史》

石室詩士施氏,嗜獅,誓食十獅。

氏時時適市視獅。

十時,適十獅適市。

是時,適施氏適市。

氏視是十獅,恃矢勢,使是十獅逝世。

氏拾是十獅屍,適石室。

石室濕,氏使侍拭石室。

石室拭,氏始試食是十獅。

食時,始識是十獅屍,實十石獅屍。

試釋是事。

Even if you can’t read Chinese, Chao’s poem looks as though it might be straightforward for native speakers to hear and understand. But Chao, who was a linguist, wrote the poem to demonstrate the futility of transliterating classical Chinese into the Roman alphabet. Every character in the poem is transliterated as “shi.”

Granted, Mandarin uses four tones that help distinguish otherwise identical-sounding words: high level, rising, falling and then rising, and high falling. But adding the corresponding diacritical marks does little to ensure comprehensibility.

As we discovered around the dining table, the poem is also incomprehensible, hilariously so, when recited in modern Mandarin. Of course, most sentences uttered by Mandarin speakers do not consist of the same syllable repeated over and over, but they do feature different tones. Listening to the recitals made me wonder: How important are tones to the comprehensibility of Mandarin, Cantonese, and other Chinese dialects?

Formants and fundamentals

Five months later, I came across an answer to my question in a paper in JASA Express Letters by Yin Shan-Kai of the Institute of Otolaryngology at Shanghai Jiao Tong University and his colleagues.

The starting point for Yin’s work is the idea, originated by Gunnar Fant in 1960, that speech can be passably reproduced by modulating the amplitudes of a small number of sine waves of certain fixed frequencies. Those frequencies do not include the overall pitch of a person’s voice, what you might call its fundamental frequency F0. Rather, the frequencies correspond to the strongest peaks that are present in the frequency spectrum when a person utters a given vowel or consonant.

Fant called those characteristic frequencies formants. Only two formants, f1 and f2, are needed to reproduce vowels. For example, the “oo” in “boot” can be represented with one sine wave with a frequency f1 of 320 Hz and a second, weaker sine wave with a frequency f2 of 800 Hz.
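As a rough illustration of the two-formant idea, the sketch below sums two sine waves at the frequencies quoted above for the “oo” in “boot.” The sample rate, the duration, the relative amplitude of the second wave, and the function name `sine_wave_vowel` are all assumptions chosen for the example; they do not come from Fant’s or Yin’s work. Real sine-wave speech also varies the amplitudes over time, which this static sketch does not attempt.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for illustration)


def sine_wave_vowel(f1_hz, f2_hz, duration_s=0.5, a1=1.0, a2=0.5,
                    sample_rate=SAMPLE_RATE):
    """Return samples of a static two-formant tone.

    The signal is the sum of one sine wave at f1_hz and a second,
    weaker one at f2_hz. The amplitudes a1 and a2 are hypothetical;
    the sum is divided by (a1 + a2) so every sample lies in [-1, 1].
    No separate component for the voice pitch F0 is generated.
    """
    n_samples = int(duration_s * sample_rate)
    norm = a1 + a2
    samples = []
    for n in range(n_samples):
        t = n / sample_rate  # time of this sample in seconds
        value = (a1 * math.sin(2 * math.pi * f1_hz * t)
                 + a2 * math.sin(2 * math.pi * f2_hz * t))
        samples.append(value / norm)
    return samples


# The "oo" in "boot": f1 = 320 Hz, f2 = 800 Hz
oo_samples = sine_wave_vowel(320.0, 800.0)
```

The resulting list could be written to a WAV file with Python’s standard `wave` module if you want to hear how little it resembles a human voice.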

Mandarin’s four tones are conveyed by modulating vocal pitch, F0. Because sine-wave speech dispenses with F0, Yin and his colleagues hypothesized that Mandarin speakers would have a tough time understanding sine-wave Mandarin.

To test the hypothesis, Yin and his colleagues asked 41 native speakers of Mandarin to listen to two sets of sine-wave speech. The first set consisted of 10 unconnected monosyllables pronounced with each of the four tones. The second set consisted of 20 short sentences.

Listeners to the unconnected monosyllables could not reliably identify the correct tone. On average, they got the tone right only 33% of the time, which is little better than the 25% they’d score if they just guessed. Listeners did much better with the short sentences. Some listeners understood all the sentences completely. The worst comprehension rate was 78%; the mean was 92%.

Yin speculates that the Mandarin speakers in his study, being familiar with the syntax and semantics of their native language, exploited contextual clues in the sentences to compensate for the lack of tonal information. That speculation is consistent with the result of a linguistic experiment that I’ve conducted: Native speakers of English can converse with each other even when they replace every vowel with “uh.” The conversation may sound odd, but it’s comprehensible. Try it yourself!

For some people learning Chinese, tones constitute an awkward, additional complication. At first glance, Yin and his colleagues’ work might therefore bring some relief. Tones, it seems, aren’t essential to comprehension, at least for sine-wave Mandarin. But the relief is illusory. Reaching the point where you can dispense with tones doubtless requires mastering the whole language, tones and all.
