When we compare the sound systems of the various Austronesian languages from Taiwan to Rapa Nui, several distinct patterns emerge. Usually, we would find anywhere from 15 to 20 consonants, and perhaps 4 to 6 vowels which may or may not be distinguished by length. Some languages sit at the extremes, with Polynesian languages like te reo Māori and ʻŌlelo Hawaiʻi having fewer sounds, and languages like Nemi in New Caledonia having more consonant sounds. However, the most common pattern you would observe across the thousand or so languages in the Austronesian family is that they lack a system of phonemic tone.
Yet, there are some exceptions to the norm, which have developed their own systems of phonemic tone. Some of these feature contours, while others generally have tone registers like high, middle, and low. Nevertheless, studies such as Bradshaw’s (1979) work on Jabêm in Papua New Guinea suggest that these tones are innovations rather than retentions from some ancestral language. Some patterns of tonogenesis, that is, the process by which a language evolves phonemic tone, involve some sort of derivation from voicing contrasts. High tones, for example, may occur in syllables ending in voiceless stops like /k/, /t/, and /p/, while low tones may occur in syllables ending in voiced stops.
Pulling up a map of the Pacific, we can roughly pinpoint where the few tonal Austronesian languages are spoken. These are mainly in the regions of Southeast Asia, New Guinea, and New Caledonia. Together, these tonal Austronesian languages constitute just around a dozen languages out of the 1000 or so Austronesian languages in total, making phonemic tone an extremely rare occurrence in this language family.
Today, we will briefly talk about the general regions where the tonal Austronesian languages are spoken, and perhaps we will go into greater detail about them at some point in the future.

Southeast Asia
Southeast Asia, particularly the Malay Peninsula, is home to several dozen Austronesian languages, mainly of the Malayo-Chamic branch. This is where you would find languages such as Malay and Orang Kanaq. However, this is not the part of Southeast Asia that we are interested in today. We are a bit south of our area of interest.
The Chamic languages form a branch of Austronesian languages separate from the Malayic languages, and are predominantly spoken in Vietnam, Cambodia, Thailand, Aceh (in Sumatra), and Hainan (in China). Although the common ancestor of these Chamic languages, like the Austronesian languages in general, lacked phonemic tones, it has been proposed that tonal systems emerged in some of these languages through the influence of tonal languages in the region. Interactions may include those with the various Chinese varieties, the Austroasiatic languages, and the Kra-Dai languages, all spoken in Southeast Asia.
According to the linguist Graham Thurgood, the common ancestor of the Chamic languages, proto-Chamic, had four basic vowels, three possible final diphthongs, and disyllabic morphemes (i.e. CVCV(C)). He argued that under Austroasiatic influence, the syllable structure changed to have a stressed final syllable. This influence is also reflected in the influx of loanwords of Austroasiatic origin into the Chamic languages.
As the Chamic languages diverged, however, some did not go on to develop full tonal systems. For example, the Roglai, Rade, and Jarai languages did not develop phonemic tone, and still do not have it. In those that did, this was argued to occur in several stages. First came the move towards words generally being one syllable long, through the erosion of the unstressed syllable. Next came the erosion of consonant clusters, and then the development of register systems, where vowels following one set of initial consonants would have a different phonation type, such as breathy voice, from vowels following another set. This is seen in Western Cham, spoken in Cambodia and parts of southwestern Vietnam. These phonation differences set the stage for tones to develop.
However, these different vowel registers can also have a different outcome. Let us look at the Haroi language, spoken in Bình Định and Phú Yên provinces in Vietnam, with close historical contact with the Bahnar language in the region. This language distinguishes a lot more vowels than many Austronesian languages, with 11 simple vowels (plus their long counterparts) and at least 7 diphthongs, compared to 4 reconstructed simple vowels in proto-Chamic. It turns out that the Haroi language underwent a form of restructuring of its vowel register system, with the product being the distinction of many more vowels. However, this restructuring pattern differed from the one that Bahnar had undergone at an earlier period in its own history.
Our next example takes us to the Phan Rang Cham language, or Eastern Cham, spoken in southeast Vietnam. Following contact with languages using register systems, Phan Rang Cham also came into contact with a major language in the region: Vietnamese, a fully tonal language. It is hypothesised that Vietnamese influence led to the development of tones in Phan Rang Cham, though Thurgood did not suggest a time frame. Phan Rang Cham has anywhere from 2 to 4 tones, depending on how an analysis distinguishes some of these tones. The two basic tones are the high and low tones, each of which also occurs in syllables ending with a glottal stop. It is unclear whether these glottal-final variants constitute separate tones, and linguists have disagreed on this issue. In words that still have two syllables, Phan Rang Cham generally has an atonal first syllable. It is argued that these tones emerged from the registers: the higher pitch, which became the high tone, was associated with the modal voice, while the lower pitch, which became the low tone, was associated with the breathy voice.
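The register-to-tone association described for Phan Rang Cham can be illustrated with a small Python sketch. This is only a toy model: the names `Register`, `tone_from_register`, and `count_tones`, as well as the labels used for the glottal-final variants, are assumptions made for illustration and are not part of Thurgood's analysis.

```python
from enum import Enum


class Register(Enum):
    """The two phonation types that the register system distinguishes."""
    MODAL = "modal"
    BREATHY = "breathy"


def tone_from_register(register: Register, final_glottal_stop: bool = False) -> str:
    """Toy mapping: modal voice -> high tone, breathy voice -> low tone."""
    base = "high" if register is Register.MODAL else "low"
    # Whether glottal-final syllables carry separate tones is disputed,
    # so they are only marked with a suffix here (a hypothetical label).
    return base + "+glottal" if final_glottal_stop else base


def count_tones(glottal_counts_as_tone: bool) -> int:
    """Count the distinct tone categories under one of the two analyses."""
    glottal_options = [False, True] if glottal_counts_as_tone else [False]
    tones = {
        tone_from_register(register, glottal)
        for register in Register
        for glottal in glottal_options
    }
    return len(tones)
```

Under these assumptions, `count_tones(False)` gives 2 and `count_tones(True)` gives 4, which mirrors how the tone count depends on whether the glottal-final variants are treated as tones in their own right.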
Another Chamic language of particular note is the Tsat language, a Northern Chamic language mainly spoken in Hainan, China. This language boasts a total of five tones: the high, mid, and low tone registers, and the falling and rising tone contours. Linguists have proposed reconstructions of the final consonant sounds that could have given rise to these tones, such as final /h/ and /s/ giving rise to the high tone, and vowels and nasals giving rise to the low or mid tones. One proposed hypothesis for tonogenesis is influence from other languages in the region that Tsat might have interacted with, which tended to be tonal in nature. These include the various varieties of Chinese, the Hmong-Mien languages, and the Kra-Dai languages spoken in Southeast Asia.
In southern Thailand, namely the provinces of Phang Nga, Ranong, and Phuket, we can find an Austronesian language called the Moklen language. There are claims that Moklen is a tonal language with two tones: a high tone, and a low tone which may also be low-rising on stressed vowels. The high tone is more modal, while the low tone tends to have breathy phonation, with a majority of words carrying the high tone. Distinguishing between these tones has posed a challenge for linguists working to document the language, and Moklen tones have been proposed to have developed from various phonetic contrasts like stress, which involves vowel quality and pitch as well.
From this section, it does appear that tone in these Austronesian languages developed predominantly under external influence from tonal languages spoken in the region, but this does not necessarily entail the adoption of the full tonal system exhibited by the tonal non-Austronesian languages. However, there are other factors to be considered here, such as the internal and external drivers that might have biased some of these languages towards developing tones. To what extent the regional Austroasiatic languages, Kra-Dai languages, and the Chinese varieties played a role in Chamic tonogenesis remains to be studied.
New Guinea
The most well-known cluster of tonal Austronesian languages in New Guinea belongs to the South Halmahera – West New Guinea (SHWNG) branch. Here, at least six tonal languages have been identified or proposed, although, as with many languages of the island, research on some of them is lacking. Kamholz, for example, has found that the Moor, Yerisiam, and Yaur languages spoken in southern Cenderawasih Bay have developed tonal systems of some form, and claimed that phonemic tone evolved independently in each of them. The Raja Ampat languages spoken west of New Guinea, Ma’ya, Matbat, and Ambel, have their own tonal systems too.
These tones are generally quite varied. Moor, for instance, exhibits a word tone system with four tones (high falling, low rising, low rising to mid, and high-low-high), while Yerisiam and Yaur exhibit two register tones, high and low, with two possible contour tones (high-low, low-high). The Raja Ampat languages also display a similar variation in their tonal systems, with Ma’ya having high, rising, and falling tones, Ambel having high and toneless ones, and Matbat having high, low, high-falling, low-rising, and low-falling tones.
Another language that might be of interest is the Kara language, a Western Oceanic language spoken in New Ireland, an island in the Bismarck Archipelago just northeast of New Guinea. Linguists have reported an emerging system of tonal contrasts in the language, with at least two different tones, high and low. The low tone also has a reported variant, a mid level tone, which is observed before final glottal stops. This work mainly involved acoustic analyses of minimal pairs, particularly of the fundamental frequency of speech, which is the acoustic correlate of pitch.
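To give a sense of what measuring fundamental frequency involves, here is a minimal Python sketch that estimates it from a signal by autocorrelation. It is not the procedure used in the Kara study; the function names `synth_tone` and `estimate_f0`, the sample rate, the search range, and the 200 Hz and 120 Hz test tones are all assumptions chosen for illustration, and the input is a synthetic sine wave rather than recorded speech.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.05):
    """Generate a pure sine wave standing in for a voiced stretch of speech."""
    n_samples = round(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]


def estimate_f0(samples, sample_rate=8000, f_min=75.0, f_max=400.0):
    """Estimate fundamental frequency (Hz) as the best autocorrelation lag."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical "high" and "low" syllables of a minimal pair.
high_f0 = estimate_f0(synth_tone(200.0))
low_f0 = estimate_f0(synth_tone(120.0))
```

In this sketch, a consistently higher estimate for one member of a minimal pair than for the other is the kind of evidence that would point towards a tonal contrast. Real recordings would need a more robust pitch tracker.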
The Yabem or Jabêm language is another tonal Austronesian language that belongs to the Western Oceanic branch. This language is spoken in the Morobe Province of Papua New Guinea, in the northeast of New Guinea, and according to Bradshaw’s 1979 publication, has two tones, high and low. He also remarked on how recently the system developed, with its pattern of tonogenesis being that the high tone arose from the voiceless stops /k/, /t/, and /p/, and the low tone arose from the voiced stops /g/, /d/, and /b/. A similar system contrasting high and low tones was documented in Bukawa, one of Yabem’s closest cousins, but its distribution of tones is quite different from Yabem’s, as it follows a less predictable pattern.
Contact with tonal non-Austronesian languages has been proposed as a potential driver behind tonogenesis in these languages, as tonal languages are fairly prevalent among the languages of New Guinea. However, evidence of such contact between the Austronesian languages of south Cenderawasih Bay and tonal non-Austronesian languages has not been found. Other hypotheses include some form of language shift or some form of historical contact event, but we will probably not know for certain to what extent these factors played a role in tonogenesis in these languages. The lack of language data and the difficulty of investigating the histories of these languages compound the problem of assessing the contact relationships between the tonal Austronesian languages of New Guinea and their tonal non-Austronesian counterparts.
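The voicing-to-tone pattern reported for Jabêm can be sketched as a simple lookup in Python. This is a toy illustration only: the function name `tone_from_stops` and the rule of taking the first stop in the syllable are assumptions, and the test syllables are made-up strings rather than attested Jabêm words.

```python
from typing import Optional

# Stop series taken from the description of Jabêm tonogenesis.
VOICELESS_STOPS = frozenset("ptk")
VOICED_STOPS = frozenset("bdg")


def tone_from_stops(syllable: str) -> Optional[str]:
    """Return the tone predicted by the first stop found in the syllable.

    Returns None when the syllable contains no stop, since the pattern
    described says nothing about such syllables.
    """
    for segment in syllable:
        if segment in VOICELESS_STOPS:
            return "high"
        if segment in VOICED_STOPS:
            return "low"
    return None
```

For example, `tone_from_stops("ka")` gives "high" and `tone_from_stops("ga")` gives "low", while a syllable such as "ma" is left unspecified.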
New Caledonia
New Caledonia is a group of islands that belongs to the region of Melanesia in the southwestern Pacific Ocean. It forms a part of Overseas France, and there are around 28 indigenous languages in New Caledonia alongside French and the French-based creole called Tayo. Amongst these 28 indigenous languages, all but one form a branch of the Southern Oceanic languages called the New Caledonian languages. The exception, West Uvean, also called Fagauvea, is a Polynesian language.
When we compare the tonal systems of the tonal New Caledonian languages, there is a fairly uniform pattern we can observe. Each of these languages has three contrastive tones: high, mid, and low. All of them are spoken on the main island of New Caledonia, where the capital, Nouméa, is located. These languages are Cem (Cèmuhî, Wagap), Pac (Paicî), Dub (Ndrumbea, Naa Dubea), Nme (Numèè, Naa Numee), and Ken (Kwényi, Kwenyii).
The late linguist André-Georges Haudricourt proposed that the rise of phonemic tones in the New Caledonian languages could be traced back to reduplication. This proposal rests on cognates identified between reduplicated words in Melanesian and Polynesian languages and high-toned words in some of the tonal languages here. Reduplication is a word-forming pattern where certain words, or parts of words, are repeated. Words like barang-barang (Indonesian/Malay, ‘things’, ‘stuff’) would be examples of full reduplication, while words like savavali as in latou te savavali (Samoan, ‘they walk’) would be examples of partial reduplication.
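The difference between full and partial reduplication can be shown with a short Python sketch. The function names and the idea of passing in a pre-split list of syllables are assumptions for illustration; the sketch also assumes that savavali is built on the base savali by repeating its second syllable.

```python
from typing import Sequence


def full_reduplication(word: str) -> str:
    """Repeat the whole word, joined with a hyphen as in Indonesian/Malay spelling."""
    return f"{word}-{word}"


def partial_reduplication(syllables: Sequence[str], index: int) -> str:
    """Repeat only the syllable at position `index`, in place."""
    parts = list(syllables)
    # Insert a copy of the chosen syllable directly before the original.
    repeated = parts[:index] + [parts[index]] + parts[index:]
    return "".join(repeated)
```

With these definitions, `full_reduplication("barang")` produces "barang-barang", and `partial_reduplication(["sa", "va", "li"], 1)` produces "savavali".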
This process would have involved the loss of an unstressed vowel in a reduplication element, thereby producing what is called a geminate consonant, something like a lengthened consonant sound. Two outcomes could emerge from this gemination: an aspirated consonant, or a syllable with a high tone. The development of the high tone would have occurred on two separate occasions, once in the New Caledonian languages of the Center North, and once in the New Caledonian languages of the Far South of the Mainland.
This leaves us with another question: how did the other tone emerge? Jean-Claude Rivierre suggested that these languages innovated a low tone either through a process of ‘downdrift’ or through some other mechanism. Among these mechanisms is an argument for some form of tone reversal in the languages of the Far South like Numèè. This was inferred by comparing words between the tonal languages in the Center North and those in the Far South, where it was found that high tones in the Center North corresponded to low tones in the Far South.
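The chain from reduplication to gemination to its two outcomes can be traced step by step in a toy Python sketch. Everything here is schematic: the function names, the "[H]" label for a high tone, and the root "pa" are invented for illustration and are not reconstructed or attested forms. The code also assumes that each consonant and vowel is written with a single character.

```python
def reduplicate_initial_cv(root: str) -> str:
    """Copy the first consonant-vowel pair onto the front of the root."""
    return root[:2] + root


def drop_reduplicant_vowel(form: str) -> str:
    """Delete the unstressed vowel of the reduplicant, leaving a geminate."""
    return form[0] + form[2:]


def geminate_outcomes(form: str) -> dict:
    """Return the two proposed reflexes of an initial geminate consonant."""
    if len(form) < 3 or form[0] != form[1]:
        raise ValueError("expected a form beginning with a geminate consonant")
    consonant, rest = form[0], form[2:]
    return {
        "aspirated": consonant + "ʰ" + rest,     # geminate becomes aspirated
        "high_tone": consonant + rest + " [H]",  # geminate leaves a high tone
    }


# Schematic derivation from a made-up root.
reduplicated = reduplicate_initial_cv("pa")       # "papa"
geminated = drop_reduplicant_vowel(reduplicated)  # "ppa"
reflexes = geminate_outcomes(geminated)
```

Here `reflexes` holds the two alternative outcomes for the same geminate, which corresponds to the split between languages that developed aspirated consonants and those that developed a high tone.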
Downdrift here would be a process that affects unmarked tone sequences in the languages that once used only the high and unmarked tone registers. In Paicî, for example, it would affect words or morphemes containing at least four morae (a mora being a unit of syllable weight, with a short syllable counting as one mora), or morphemes that are followed by some form of clitic. In Cèmuhî, on the other hand, this process is less clear. The general patterns to which the low tone corresponds include words with an intervocalic [a] sound, aspirated consonant, or fricative (like the ‘f’ in ‘afar’, for example), where the syllable before this consonant would carry a low tone. Such low tones may also be found in loanwords, such as those of Polynesian and French origin.
As we have seen, the development of tones in the tonal Austronesian languages generally follows some sort of phonological process that is already present in the respective languages. However, in the cases of the New Guinean and Southeast Asian tonal Austronesian languages, some linguistic influence from their tonal non-Austronesian counterparts in the region could also have helped shape tonogenesis, though the extent to which these languages played a role remains to be studied. In the languages of New Caledonia, however, tonogenesis appears to have developed independently, with no evidence proposed of influence from tonal languages of other language families in the region. This introduction also underscores how little data we have on the languages of New Guinea, where more linguistic data could help in understanding how the tonal Austronesian languages there got their tones, and perhaps build a better historical picture of these languages as well.
Further Reading
Arnold, L. (2018) ‘Lexical tone in Metnyo Ambel’, Oceanic Linguistics, 57(1), pp. 199-220.
Bradshaw, J. (1979) ‘Obstruent harmony and tonogenesis in Jabêm’, Lingua, 49(2-3), pp. 189-205.
Hajek, J. & Stevens, M. (2004) ‘Tonal activity in Kara, an Austronesian language spoken in New Ireland’, Proceedings of the 10th Australian International Conference on Speech Science & Technology, pp. 295-300.
Kamholz, D. (2017) ‘Tone and language contact in southern Cenderawasih Bay’, NUSA: Linguistic studies of languages in and around Indonesia, 62, pp. 7-39.
Klamer, M., Reesink, G. & van Staden, M. (2008) ‘East Nusantara as a Linguistic Area’, From linguistic areas to areal linguistics, pp. 95-149.
Maspong, S., Burroni, F., Sukanchanon, T., Pornpottanamas, W. & Pittayaporn, P. (2024) ‘Leveraging deep learning to shed light on tones of an endangered language: A case study of Moklen’, Proceedings of the 3rd Workshop on NLP Applications to Field Linguistics, pp. 37-42.
Rivierre, J. C. (1993) ‘Tonogenesis in New Caledonia’, Oceanic Linguistics Special Publications, 24, pp. 155-173.
Thurgood, G. (1996) ‘Language contact and the directionality of internal drift: The development of tones and registers in Chamic’, Language, 72(1), pp. 1-31.
Thurgood, G. (1999) ‘From Ancient Cham to Modern Dialects: Two Thousand Years of Language Contact and Change: With an Appendix of Chamic Reconstructions and Loanwords’, University of Hawai’i Press.