Tracking the colonisation of Madagascar using Malagasy

Madagascar is a rather interesting place. Lush in biodiversity and famous for its lemurs, this island several hundred kilometres east of continental Africa hosts a substantial number of endemic species. However, unlike other biodiversity hotspots like New Guinea, Madagascar is not particularly known for its linguistic diversity. In fact, it seems anomalous that on one of the largest islands in the world, behind only Greenland, New Guinea, and Borneo, the only indigenous language that we know is spoken is Malagasy.

Several hypotheses have been put forth to explain this anomaly, and we have done a little introduction to the island biogeography of languages as well. However, the most compelling explanation for this apparent lack of diversity may simply be time. After all, the earliest archaeological records left behind by the first Austronesians who colonised Madagascar only date back to around the 5th century. It is also possible that the first Austronesian cultures settled there even further back in time, though evidence for that is quite scant. This relatively recent human colonisation could mean that there was not enough time for the Austronesian language to diverge, leading to a dialect continuum of sorts, where some dialects are mutually intelligible with one another, while more distant ones are not. This does not explain, however, the lack of indigenous Bantu languages spoken in Madagascar despite evidence of interactions between the Malagasy and Bantu peoples. We also see Bantu influences in Malagasy words, which further compounds this mystery.

Nevertheless, we are still interested in the early history of Madagascar: how the island came to be colonised by the Malagasy, and later also settled by the Bantu, who influenced the Malagasy language. This leaves us with two questions to answer: when exactly was Madagascar first settled by the Austronesian peoples who would become the Malagasy, and where was the first landing point?

As mentioned, Malagasy is better described as a dialect continuum, that is, a collection of varieties of a language spoken across a particular region, in which varieties spoken in close geographical proximity are more mutually intelligible. Differences between varieties accumulate over distance, so varieties separated by a larger geographical distance may not be mutually intelligible with one another. There are dozens of Malagasy dialects, and Ethnologue recognises 12 distinct languages, classifying Malagasy as a ‘macrolanguage’. The 12 Malagasy languages are each given their own ISO 639 codes as well.

These dialects can generally be split into the Eastern and Western dialect groups, distributed around the central highlands that run down the island almost longitudinally. Eastern Malagasy dialects are spoken in the eastern parts of the island, as well as the central plateau, while Western Malagasy dialects are spoken in the western parts of the island. And if we compare the dialect map of Malagasy with the topographical map of Madagascar, we can see how the linguistic parallel of a ring species could arise in Malagasy.

There have been several studies aiming to narrow down the likely first landing point of Austronesians on Madagascar through the linguistic lens, by comparing phylogenetic distances between the dialects. As mentioned above, differences between dialects accumulate across distance, and so we would expect that the dialects with the most differences from the phylogenetic outgroup would be the dialects that are more distant from the likely first landing point, or at least, the region where divergence first occurred.

This takes us to the work of Maurizio Serva, who in 2012 and 2020 published studies on the peopling of Madagascar by the Austronesian peoples using a phylogenetic analysis of linguistic data. Both studies generally used similar methods, but the more recent one considered data from 60 Malagasy dialects, compared with the 23 in the 2012 study. The availability of lexical data across more dialects, as well as more precise locations from which the data were collected, might have been the motivating factor for continued interest in this topic.

Maps of locations where Malagasy dialect data were collected in the studies conducted by Serva and Pasquini (2020, left) and Serva (2012, right). Individual colours indicate clustering of dialects by genealogical relationships.

When it comes to establishing phylogenetic relationships between dialects or languages, you would usually come across the term ‘Swadesh list’. This is a list of more or less universal concepts or words that have available translations in pretty much every language there is, and which aims to be as culturally independent as possible. Such lists include words like “to eat”, the numbers “one” and “two”, body parts like “skin” and “teeth”, and animals like “fish”. Two main versions are used, with varying availability of data. One Swadesh list, finalised in 1971, contains 100 items, while a longer and widely used version contains 207 items. Shorter lists exist as well, one of which contains just 35 items, derived as a subset of the 207-item list.

This list was used for both studies, though the earlier one used a 200-item version for 23 Malagasy dialects, while the more recent one used the full 207-item Swadesh list for 60 Malagasy dialects. These data points were then used to evaluate linguistic distance. This is done pairwise, meaning these words were compared between two dialects. Distance (D) between words that carry the same meaning but in different dialects or languages was defined as ‘0’ if they are cognates, and ‘1’ otherwise. When done for all 207 items in the Swadesh list, the lexical distance between these two languages or dialects could be calculated by taking the arithmetic mean distance between words over all items in the list. This yields a value between 0 and 1.

Now that we get how distance is calculated, we want to get a rough estimate of the likely time of divergence between two dialects or languages. This takes into consideration the overlap (C) between the two languages or dialects used in comparison. This is done by taking the complement to the distance between languages or dialects, as C = 1 – D.
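To make the arithmetic concrete, here is a minimal sketch of the cognate-based distance D and the overlap C = 1 – D. The function names and the ten cognate judgments are made up for illustration; they are not taken from either study.

```python
def lexical_distance(cognate_flags):
    """Mean per-item distance: 0 where the two words are cognates, 1 otherwise."""
    if not cognate_flags:
        raise ValueError("need at least one compared item")
    per_item = [0 if is_cognate else 1 for is_cognate in cognate_flags]
    return sum(per_item) / len(per_item)


def overlap(distance):
    """Overlap C is the complement of the distance D."""
    return 1.0 - distance


# Hypothetical cognate judgments for a 10-item word list (True = cognate).
flags = [True, True, False, True, True, True, False, True, True, True]
D = lexical_distance(flags)
C = overlap(D)
print(D, C)
```

With two non-cognate pairs out of ten, D comes out as 0.2 and C as 0.8. A real comparison would run over all 200 or 207 items rather than ten.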

But there is a more streamlined way of determining this D. This makes use of the Levenshtein distance, which measures distance by the number of differences between two sequences, in this case, two words of the same meaning in different Malagasy dialects. Briefly put, this distance is the minimum number of steps required to transform one word into another. A single step can only be the deletion, addition, or substitution of a single character. For example, if I want to calculate the distance between the Maori word for ‘water’, wai, and the Malay word for ‘water’, air, I will need to delete the ‘w’ in wai, then add the ‘r’ in air. This takes two steps, and so the Levenshtein distance between the two words is 2. This Levenshtein distance is then normalised by dividing by the length of the longer of the two words. This normalised Levenshtein distance may still be referred to as D, and it yields a value between 0 and 1.
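The wai/air example can be checked with a short sketch of the standard dynamic-programming Levenshtein algorithm, followed by the normalisation described above. This is a generic textbook implementation, not the authors' own code, and the helper names are my own.

```python
def levenshtein(a, b):
    """Minimum number of single-character deletions, additions, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # add char_b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalised_levenshtein(a, b):
    """Levenshtein distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def dialect_distance(words_a, words_b):
    """Mean normalised distance over word pairs that share a meaning."""
    pairs = list(zip(words_a, words_b))
    return sum(normalised_levenshtein(x, y) for x, y in pairs) / len(pairs)


print(levenshtein("wai", "air"))             # 2
print(normalised_levenshtein("wai", "air"))  # 2 / 3
```

Averaging the normalised distance over every item in the word list, as `dialect_distance` does, gives a D between 0 and 1 without requiring anyone to make cognate judgments by hand.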

With these data obtained, the authors could build a tree that visualises the relationship between Malagasy dialects. In the 2020 study, this process used two algorithms, one being the Unweighted Pair Group Method with Arithmetic Mean, or UPGMA, and the other being Neighbour Joining, or NJ. Both have some assumptions, with UPGMA assuming that evolutionary rates are constant on all branches of the tree of Malagasy dialects, and both algorithms assuming that language transmission is predominantly vertical, that is, from one generation to the next.
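For readers curious how such a tree is assembled, below is a minimal sketch of textbook UPGMA: repeatedly merge the two closest clusters, and set the distance from the new cluster to every other cluster as the size-weighted average of the old distances. The four dialect labels and their distances are invented for illustration, and this is not the implementation used in the studies.

```python
from itertools import combinations


def upgma(distances):
    """distances: dict {(a, b): d} covering every pair of labels.

    Returns a nested, Newick-like string and a list of (a, b, height) merges.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    sizes = {label: 1 for pair in distances for label in pair}
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sorted(sizes), 2), key=lambda p: d[frozenset(p)])
        # Constant-rate assumption: both branches reach half the distance.
        height = d[frozenset((a, b))] / 2
        merged = f"({a},{b})"
        n_a, n_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        sizes[merged] = n_a + n_b
        merges.append((a, b, height))
    (root,) = sizes
    return root, merges


# Hypothetical pairwise distances between four dialects A, B, C, and D.
example = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("B", "C"): 0.6,
    ("A", "D"): 0.8, ("B", "D"): 0.8, ("C", "D"): 0.8,
}
tree, merges = upgma(example)
print(tree)  # (((A,B),C),D)
```

Because every merge places the new node at half the distance between its two clusters, all leaves end up equally far from the root, which is exactly the constant-rate assumption mentioned above. Neighbour Joining drops that assumption and is not sketched here.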

The cladogram generated by UPGMA showing the genealogical relationships of 60 Malagasy dialects (Serva and Pasquini, 2020).

But how was genealogical time measured? After all, one of the research questions Serva and, in the 2020 study, Pasquini, wanted to answer was if the Austronesian colonisation of Madagascar was around the 7th century. To do so requires a bit more mathematics, and some calibration. This is because the genealogical distance between two languages measured by the time from the most recent common ancestor between the two is T = -(τ/2) ln C, where τ is a constant to be determined. This required calibration with the histories of other languages and historical events, preferably those where we know about the last common ancestor of these languages, and those that are temporally localised. Thus, Serva and Pasquini decided to use 46 Romance languages spoken in the Italian and Iberian peninsulas and Southern Gaul, with relevant historical events occurring in that region as well.
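As a rough illustration of how the formula behaves, the sketch below computes T from an overlap C and, going the other way, solves for τ from a single language pair whose divergence time is known. The value of τ and the overlaps are placeholders, not the calibrated figures from the paper, and calibrating on one pair is a simplification of the authors' procedure with 46 Romance varieties.

```python
import math


def divergence_time(overlap, tau):
    """T = -(tau / 2) * ln(C), the time back to the most recent common ancestor."""
    if not 0 < overlap <= 1:
        raise ValueError("overlap C must lie in (0, 1]")
    return -(tau / 2) * math.log(overlap)


def calibrate_tau(overlap, known_time):
    """Solve T = -(tau / 2) * ln(C) for tau, given one pair with a known T."""
    if not 0 < overlap < 1:
        raise ValueError("overlap C must lie strictly between 0 and 1")
    return -2 * known_time / math.log(overlap)


# Placeholder numbers: tau = 2000 years and an overlap of 0.5.
print(divergence_time(0.5, 2000))  # about 693 years, i.e. 1000 * ln 2

# Hypothetical calibration: a pair known to have split 1500 years ago.
tau = calibrate_tau(0.4, 1500)
print(divergence_time(0.4, tau))   # recovers roughly 1500
```

Note that T grows without bound as C approaches 0, so pairs with very little overlap give very uncertain dates.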

The linguistic data appear to agree with genetic data in placing the first Austronesian landing on Madagascar some time in the 7th century. More precisely, using the figures derived from the calibration, the UPGMA tree returned a genealogical distance from the root to the leaves of between 1338 and 1396 years. This means that the earliest point of divergence of the Malagasy dialects could be any time between 623 and 681 CE.

The cladograms generated by Serva and Pasquini do not tell us about the geographic location of first landing, though. Additionally, these cladograms leave out the reality of horizontal transmission between Malagasy dialects, which means that other factors, including geographical ones, should not be omitted when conducting such a study. In this study, the difficulty is further compounded by the lack of coordinate data collected by the researchers. Instead, the names of the towns or villages from which the data were collected were used. The authors reconstructed Malagasy geography by clustering dialects according to their genealogical distances. Ideally this involves placing points on a region of the globe, but in practice the points were placed on a plane, and so some data points seem to be located somewhere in the ocean.

These data could then be used to roughly estimate the location of first landing by the Austronesians on Madagascar. This involves the multi-level approach in evaluating linguistic diversity, which we have covered previously. Regions with the largest diversity would be the most likely candidates for first landing locations by the Austronesians on Madagascar, as inferred in previous ecological and linguistic studies, although the studies they cited date from around a century ago. However, their definition of diversity had to be altered to mitigate the problems arising from the study design: diversity was defined only at the settlements from which dialect data were collected, it was not a local quantity, and it needed to be tuned.

The authors found that the southeastern coast of Madagascar was the most likely first landing site, as it was the most diverse region. Conversely, they found that the northern regions of the island were the least diverse, meaning that they would have been the last regions to be settled by Austronesians. This suggests the following colonisation narrative: Austronesians from Indonesia, likely from Borneo, sailed into the Indian Ocean with the help of ocean currents to reach the southeastern coast of Madagascar, and then migrated and settled northward over time. This narrative can be summarised in the following map of the Austronesian colonisation of Madagascar:

However, this paper also considers the possibility of an indigenous people group that could have inhabited Madagascar prior to Austronesian arrival, as inferred from Malagasy mythology and from speculation about linguistic influences that cannot be explained by contact with other Malagasy dialects, Bantu languages, and later, French. It was a theory meant to explain the relatively unusual characteristics of the Mikea dialect. The authors rejected it for lack of strong evidence, linguistic or genetic, supporting the influence of African mainland people groups (such as the Hadza) on a subset of Malagasy peoples, namely the Mikea or the Vazimba. Thus the authors concluded that Mikea’s deviations from other Malagasy dialects were largely due to innovations or changes acquired in isolation.

Additionally, there are competing theories explaining the relative diversity of Malagasy dialects in the southeast. These include theories of multiple colonisation events. One such theory holds that Indonesian peoples continued to interact with southeastern Madagascar, establishing trading relations with the Malagasy. This would have supported a high diversity of Malagasy dialects in the southeast in contrast to the north. The proponent of this theory also posited that Malagasy survived as the sole language on Madagascar due to a population bottleneck at the time of first arrival. Malagasy would have formed somewhere outside Madagascar before first landing, though this remains to be studied.

So, according to Serva and Pasquini’s study, the primary takeaway is that the Austronesians who would become the Malagasy arrived from Indonesia in Madagascar at some time in the 7th century, likely on the southeastern coast. The Malagasy then migrated northward, diversifying in their dialects along the way. This likely produced the map and cladograms of Malagasy dialects we see today. These findings corroborate earlier findings regarding the Austronesian colonisation of Madagascar, but still leave several avenues of research open. These include the linguistic relationships between the Malagasy dialects and the Indonesian and African (likely Bantu) languages. Nevertheless, these studies have opened my eyes to Malagasy and its various dialects. Previously, I had thought of Malagasy as a single language, perhaps standardised based on the Antananarivo dialect, rather than as a dialect continuum that could be split into further constituent languages. It was the latter view that made the findings of these studies possible, and it warrants further research to build a comprehensive picture of Madagascar’s early history, and its early relationships with Indonesia and continental Africa.

Further Reading

Adelaar, A. (2013). Malagasy Dialect Divisions: Genetic versus Emblematic Criteria. Oceanic Linguistics, 52(2), pp. 457–480. http://www.jstor.org/stable/43286359.

Serva, M. (2012). The Settlement of Madagascar: What Dialects and Languages Can Tell Us. PLoS ONE, 7(2), e30666. https://doi.org/10.1371/journal.pone.0030666.

Serva, M., Pasquini, M. (2020). Dialects of Madagascar. PLoS ONE, 15(10), e0240170. https://doi.org/10.1371/journal.pone.0240170.
