The language of the Orchid Island — Tao (Cizicizing No Tao, Ciriciring No Tao, Ireriak No Tao)

Some 46 kilometres southeast of Taiwan lies a small volcanic island governed as Lanyu Township of Taitung County, Taiwan / Republic of China. Separated from the Batanes islands of the northern Philippines by the Bashi Channel of the Luzon Strait, this island is inhabited by speakers of a language more similar to the languages of the Philippines than to the Formosan languages spoken in Taiwan. This is Tao, also referred to as Botel Tobago, Lanyu or Tawu. While the language and people are also referred to as “Yami”, a name introduced by a Japanese ethnologist to mean “north”, this name has been rejected by the Tao people in recent years. “Yami” is thus regarded as a pejorative, and “Tao” is preferred when referring to the language and to the people or culture as a whole.

As with many indigenous languages spoken in Taiwan, data on the number of native speakers of Tao are scarce, with 3800 estimated in 2006 and 2700 in 2008. Given these estimates in the low thousands, it seems reasonable to say that the Tao language is currently endangered. We are not aware of any revitalisation efforts in place for Tao, however, other than the indigenous language resource hub linked at the end of this post.

Among the indigenous languages spoken in Taiwan, Tao stands out not just because of where it is spoken (Orchid Island, a little way off the main island) but also because it is classified as a Malayo-Polynesian language rather than a Formosan language within the Austronesian language family. More precisely, Tao belongs to the Batanic branch of the Malayo-Polynesian languages, making it more closely related to the languages of the northern Philippines than to those of Taiwan. Some linguists, however, argue that Tao forms a separate branch within Malayo-Polynesian. Even so, Tao is widely regarded as part of a larger dialect continuum called Ivatan, spoken primarily in the Batanes Islands of the Philippines.

The phonology of Tao features a total of 20 consonants and 4 phonemic vowels. Among the consonants, a couple stand out, namely the retroflex consonants /ɻ/ and /ʂ/. The former is quite special, as Tao is the only indigenous language spoken in Taiwan in which it occurs. This voiced retroflex approximant is also found in languages such as Tamil and Malayalam in South India, and Pitjantjatjara, spoken in Australia, as well as in some varieties of English, Standard Mandarin and Portuguese.

Among the phonemic vowels, /a, ə, i, o/, there are some cases in which /o/ sounds more like [u], particularly when it follows a labial sound like /p/ or /m/. This makes the words “poyat”, “momodan”, and “mavota” sound like “puyat”, “mumudan”, and “mavuta”. In addition to these vowels, Tao also has four diphthongs, “ay”, “aw”, “oy”, and “iw”, although in some varieties, “ay” and “aw” sound closer to “ey” and “ew”.
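
To make the vowel-raising pattern concrete, here is a minimal Python sketch of the rule as I understand it. It assumes that each letter in the spelling stands for one sound, and that the set of labial consonants is p, b, m and v: only /p/ and /m/ are named above, v is inferred from “mavota”, and b is my own guess. The function name is made up for illustration and does not come from any Tao resource.

```python
# Assumed labial consonants: p and m are named in the post,
# v is inferred from "mavota" -> "mavuta", and b is a guess.
LABIALS = set("pbmv")

def raise_o_after_labial(word: str) -> str:
    """Respell a word so that 'o' becomes 'u' right after a labial consonant."""
    out = []
    prev = ""
    for ch in word:
        if ch == "o" and prev in LABIALS:
            out.append("u")  # /o/ sounds like [u] after a labial
        else:
            out.append(ch)
        prev = ch  # track the original letter, not the respelled one
    return "".join(out)

for w in ("poyat", "momodan", "mavota"):
    print(w, "->", raise_o_after_labial(w))
```

This only models the three examples given here. Real pronunciation may well depend on stress, dialect or neighbouring sounds that a letter-by-letter rule cannot capture.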

The grammar of Tao does not appear to have many special points to talk about, and it shares most of the common features of Austronesian grammars. This includes the distinction between inclusive and exclusive first person plural pronouns, namely “yaten” (inclusive we) and “yamen” (exclusive we). Additionally, there is a whole host of affixes, including prefixes and suffixes, that attach to nouns and verbs to add more detail to the predicate, clause or sentence.

As in several Formosan and other Malayo-Polynesian languages, adverbs do not really feature much in Tao; adverbial meanings are instead expressed using auxiliary markers, adjectives, or other clauses. Additionally, with the predicate being the basic unit of Tao, there are many similarities to be found, and parallels to be drawn, between Tao and Ilocano, Visayan or Tagalog.

Given the rather close relationship between Tao and the languages spoken in the Philippines, we would expect to find many notable cognates between Tao and Philippine languages like Tagalog, Ilocano and Visayan, and this is pretty much the case. From numbers to some pronouns, we find several words that are seemingly identical between Tao and the Philippine languages, such as “father” (“ama”), “offspring / child” (“anak”), and “mother” (“ina”). The cognates for number words found between Tao and the Philippine languages (in particular, Visayan, Tagalog and Ilocano) are shown below, from the Wikipedia entry for the Tao language.

Various loanwords have entered spoken and written use in the Tao language, most of which originate from Japanese and Mandarin Chinese. The one known Chinese loanword is the one for “wine”, which is “potaw cio”, quite literally an adaptation of the Chinese word “pútáojiǔ (葡萄酒)” to the sound and writing systems of Tao. Loanwords originating from Japanese follow similar adaptation patterns, such as “airplane” (“sikoki”, from “hikōki”, 飛行機), “school” (“gako”, from “gakkō”, 学校) and “ticket” (“kipo”, from “kippu”, 切符).

Just as with the Formosan languages spoken in Taiwan, if you want to learn the Tao language, I have to tell you that many resources are in traditional Chinese. However, there is one central site that documents the words, dialogues, audio and pronunciation of Tao, along with most of the Formosan languages. With multimedia materials and translations of books and stories into the respective indigenous languages, it serves as a platform for people to learn about the cultures and languages of Taiwan, as well as a way of preserving these languages in the digital world. The site is so detailed that even known individual dialects are compiled, and users are prompted to choose which dialect of the target language to learn. Access it here at:


This is probably the only Malayo-Polynesian language I have heard of that is spoken in Taiwan, albeit not quite on the main island, and it is quite interesting to find languages like these. In the next posts in the series, we will go back to looking at Formosan languages, such as Bunun, Puyuma, and Saisiyat.

Featured image: Lanyu Airport, from Google Street View

4 thoughts on “The language of the Orchid Island — Tao (Cizicizing No Tao, Ciriciring No Tao, Ireriak No Tao)”

  1. “Yami” is pejorative. Use “Tao”. The language has few speakers, but it permeates all aspects of life and is used vigorously at all ages, and authors write and publish natively in the language. I don’t think that qualifies as endangered. PIE at one point in history may have had fewer than 10,000 speakers, but it grew into a juggernaut (to borrow the Oriya word ଜଗନ୍ନାଥ) of a language family, crushing as many languages as possible in its path of destruction. Tao will stay confined to its island for centuries to come with a few thousand speakers at most.


  2. Hi, thank you so much for reaching out to me about this. I have made the necessary changes when referring to the language, and added an extra bit for viewer discretion, given that many sources, even Wikipedia, still refer to the language using the pejorative term.



    • What I’ve learned over the years is that the “internet”, including its encyclopedias, is still just “hearsay”, and to get the real truth, one really has to return to real publications such as books. Or actually go on location and talk to the experts. I had the privilege of having the Tao dictionary compilers in attendance at one of my speeches last year at Academia Sinica. And if you need to reach out to any of the indigenous language communities or regional Chinese language communities in Taiwan, feel free to contact me. I’m also conversant in a few of those languages, but sorry that Tao is not one of them.
      You can find all kinds of information published by the government at their website:
      If it’s okay with you, I’d like to discuss the problem I have with the logic of this sentence: “Given these estimates in the low thousands, it seems reasonable to say that the Tao language is currently endangered.” This raises the question: at what time in the long history of this language’s existence was the population on the island any greater than this? Has the island ever supported a population greater than the current size?
      The indigenous population now stands at 4200 people out of over 5000 including non-indigenous residents, and yet these numbers are higher than the 3300 cited for the year 2006. Now the real question is whether the island ever supported a population greater than 5000 in its whole history, and if this is true, then one would be tempted to say that the language certainly was not endangered in the year 1800 despite “the number of speakers being in the low thousands”.
      Is it not reasonable to consider that basing endangered status simply on population counts doesn’t make sense? In my previous message I mentioned that it is plausible that Proto-Indo-European may have had at one point fewer than 10,000 speakers, even though this is impossible to know or prove, yet we know now that PIE, having grown into the largest language family on earth, was never endangered.
      Would it not make more sense to estimate a language’s vitality by simply observing what 15-year-old children choose to speak with each other? Due to peer pressure, pride and identity, this criterion is an easy estimate of the vitality of a language, as they will grow up continuing to speak the language of their choice with their spouses and children. Under these observations, we can now ascertain that Tao certainly has problems. 朗島 (Langdao) village is currently the only place where full fluency is achieved by children, whereas other villages have switched to Mandarin. So I would say that Tao’s endangered status is not due to its population size, which has been stable for centuries, but rather due to the language that teenagers choose to speak.
      Under the same criteria, we can take another language such as Southern Min or Hakka, both of which have tens of millions of speakers, and can call them threatened, not because of their population size, but because of attitudes that native speakers take towards speaking those languages. And now you have two opposing examples where population size cannot determine the vitality status of a language.


      • Sorry for the late reply, I was busy with other paperwork, and since I am the only one running this site, replying had to be put on hold for a bit.

        Firstly, I want to clarify the sources on which I based that phrasing. The main one I looked at was the UNESCO classification from 2008, when 2700 speakers of Tao were recorded or estimated. While Tao was classified as “endangered” at the time, I realised after the fact (that is, after reading your comments) that this was lifted to “vulnerable” in 2015, according to the more up-to-date UNESCO Atlas of the World’s Languages in Danger. I do not have access to Ethnologue, so UNESCO formed my main source. However, I do admit that by using a more outdated source, some content in this post will be flagged as potentially misleading.

        Looking at how UNESCO classifies language endangerment, I understand that they consider nine factors comprising both somewhat objective and subjective ones. The more objective criteria include the absolute number of speakers, and the proportion of speakers within total population. On the other hand, the more subjective criteria would include the responses to new domains and media, and attitudes towards their own language. Interestingly, on the UNESCO Atlas site, I noticed that the legend denoting endangered statuses mainly showed the degree of intergenerational language transmission, while not quite showing the other eight factors.

        Going through the document on assessing language vitality and endangerment, I see that each factor is assigned its own grade, and that the grades are weighed together to form an overall classification. This creates the potential for certain factors to outweigh others, tilting the balance towards a certain classification. When writing this, I did not have access to the connections you have for assessing Tao’s vitality and the other factors mentioned in the document, and so I gravitated towards the small speaker population size as a factor that could have influenced the overall classification (as endangered, as of 2008).

        My justification for this logic is more or less “ecological”: small populations are more likely than larger ones to go extinct through stochastic processes, and this ecological concept could be extended to the field of language diversity. While this logic can be argued to be potentially fallacious, it was based on the information I had on hand at the time, and the small speaker population seemed the most probable explanation to me.

        I am open to discussion on this topic, and definitely, I would appreciate some recommendations for resources I could refer to when writing up future introductory posts.

        One question I would like to raise: you mentioned Proto-Indo-European speakers as an example for your argument. This example was quite interesting, and I was wondering if there is an academic source I could turn to for further reading.



