The world of constructed languages is nearly limitless. From the days of Lingua Ignota to more modern creations like Esperanto, constructed languages have pushed linguistic creativity to new heights. Such inventions have appeared throughout pop culture, aiding in world building and lore, and making fictional cultures sound more authentic. Examples include Klingon in Star Trek, Quenya and the other Elvish languages in Tolkien’s legendarium, High Valyrian and Dothraki in the Game of Thrones series, and Na’vi in James Cameron’s Avatar. These languages are, of course, created by humans. This sparked my curiosity: could an artificial intelligence come up with its own constructed language?
OpenAI is among the most prominent artificial intelligence (AI) organisations, and its developers created ChatGPT. While not the first chatbot, it has surprised many with how articulate its responses are across many topics, although it is not always factually accurate. It reminded me of the days of Cleverbot, a much older chatbot that learns from human input alone. Nonetheless, ChatGPT has proven rather versatile, handling everything from essays and compositions, to writing or debugging computer programs (with varying usefulness), to songwriting. This has attracted awe, and a sizeable amount of controversy. Plagiarism and other educational concerns are among the more major issues raised about ChatGPT, with some universities banning its use outright.
While the aim of today’s ChatGPT challenge is to make it invent its own language, this is probably not the first time humans have given it such prompts. There are examples of people trying to teach ChatGPT their own constructed languages, extending its linguistic capabilities. In fact, in 2017, Facebook reportedly shut down its own AI bots after they started conversing in an unusual language that only the bots understood. These bots did not invent their own sounds or words, but used English words in a rather bizarre manner. While this example did not show that AI could invent its own sounds or words, it certainly showed that AI is capable of coming up with its own syntax that can be understood by other “speakers” of that language. With this, let us see if we can push ChatGPT to make its own fresh new language!
Before starting, I would like to preface this by saying that I am using ChatGPT for exploratory purposes. While ChatGPT can give elaborate and articulate responses, it is not designed to give advice.
Being new to ChatGPT, of course I had to start simple. Firstly, I tried giving it a generic prompt.
And it was not a good start. Perhaps it interpreted “help” as a request for advice, hence this step-by-step response. I had to rephrase my prompt.
Now we have some progress. The consonant phonemes are not particularly special, although it is interesting that the inventory does not include voiced stops, sibilants, and fricatives like “b”, “d”, “g”, “z”, and “v”. Scrolling down, we get a combination of grammatical rules that seem to deviate from the natural languages we are more familiar with.
But now we get to the example words and vocabulary. This is where we spot our first inconsistencies. The example vocabulary ChatGPT generated looked pretty similar to an existing language: Spanish. Unless Spanish is actually an AI-generated language instead of a natural language, there is something wrong here. Furthermore, the words ChatGPT proposed contain sounds that do not appear in the phoneme inventory it proposed, such as the voiced palatal nasal consonant [ɲ], written as ñ in Spanish orthography. At this point, I had two main options: double down and see how ChatGPT corrects this, or start afresh with a new prompt that has the same purpose.
For now, let us look at the first approach. I tried pointing out the inconsistencies in the phonology of its “constructed language”…
While it did add a bunch of voiced sounds to the mix, it still lacked some sounds that are present in the words it proposed, like the voiced palatal nasal consonant mentioned earlier. I kind of gave up on pursuing that, and called it out for “inventing” Spanish given the words and example sentence it generated (some grammar parts did correspond with Spanish, but the idea of using postpositions stuck out).
After calling it out, it returned a result that still showed some glaring inconsistencies. The phonological inventory it gave still preserved the original version, which lacked voiced stop consonants, for instance.
But we get a breath of seemingly fresh air this time. Check out the sample grammar rules it returned.
Special case particles instead of prepositions or postpositions? That sounds like an extensive case system, right? It seems that this time, ChatGPT likes using special particles to identify various grammatical elements in a clause, and with an SOV word order, hopes seemed high for an interesting-sounding constructed language.
But after scrolling down and reading further…
We still see glaring inconsistencies between the words proposed and the phonological inventory of the constructed language. Furthermore, while the words proposed still heavily resemble Spanish, ChatGPT has actually made an effort to modify them (albeit only by removing the last letter of each word) to create something “new”. What makes this iteration special are the particles proposed, and how they are used in the example sentence. However, even though this is part of the same response, it appears that ChatGPT has forgotten about the SOV word order, and it remains rather murky on how the case particles are used. The correct sentence in this case should be “y-n cas-a grand-a tiene-y”. I tried correcting it by first specifying the correct word order, and it acknowledged that it had got this wrong.
Next was the really strong Spanish influence in the vocabulary. I suspected that the language data ChatGPT was fed had a sizeable chunk of Spanish in it, although I could not really ascertain how much of the Internet is in Spanish. Here is what it generated next.
This time, I got slightly more interesting results. The words zor and zorwel, to my knowledge, are not greeting words in most, if not all, natural languages, making them stand out as rather unique. While it preserved the various grammatical suffixes it suggested earlier, ChatGPT now seems to be pulling sample words of English origin instead, and modifying some of them to be phonologically similar but orthographically distinct. I think ChatGPT has also incorporated a bit of English slang, as with the words smol and dawg. For now, I think I have an idea of ChatGPT’s word-forming abilities, and of how I could work towards making it create words that go beyond whichever words, phrases, sentences, and perhaps even entire paragraphs or publications it has been trained on. Zor and zorwel seemed to be a great start nonetheless. It was then time to give this language a name.
It seems that it is unable to synthesise its own name for its constructed language (I am not sure if it is aware that it just “invented” its own constructed language), instead pushing this task to the creator (me for prompting, or ChatGPT for creating). I had to come up with something else. Inspired by Toki Pona, or the Language of Good, I wanted to see what ChatGPT would come up with for a similar name. The idea I had in mind was “Goodspeech”, just a combination of the two words “good” and “speech”.
It appears that ChatGPT prefers that the word “pret” has meanings encompassing both “good” and “pretty”. This probably points to a preference for describing appealing or positive attributes of a noun as “pret”, although I did not confirm this, nor did I try the opposite with “bad”, as ChatGPT did not suggest an antonym.
The word “fal”, meaning “to speak”, is likely derived from the Portuguese word “falar”. The noun form also strongly resembles the Portuguese word for “speech”, “fala”. Now that I had the two elements for the name I wanted to give, it was time to suggest the word “Falpret”. This is its response.
Lastly, I wanted to see what ChatGPT thinks about a writing system for Falpret. Knowing that ChatGPT is a language model, I did not have any expectations about what it could do, but I was curious anyway. Unsurprisingly, it gave this response:
For now, that is where we will leave things. While ChatGPT seems to have the basic ability to come up with its own grammatical rules, in the form of fundamental grammatical suffixes, it struggles when it comes to inventing words, as it tends to loosely modify, or simply reuse, words from whichever substrate language it can draw on. Inconsistencies between words and phonological inventories are a persistent problem, though I think that could be fixed. For a first run, I think ChatGPT has at least some capacity to create its own constructed language, although as a language model, it really struggles to get creative with the language data it was given during its training.
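The mismatch between proposed words and the phoneme inventory is something that can be checked mechanically rather than by eye. Here is a minimal Python sketch of that idea. The inventory and the word list below are made up for illustration and are not ChatGPT's actual output, the function names are hypothetical, and the check treats each letter as one sound, which is a simplification that ignores digraphs and spelling rules.

```python
import unicodedata

# Hypothetical inventory for illustration only: no voiced stops, no "z" or "v".
INVENTORY = {"p", "t", "k", "m", "n", "s", "l", "r", "h", "w", "j",
             "a", "e", "i", "o", "u"}


def out_of_inventory(word, inventory):
    """Return the sorted letters of `word` that are missing from `inventory`."""
    # NFC keeps letters like "ñ" as a single character instead of "n" + tilde.
    normalised = unicodedata.normalize("NFC", word.lower())
    return sorted({ch for ch in normalised if ch.isalpha() and ch not in inventory})


def check_lexicon(words, inventory):
    """Map each offending word to the letters that break the inventory."""
    report = {}
    for word in words:
        missing = out_of_inventory(word, inventory)
        if missing:
            report[word] = missing
    return report


if __name__ == "__main__":
    # Made-up sample words, loosely Spanish-looking like the ones in the post.
    for word, missing in check_lexicon(["pata", "grande", "niño"], INVENTORY).items():
        print(f"{word}: not in inventory -> {', '.join(missing)}")
```

A report like this could be pasted back into the chat as a concrete list of words to fix, instead of describing the inconsistencies in general terms.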
In my next attempt, I will try to feed it some sample words in various languages, and see what it can come up with. If you have any suggestions on how I can tweak my prompts to get the most out of my experience with ChatGPT, please let me know in the comments.