There is pretty much no denying that the vast majority of modern natural languages are predominantly spoken. Speech is also perhaps the one aspect of language that we take for granted, without really paying attention to the various processes underlying how we talk. I have been wanting to cover the fundamentals of phonetics for a while, as I have thrown about various phonetics terms such as “alveolar”, “ingressive”, and “retroflex” when discussing consonants, and realised that I have not given these terms the explanations they deserve. As not all of us are familiar with these terms, I have decided to dedicate a Made Simple series to discussing the mechanisms of speech production in humans.
Generally speaking, speech is formed from the movement of air into or out of the human airway. This process involves several parts, from anywhere in the oral and nasal cavities to the larynx (also known as the voice box) and even the diaphragm. Airstream mechanisms form one part of the trifecta of human speech production, with the other two being phonation and articulation. We will look into those two in the coming weeks, but today, let us talk about the various forms of airstream mechanisms that we humans generally use.
In short, airstream mechanisms are the processes by which airflow is produced in the vocal tract. There are three main organs involved in this process, although several more may be used in speech therapy, especially after a laryngectomy (surgery that removes the voice box). These organs are referred to as initiators, as they can be argued to form the first parts of speech production.
The three organs involved in airstream mechanisms are the diaphragm, the glottis, and the tongue. Each of them may create a pressure difference between the airway and the environment to produce airflow, which can occur outwards (egressive airflow) or inwards (ingressive airflow). This gives us a total of 6 possible combinations of airstream mechanisms, but only 4 or 5 of these are found in words in human languages.
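As a quick check on that count, the six combinations can be enumerated in a few lines of Python. This is only an illustrative sketch: the names INITIATORS, DIRECTIONS, and airstream_mechanisms are made up for this example, and the labels simply follow the terms used in this post (pulmonic for the diaphragm and lungs, glottal for the glottis, lingual for the tongue).

```python
from itertools import product

# Labels follow this post's terminology; the names are hypothetical.
INITIATORS = ("pulmonic", "glottal", "lingual")
DIRECTIONS = ("egressive", "ingressive")


def airstream_mechanisms():
    """Return every initiator/direction pairing as a readable label."""
    return [
        f"{initiator} {direction}"
        for initiator, direction in product(INITIATORS, DIRECTIONS)
    ]


if __name__ == "__main__":
    for name in airstream_mechanisms():
        print(name)
```

Three initiators times two directions gives the six labels, which match the section headings used in the rest of this post.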

Pulmonic egressive airstream
This is perhaps the most ubiquitous form of airstream mechanism across all human languages. Every human language has at least a few sounds of this type, ranging from the familiar /k/, /t/, and /p/, to the relatively more obscure /ʟ̝/ (the voiced velar lateral fricative, found in the Caucasian language Archi). This airstream is an outward movement of air, as air is pushed out of the lungs by the movement of the diaphragm and ribs. It is described as pulmonic because of its association with the lungs, where the airflow ultimately originates, although the diaphragm and ribs play a substantial role in producing this airflow.
As mentioned, many sounds used in human speech use this main category of airstream mechanism. In fact, in many European languages, this is the only type of airstream mechanism used, and in many languages it accounts for the majority of sounds in the phonological inventory.
Pulmonic ingressive airstream
Imagine the typical sounds you would make with pulmonic egressive airstreams. But now, do it in reverse. Instead of the airflow going out, as when you exhale, the airflow here goes in, as when you inhale. Gasps and other forms of interjections are mainly where you would find it being used. Backchanneling in Swedish and Ewe also uses an ingressive airstream. More rarely, as in Finnish, entire phrases can be spoken using an ingressive airstream.
The Damin language, a ritual language of the Lardil people in Australia, has been documented to use this airstream in one of its consonants, transcribed as l* or [ɬ↓ʔ]. Ladefoged & Maddieson (1996:268) have also identified an entire series of click consonants in the Taa language that feature a pulmonic ingressive nasal airflow. These are arguably among the more complex consonants in human language, and are transcribed along the lines of [↓ŋ̊Ʞh].
Glottal egressive airstream
This airstream mechanism is perhaps the second-most common type to occur in human languages. For one, the most common sound we would associate with this airstream is the glottal stop, usually transcribed using the International Phonetic Alphabet (IPA) symbol [ʔ], or using letters such as ‘ in languages such as Samoan (in words like fai’ai ‘brain’).
The glottis is referred to as the space between the vocal cords, and when the glottis closes, it creates an obstruction of airflow at that part in the human airway, stopping or greatly muting any vibration that occurs there. This obstruction and subsequent cessation of airflow is why this sound is referred to as a glottal stop.
The glottal egressive airstream is created by first lowering the glottis, then closing it to prevent backflow of air past it, and finally raising it. This process creates positive pressure in the upper windpipe and the oral cavity. Sounds in this category are also referred to as ejectives. As the glottis needs to be closed during this process, pretty much all ejective consonants are voiceless, sounding like a weird ‘k’, ‘t’, or ‘p’, rather than ‘g’, ‘d’, or ‘b’. Intuitively, as the production of egressive glottal sounds involves the formation of a glottal stop, the ejective sounds are commonly transcribed as [k’], [t’], and [p’], for example, and they are pretty much the same in the IPA.
Behind egressive pulmonic sounds, the egressive glottal sounds are perhaps the second-most common type of sounds to occur in human languages, being found in around 20% of all natural languages according to linguist Peter Ladefoged. While common in languages of the Caucasus, various languages in the Americas, Africa, and even in some Austronesian languages, ejectives are very rare in Indo-European languages, with Ossetian, spoken in the Caucasus, being a notable exception.
Glottal ingressive airstream
Just like the pulmonic counterparts, to produce ingressive glottal sounds, you essentially need to run the process of the glottal egressive airstream in reverse. The key defining feature of this airstream mechanism is the creation of negative pressure or suction in the upper windpipe and the oral cavity, and in most cases the glottis is not fully closed. Instead, the glottis is slightly open, and moves downwards to create airflow. Note that, unlike in pulmonic airstreams, it is not so much the air that moves as the initiator itself.
Sounds in this class are also called implosives, and are normally voiced. This makes the majority of implosives sound like a weird ‘b’, ‘d’, or ‘g’, in contrast to ‘p’, ‘t’, or ‘k’. In the IPA, this class of sounds is indicated using a right-facing hook, like [ɓ]. Such implosive sounds are commonly associated with African languages, and form perhaps the third-most common type of airstream mechanism. Bantu languages like Swahili and IsiXhosa have been documented to feature this class of consonants. Some Austronesian languages like Cia-Cia (as we have covered some time ago) also have implosive consonants like [ɓ] and [ɗ], as do some languages in South and Central America, and India.
This is not to say that voiceless implosives do not exist, however. Though exceedingly rare, instances of voiceless implosives like [ɓ̥] have been found in languages such as Kaqchikel, spoken in Guatemala. Further examples are tenuous at best, or are relatively unheard of in the world’s natural languages.
The ‘click’ consonants (Lingual ingressive airstream)
The fourth-most commonly occurring airstream, and perhaps the one behind the most interesting class of sounds, is the one that produces what we call ‘click’ consonants. This airstream is primarily created by the tongue (hence, lingual), which forms a small air pocket that is used in sound production. To do this, the tongue has to create two points of closure to seal off the air pocket. These are usually a closure at the back of the tongue (a velar or uvular closure), and a closure at the front or side of the tongue, or even at the lips (as in a bilabial click).

This is then followed by a lowering of the body of the tongue to reduce the pressure in the air pocket. Releasing the closure thus creates a small airflow, even smaller than the ones you would see in the glottal counterpart. The release usually takes place at the front of the tongue (or the lips) first, and further modifications to the quality of the click are made through the accompanying release, or opening, at the back of the tongue. This makes it possible for click consonants to be combined with sounds using other airstream mechanisms, or even multiple articulations, making for a rather intriguing set of click consonant inventories in certain languages.
Today, click consonants are found as regular speech sounds only in Africa, though the now-extinct or dormant Damin language in Australia had its own set of click consonants. Among the African languages, the Khoisan languages are most notable for having members with a rather extensive inventory of click consonants, such as the Taa language. Beyond the Khoisan languages, there are African languages with a less extensive set of click consonants. Bantu examples include IsiXhosa and IsiZulu, which have borrowed such sounds from the Khoisan languages. Outside of southern Africa, however, only three languages are known to use click consonants: the language isolates Sandawe and Hadza, and the Cushitic language Dahalo.
This is not to say that we non-Khoisan speakers are not capable of producing or using any click consonants. We do, but these sounds do not form part of any word we would typically use in speech. Instead, they are involved in paralinguistic communication, that is, communication that conveys some meaning but does not constitute standard speech. Things like calling a horse, or expressing disapproval or annoyance at something, may involve the production of a click, which uses this particular airstream mechanism.
Percussive sounds?
Unlike the sounds made with the airstream mechanisms we are looking at today, there are some sounds that do not involve any airstream mechanism at all. Instead, they are produced by striking one speech organ against another. Sounds like these occur incredibly rarely in human languages with non-impaired speech, with the only one of its kind documented in the Sandawe language of Tanzania. It is an allophone though, meaning that it is a different pronunciation of the same phoneme, and using either sound does not change the meaning of the phrase or word that is spoken. Largely transcribed as [¡], and described as a ‘tongue slap’, this percussive consonant in Sandawe is an allophone of a series of click consonants called the ‘alveolar clicks’, which involve a particular place of articulation that we will cover later.
Other airstream mechanisms
You may have noticed that we have not really touched on every single airstream mechanism here. That is because the mechanisms discussed so far are attested in the words of human languages, while the remaining ones are practically unheard of in any known language today, or are more applicable to situations where the larynx is absent or unable to function.
Lingual egressive airstream
If you want to make a lingual egressive airstream, you sort of need to do what is needed for a click consonant, but in reverse. The tongue creates an air pocket and then compresses it to build up the pressure needed for an egressive sound, aided by other parts of the mouth such as the cheeks. Its application is pretty limited though, as it is thought that such an air pocket would not be sufficient to create sounds requiring a stronger or more sustained airflow.
The Damin language, once again, is perhaps the only known language (albeit a ritual one) that features an egressive click consonant, transcribed as p’ or [ʘ↑]. This makes Damin the only language to feature a total of 5 airstream mechanisms, which is everything we have mentioned above except the glottal ingressive airstream mechanism. Damin was used by and transmitted to a very restricted group of Lardil people, namely Lardil men who had been ‘advance initiated’ through warama and were known as Demiinkurlda (Damin possessors), and it was primarily used in ceremonies or rituals. With the decline of warama ceremonies, the Damin language declined too.
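The ‘five out of six’ count can be double-checked with a simple set difference. This is only a sketch: the names ALL_MECHANISMS, DAMIN, and missing are hypothetical, and the contents of each set simply restate what this post says about Damin rather than any independent data.

```python
# All six initiator/direction pairings, labelled as in this post.
ALL_MECHANISMS = {
    "pulmonic egressive",
    "pulmonic ingressive",
    "glottal egressive",
    "glottal ingressive",
    "lingual egressive",
    "lingual ingressive",
}

# Mechanisms this post attributes to Damin (an assumption taken from the text).
DAMIN = {
    "pulmonic egressive",
    "pulmonic ingressive",  # the consonant transcribed as l*
    "glottal egressive",    # ejectives
    "lingual ingressive",   # clicks
    "lingual egressive",    # the egressive click transcribed as p'
}

# Whatever is left over is the mechanism Damin lacks.
missing = ALL_MECHANISMS - DAMIN

if __name__ == "__main__":
    print(len(DAMIN), sorted(missing))
```

The only pairing left in missing is the glottal ingressive one, in line with the description above.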
Alaryngeal airstreams
Lastly, for people who have undergone laryngectomy, there are other airstream mechanisms meant to replace those we have discussed. Without a larynx, and by extension a glottis, speech therapy would entail training other parts in and around the upper airway to create what could be used as a replacement glottis. These airstream mechanisms are thus alaryngeal, as they do not involve the use of the larynx.
One of the main organs involved in this process is the oesophagus, the tube through which food and liquids pass from the mouth to the stomach. In conventional speech therapy, the oesophagus is trained to control pressurised airflow, and to vibrate as the vocal cords would. This is called oesophageal speech, and is given the special symbol Œ. Sometimes, a puncture is also made between the windpipe and the oesophagus, and a valve is inserted to aid speech. This is called tracheo-oesophageal speech, encoded as Ю.
The second is the pharynx, the area between the nasal and oral cavities on one end, and the oesophagus and the larynx on the other. Here, instead of the oesophagus, the pharynx is used to create the replacement glottis, along with other parts such as the tongue and the palate. This airstream mechanism has several more drawbacks than oesophageal speech, as tying up important organs such as the tongue and palate makes certain places of articulation more difficult or impossible, and producing sounds at all takes substantially more effort. As such, in alaryngeal speech in patients who have undergone tracheotomy (surgery of the windpipe) or laryngectomy, the oesophagus is perhaps the most widespread alternative for the glottis.
So today, we have looked at just one of the three processes involved in human speech production. It has been a rather lengthy one, but I hope I have touched on the most important airstream mechanisms, explained much of the jargon, and, quite literally, Made it Simple. Researching this topic has been quite enjoyable for me, as I have learned about more unusual languages that incorporate as many as 4 or 5 of these airstreams in their phonological inventories. In the coming weeks, we will touch on the other processes involved in human speech production, which are articulation and phonation.
Further reading
Diedrich, W.M. (1968), The Mechanism Of Esophageal Speech. Annals of the New York Academy of Sciences, 155: 303-317. https://doi.org/10.1111/j.1749-6632.1968.tb56776.x.
Hale, K.; Nash, D. (1997). “Lardil and Damin Phonotactics”. In Tryon, Darrell; Walsh, Michael (eds.). Boundary Rider: Essays in Honour of Geoffrey O’Grady, pp. 247-259. doi:10.15144/PL-C136.
Ladefoged, P. & Maddieson, I. (1996). The Sounds of the World’s Languages. Oxford: Blackwell. ISBN 0-631-19815-6.
Ladefoged, P. (2005), Vowels and Consonants (Second ed.), Blackwell, ISBN 0-631-21411-9
Proctor, M.I., Zhu, Y., Lammert, A.C., Toutios, A., Sands, B.E., & Narayanan, S.S. (2014). Articulatory coordination in Nama click consonants.
Weinberg, B. & Westinghouse, J. (1973), A Study of Pharyngeal Speech. Journal of Speech and Hearing Disorders, 38(1): 111-118. https://doi.org/10.1044/jshd.3801.111.