In everyday life, we have a certain tendency to describe and classify things, much like sorting out the recycling. This need for classification extends to academia too, most notably to the fields of biology (more precisely, taxonomy) and linguistics. Within the fields that attempt to fit elements containing many nuances into discrete categories, we find two large opposing camps which have taken on some interesting names. These are the lumpers and the splitters: the former tend to focus on broad similarities, while the latter tend to prioritise differences, creating more numerous and more discrete groups.
These precise terms for the opposing camps date back to the mid-19th century, in biological and naturalist circles, right around when Darwin’s theory of evolution was taking shape. Since their first mentions in the 1840s or 1850s, depending on who you ask, the general definitions of each camp have been somewhat agreed upon. Lumpers emphasise the presence of significant similarities between elements, and hence tend towards fewer but larger groups. Splitters, by contrast, emphasise the differences between elements, and hence tend towards more numerous but smaller groups.
In linguistics, we see this lumper-splitter opposition wherever language classification is concerned. Questions such as what constitutes a language, and whether a dialect continuum should be ‘lumped’ into a single overarching language or ‘split’ into several distinct languages, typically attract such debates. The debate extends further still to the grouping of language branches into language families, which has attracted its fair share of controversy. One example we have covered earlier on this website is the so-called ‘Khoisan’ languages, which were formerly lumped together on the basis of their extensive use of click consonants and their non-Bantu features. The now more widely accepted classification favours the splitters, dividing this overarching ‘Khoisan’ grouping into three smaller language families: the Khoe-Kwadi languages, the Kx’a languages, and the Tuu languages.
Perhaps one of the most famous lumpers is Joseph Greenberg, who proposed a controversial method for determining the genetic relatedness of languages: mass comparison. For Greenberg, languages were related when they shared many similarities in vocabulary, which together formed a pattern tying the group together. Where this starts getting sketchy is that mass comparison only requires an ‘impressionistic feeling of similarity’, yet Greenberg himself never really defined what counts as a similarity, or how many similarities are required to establish relatedness. You can probably already see where this starts to fall apart.
One major drawback of mass comparison is its failure to account for borrowing. Because it does not consider systematic correspondences between languages, the method cannot distinguish borrowed words from inherited ones, which can lead to erroneous lumping. Resemblances may also occur by pure chance, such as the word ‘dog’ in English and Mbabaram, which, despite their similar forms, derive from entirely separate etymologies. Other convergences can be explained by onomatopoeia and sound symbolism, or by nursery words such as ‘mama’ and ‘papa’. Nursery words lend little evidence to comparative linguistics, since they generally derive from infant vocalisations during language acquisition rather than from a shared ancestor.
Mass comparison contrasts with the comparative method, a technique that splitters widely prefer for evaluating relationships between languages. The comparative method compares the features of two or more languages against a set of criteria that must be fulfilled before one can suggest that the languages derive from a common ancestor, the ‘proto-language’. Alongside these comparisons, linguists propose regular sound changes that explain the differences between the languages, and from these sound change patterns a proto-language can be reconstructed. For example, Latin ‘p’ regularly corresponds to English ‘f’, as in ‘pater’ and ‘father’ or ‘piscis’ and ‘fish’, a correspondence explained by a sound change in the ancestor of the Germanic languages. One of the most prominent projects to apply this method, and one that contributed to its development, was the rise of Indo-European studies, driven by great interest in establishing the relationships between the languages spoken in Europe, and later those of Iran and the Indian subcontinent.
Nevertheless, Greenberg went on to lump languages together into larger families, such as his four proposed language families of Africa: Nilo-Saharan, Niger-Congo, Afroasiatic, and ‘Khoisan’. The Nilo-Saharan classification, however, is disputed, owing to the immense diversity of the constituent families that Greenberg lumped into it. Some have dismissed it as ‘Greenberg’s wastebasket’, seeing it as a catch-all for the non-click languages that were classified as neither Niger-Congo nor Afroasiatic. Additionally, as previously mentioned, the ‘Khoisan’ family has been rejected and instead split into three principal language families. Further examples of his lumping include the Eurasiatic and Amerind language families, both widely rejected, while the proposed Altaic family remains disputed.
Today, mass comparison is largely rejected by linguists in favour of more widely accepted methods for evaluating relationships between languages, such as the comparative method. These methods have been used to re-evaluate Greenberg’s proposed groupings, some of which are now accepted, some disputed, and some rejected. Nevertheless, the divide still stands, as there is no fully objective method that can definitively draw the boundary between language and dialect, whether that means splitting a dialect continuum into separate languages or lumping it into one, nor one that can settle whether languages should be lumped into or split out of larger language families. What is certain is that more evidence for or against the proposed classifications is needed, and that may be the one point on which lumpers and splitters can agree.