This post is mainly a very short personal summary of our recent PNAS paper “Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots”, co-authored with Caleb Everett and Sean Roberts, with some input from them, although I am the only one to blame for the opinions expressed here. Given the tone of the last posts, I’ll venture to share some thoughts on the situation of quantitative research in the language sciences that came (partially) as a result of the feedback we received after publication.
Experimental and clinical literature on laryngology shows that desiccated air leads to a decrease in the precision of produced (and perceived) pitch. Furthermore, extended exposure to such conditions might cause permanent damage to the articulatory apparatus in the form of, for instance, chronic sore throats. As a consequence, Caleb conjectured that populations embedded in dry climates would find it hard to carry (or develop) a large number of lexical tones through time. After performing a series of global analyses, he contacted Sean and me and challenged us to prove him wrong, and we decided to turn to the Phonotactics Database of the Australian National University, which covers 3756 languages, 629 of which have more than two tonemes. However, there were three big methodological hurdles to overcome in order to turn a mere correlation into a robust effect.
First, regression methods (which dominate the field of quantitative language analysis, nowadays in the form of mixed effects models) were not suitable (or ideal) for our purpose: we did not predict a tradeoff between the presence of tonemes and humidity, but rather the absence of languages with complex tonal systems (CTL) in dry regions. Sean and I wrote a paper some time ago discussing the different formal classes of causal relations; unfortunately it hasn’t seen the light yet, but Sean wrote a blog post about it.
Second, the languages (and peoples) of the world are concentrated around the Equator. This entails that, if you pick almost any feature that is not extremely common (like the inclusive/exclusive distinction in pronouns, split ergativity or the prohibition on consonantal codas), you will likely find that it is represented mostly by languages in warm and humid climates.
Third, languages that are genealogically related or in contact tend to be more similar to each other than completely unrelated languages, so individual languages cannot be treated as statistically independent data points.
Because of all that, the Monte Carlo strategy implemented in the paper appeared as a natural way of making our point. We repeatedly sampled the same number of languages (in a genealogically balanced manner, taking one language per linguistic family) for each of the conditions (CTL and nCTL) and then compared the specific humidity and temperature percentiles of these samples. The expectation was that nCTL would be able to exist in drier environments, so the lower percentiles of the nCTL samples should be systematically lower than in the CTL case. This is exactly what we found: the CTL samples have considerably larger values for the lower percentiles of specific humidity and temperature. At the other extreme, our predictions are met as well: CTL have larger values for the upper percentiles of temperature (both high and low temperatures are predictors of dryness), and the two groups do not differ in the upper percentiles of specific humidity (extremely humid regions are suitable for both CTL and nCTL). Similar results for the distribution of CTL and nCTL were obtained for areally controlled samples (although this didn’t make it into the paper), for language isolates, and within language families covering diverse ecological landscapes.
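The sampling logic above can be sketched in a few lines of code. This is a minimal illustration with invented toy data and made-up helper names, not the paper’s actual pipeline (which used the ANU Phonotactics Database paired with real climate measurements):

```python
import random

def percentile(values, q):
    """Nearest-rank percentile of a list of numbers, q in [0, 100]."""
    vals = sorted(values)
    idx = int(round(q / 100 * (len(vals) - 1)))
    return vals[idx]

def balanced_sample(langs, n_families):
    """Genealogically balanced sample: pick n_families families at random,
    then one language's climate value per family."""
    by_family = {}
    for family, value in langs:
        by_family.setdefault(family, []).append(value)
    families = random.sample(sorted(by_family), n_families)
    return [random.choice(by_family[f]) for f in families]

def direction_rate(ctl, nctl, n_families, q=10, iterations=1000):
    """Fraction of Monte Carlo iterations in which the CTL sample's lower
    percentile of the climate variable exceeds the nCTL sample's, i.e. the
    predicted direction (CTL languages avoid the driest environments)."""
    hits = 0
    for _ in range(iterations):
        ctl_low = percentile(balanced_sample(ctl, n_families), q)
        nctl_low = percentile(balanced_sample(nctl, n_families), q)
        if ctl_low > nctl_low:
            hits += 1
    return hits / iterations
```

If the hypothesis is right, `direction_rate` stays close to 1: the drawn nCTL samples reach into drier (lower specific humidity) territory than the CTL samples, iteration after iteration, while the upper percentiles can still overlap.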
The nature of the data we used and the details of the causal mechanism deserve a separate note. My own take is that our results are in harmony with the previous literature on tone: dry conditions both impede regular tonogenesis and sweep away CTL due to the maladaptive properties discussed. These questions are important and need to be addressed eventually (once more, Sean wrote two blog posts discussing some of these things, here and here), and we appreciate the attention and the ideas people shared with us about taking this claim to the lab or using different data.
On the other hand, a few academics reacted to our paper with a certain disdain, without engaging in the discussion at all; in most or all of these cases they had picked up the story from the media and explicitly stated that they hadn’t read the article. More worryingly, many proceeded to dismiss altogether any scientific work with statistical data analysis as its main ingredient. These people tend to sympathise with each other by appealing to folk knowledge about statistics, invoking slogans like “correlation doesn’t imply causation” or some of the many popular jokes and anecdotes about how statistics can be used to mislead. Fair enough: contrary to common practice, proper use of statistics requires something more than the off-the-shelf solutions we find in our favourite software or textbook, and there are many ways in which a well-intentioned analysis can end up with artifactual conclusions. Statistics is no silver bullet, but I think it is fair to say about it the same thing Churchill said about democracy.
However, what strikes me most is the Renaissance-esque attitude those academics defend when it comes to discussing the valid manner of making and testing linguistic hypotheses. There is no question that experts in language and languages are in an excellent position to come up with interesting ideas in the field, but it is not true that the usual linguistic training provides the proper tools to test hypotheses and detect patterns in well-normalized data. Call me unromantic, but the sober language of statistics, machine learning and quantitative data analysis in general is more akin to the spirit of scientific research than inspired intuitions or theory-dependent conundrums (which are important in their own way, of course). Linguists, being human, are subject to a host of biases (as poignantly discussed by Edward Gibson and Evelina Fedorenko in their contemporary classic “Weak quantitative standards in linguistic research”), and these biases are exposed more easily when quantitative methods are used.
My own diagnosis of the matter is that The Root Of All Evil lies in the inflated prominence theory has over empirical evidence in many branches of linguistics. Hopefully I’ll have the chance to expand on that topic sometime in the future.
Finally, and since I’ve been honoured with one of the first invited posts in this blog, I feel obliged to express what I think is a widely shared belief: this space was badly needed. I wish the curators all the best on this fabulous enterprise!