Climate, languages and statistics in linguistics

This post is mainly a very short personal summary of our recent PNAS paper “Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots”, co-authored with Caleb Everett and Sean Roberts, who also gave some input here, although I alone am to blame for the opinions expressed. Given the tone of the latest posts, I’ll also venture to share some thoughts on the state of quantitative research in the language sciences, prompted in part by the feedback we received on our publication.

Experimental and clinical literature in laryngology shows that desiccated air decreases the precision of produced (and perceived) pitch. Furthermore, extended exposure to such conditions might cause permanent damage to the articulatory apparatus in the form of, for instance, chronic sore throats. As a consequence, Caleb conjectured that populations living in dry climates would find it hard to maintain (or develop) a large number of lexical tones over time. After performing a series of global analyses, he contacted Sean and me and challenged us to prove him wrong. We decided to turn to the Phonotactics Database of the Australian National University, which covers 3756 languages, 629 of which have more than two tonemes. However, there were three big methodological hurdles to overcome in order to turn a mere correlation into a robust effect.

First, regression methods (which dominate quantitative analyses of language, nowadays in the form of mixed-effects models) were not suitable, or at least not ideal, for our purpose: we did not predict a trade-off between the presence of tonemes and humidity, but rather the absence of languages with complex tonal systems (CTL) from dry regions. Sean and I wrote a paper some time ago discussing the different formal classes of causal relations; unfortunately it has not been published yet, but Sean wrote a blog post about it.

Second, the world’s languages (and peoples) are concentrated around the Equator. This means that if you pick any feature that is not extremely common, such as the inclusive/exclusive distinction in pronouns, split ergativity or the prohibition of consonantal codas, you will likely find it mostly represented by languages in warm and humid climates.

Third, languages that are genealogically linked or in contact tend to be more similar than completely unrelated languages.

Because of that, the Monte Carlo strategy implemented in the paper seemed a natural way of making our point. For each of the two conditions (CTL and nCTL) we sampled the same number of languages in a genealogically balanced manner, taking one language per linguistic family, and then compared the specific humidity and temperature percentiles of the two samples. The expectation was that nCTL would be able to exist in drier environments, so the lower percentiles of their samples would be systematically lower than those of the CTL samples. This is exactly what we found: the CTL samples have considerably larger values for the lower percentiles of specific humidity and temperature. At the other extreme, our predictions are met as well: CTL have larger values for the upper percentiles of temperature (both high and low temperatures are predictors of dryness), and the two groups do not differ in the upper percentiles of specific humidity (extremely humid regions are suitable for both CTL and nCTL). Similar results for the distribution of CTL and nCTL were obtained for areally controlled samples (although this did not make it into the paper), for language isolates, and within language families covering diverse ecological landscapes.
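To make the procedure more concrete, here is a minimal Python sketch of a genealogically balanced Monte Carlo comparison along the lines described above. It is not the code used in the paper: the record fields (`family`, `tones`, `humidity`), the use of the 10th percentile, the number of iterations and the summary statistic (the share of iterations in which the CTL percentile is higher) are all assumptions made for illustration, and the toy data in the demo block are synthetic, not the ANU or WALS data.

```python
import random
from collections import defaultdict


def one_per_family(langs, rng):
    """Genealogical balancing: pick one random language from each family."""
    by_family = defaultdict(list)
    for lang in langs:
        by_family[lang["family"]].append(lang)
    return [rng.choice(members) for members in by_family.values()]


def percentile(values, p):
    """p-th percentile (0-100) of a non-empty list, with linear interpolation."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def monte_carlo(langs, is_ctl, variable, p=10, n_iter=1000, seed=0):
    """Share of iterations in which the p-th percentile of `variable` is
    higher in a balanced CTL sample than in an equal-sized nCTL sample."""
    rng = random.Random(seed)
    ctl = [lang for lang in langs if is_ctl(lang)]
    nctl = [lang for lang in langs if not is_ctl(lang)]
    if not ctl or not nctl:
        raise ValueError("need at least one CTL and one nCTL language")
    higher = 0
    for _ in range(n_iter):
        a = one_per_family(ctl, rng)
        b = one_per_family(nctl, rng)
        # Keep the same number of languages in each condition
        n = min(len(a), len(b))
        a, b = rng.sample(a, n), rng.sample(b, n)
        p_ctl = percentile([x[variable] for x in a], p)
        p_nctl = percentile([x[variable] for x in b], p)
        if p_ctl > p_nctl:
            higher += 1
    return higher / n_iter


if __name__ == "__main__":
    # Synthetic toy data (NOT real language data): by construction, languages
    # with 3+ tones are only placed in the humid part of the range.
    gen = random.Random(42)
    toy = []
    for fam in range(40):
        for _ in range(3):
            tones = gen.choice([0, 0, 2, 3, 5])
            humidity = gen.uniform(8, 20) if tones >= 3 else gen.uniform(1, 20)
            toy.append({"family": f"F{fam}", "tones": tones, "humidity": humidity})
    share = monte_carlo(toy, lambda lang: lang["tones"] >= 3, "humidity")
    print(f"CTL 10th percentile higher in {share:.0%} of iterations")
```

Passing `p=90` compares the upper percentiles instead, which corresponds to the other extreme discussed above. Sampling one language per family in every iteration, rather than once, is what lets the comparison average over the arbitrary choice of family representative.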

The nature of the data we used and the details of the causal mechanism deserve a separate discussion. My own take is that our results are consistent with the previous literature on tone: dry conditions both impede regular tonogenesis and sweep away CTL through the maladaptive properties discussed above. These questions are important and need to be addressed eventually (once more, Sean discussed some of them in two blog posts), and we appreciate the attention and the ideas people shared with us about taking this claim to the lab or testing it with different data.

On the other hand, a few academics reacted to our paper with a certain disdain, without engaging in the discussion at all; in most or all of these cases they had picked up the story from the media and explicitly stated that they had not read the article. Worryingly, many went on to dismiss altogether any scientific work with statistical data analysis as its main ingredient. These people tend to reinforce each other by appealing to folk knowledge about statistics, using slogans like “correlation doesn’t imply causation” or some of the many popular jokes and anecdotes about how statistics can be used to mislead. Fair enough: contrary to common practice, proper use of statistics requires more than the off-the-shelf solution found in our favourite software or textbook, and there are many ways in which a well-intentioned analysis can end up with artifactual conclusions. Statistics is no silver bullet, but I think it is fair to say about it what Churchill said about democracy.

However, what strikes me most is the Renaissance-esque attitude those academics defend when it comes to the valid way of making and testing linguistic hypotheses. There is no question that experts in language and languages are in an excellent position to come up with interesting ideas in the field, but it is not true that the usual linguistic training provides the proper tools to test hypotheses and detect patterns in well-normalized data. Call me unromantic, but the sober language of statistics, machine learning and quantitative data analysis in general is closer to the spirit of scientific research than inspired intuitions or theory-dependent conundrums, which are of course important in their own way. Linguists, being human, are subject to a host of biases (as poignantly discussed by Edward Gibson and Evelina Fedorenko in their contemporary classic “Weak quantitative standards in linguistic research”), and these biases are easier to expose when quantitative methods are used instead.

My own diagnosis is that The Root Of All Evil lies in the inflated prominence of theory over empirical evidence in many branches of linguistics. Hopefully I’ll have the chance to expand on that topic in the future.

Finally, since I’ve been honoured with one of the first invited posts on this blog, I feel obliged to express what I think is a widely shared belief: this space was badly needed. I wish the curators all the best in this fabulous enterprise!


  1. Hi Noah, thanks for your comment and your questions. The CDF graphs were there just to illustrate the situation, but they have no statistical value since they are based on pooled data (also, the ANU data are distributionally quite similar). While I agree the MC strategy we followed is not very common, I think it should be clear that there were no obvious simpler alternatives available (and, as you note, methods from the regression family are not the way to go). The MC solution deals with all the issues I described in the post.

    I agree with your point about the presentation of the regression and MC analyses in the paper. Regression makes a different point, and it is not as convincing as the strategy laid out in the main body. We ended up including those analyses for sociological reasons: a reviewer wanted to see the results supported by a more traditional analysis at least somewhere in the paper. You are right that we should have included the intra-family MC analysis there as well; we kept thinking about what we should have included or excluded even weeks after the paper was published.

    As for the other question: indeed, one could have chosen the cut-off as the value that maximises one of the statistics we used in the paper. We decided to stick with 3+ tones, following the distinction introduced by Maddieson in WALS. Regarding other uses of pitch in language and how they could be affected by the phenomenon hypothesised in the paper, please take a look at Sean’s second blog post referred to in the post.

    Thanks again!

  2. Thanks for writing a post about the tone and humidity paper (and about quantitative analysis of language-related data in general). For what it’s worth, I liked that you all compared CDFs in the paper. I don’t have a general problem with the Monte Carlo simulations, but I didn’t find them particularly easy to interpret.

    In any case, I’m curious about a couple aspects of the paper, one directly related to this post, and one less so.

    With regard to the quantitative modeling, why did you all use such different methods for the analyses in the paper proper and the analyses in the supplementary materials? I’m thinking in particular about CDF comparisons (and Monte Carlo simulations) in the paper vs the family-based regression/correlation analyses in the supplement. It would be nice to see the same quantitative information from each of the two databases, but the CDFs are all from the WALS data and the regressions were carried out on the ANU data. I felt like the family-based analyses are really crucial to the whole case for the relationship in question, but I found the regression/correlation analyses less easy to interpret and less convincing than the comparisons of CDFs. If there is a relevant cutoff (or inflection point) at 3+ tones, I can’t think why there would be a linear relationship between humidity and number of tones.

    Less directly related to the post, I’m curious why the cutoff for “complex” tone was 3+ lexical tones. Do you know what the analyses look like if you shift that cutoff? Or if you look at languages with lexical tone at all vs non-tonal languages (we can see more or less what this looks like based on Figure 2 in the paper)? Does the production of complex (3+) lexical tones really require more precise laryngeal articulation than the production of “simple” lexical tones, pitch accent, lexical stress, and/or intonation contours?
