Poster sessions are strange beasts. Some posters present polished research that’s about to be published, some are hints of work to come, and some are one-conference wonders, never to be heard from again. Often, though, they have some of the most interesting stuff at the conference. So, in honor of poster sessions, here are some thoughts on 5 not-randomly-chosen posters from the CUNY 2015 sentence processing conference. Specifically, they all have something or other to do with the intersection of information theory, statistics, and language processing. (When I could find them online, I included a link to the poster or associated slides, papers, etc.)
Mark Myslín and Roger Levy
It’s been noted that (1) “The girl gave the boy the cookie” (the DO, or double-object, construction) highlights a change in possession, whereas (2) “The girl gave the cookie to the boy” (the PO, or prepositional-object, construction) highlights a change in location. We also know that it’s better to put long NPs at the end: (3) “The girl gave the cookie to the very hungry boy” is better than (4) “The girl gave the very hungry boy the cookie.”
Myslín and Levy’s idea is that, if a speaker uses the grammatically unlikely (4), in which the short-before-long preference is violated, then she must really have a reason for it. Maybe that reason is that she *really* wants to highlight the change in possession associated with the DO construction. On the other hand, in (1) and (2) (the sentences where both NPs are short), the choice of one construction over the other is less compelling evidence that the speaker is trying to convey a change in possession or a change in location. If comprehenders are good Bayesians, they’ll make that very inference.
It’s the same logic that might lead you to decide whether someone likes watching baseball. If she’s watching the World Series in a waiting room where there’s only one TV and no remote control, that’s not such good evidence that she’s a baseball fan. If she drives 3 hours to a baseball game and pays some ungodly amount of money for a ticket, then, hey, she probably likes baseball. If it’s worth the effort to say the grammatically weird (4), you probably did it for a reason and not just because you were stuck in the grammatical equivalent of the one-TV waiting room.
Myslín and Levy show that comprehenders make just these sorts of inferences when they’re presented with an alien language and asked whether a sentence is most likely intended to convey a change in possession or a change in location. It is also worth noting that the alien language used in this study is pretty fun: “The zarg prolted the cherid to a really gromious flig.”
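The inference can be sketched as a toy Bayesian update. All the numbers below are made up for illustration; this is just the shape of the argument, not the model from the poster:

```python
# Toy Bayesian model of the pragmatic inference.
# All probabilities here are invented for illustration.

# Prior over the speaker's intended meaning.
prior = {"possession": 0.5, "location": 0.5}

# Likelihood of choosing the dispreferred long-before-short DO
# construction, given each meaning: a speaker who really wants to
# highlight possession is much more willing to pay the cost.
likelihood_DO_long = {"possession": 0.4, "location": 0.05}

# Bayes' rule: P(meaning | DO_long) ∝ P(DO_long | meaning) * P(meaning)
unnorm = {m: likelihood_DO_long[m] * prior[m] for m in prior}
z = sum(unnorm.values())
posterior = {m: p / z for m, p in unnorm.items()}

print(posterior)  # possession ≈ 0.89
```

The point is just that a costly, dispreferred form is strong evidence for the meaning that motivates the cost, which is the one-TV-waiting-room logic in probabilistic form.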
Gabriel Doyle and Mike Frank
Speaking of the World Series…
There’s been lots of work showing that people like to keep information flow constant during communication. But how do you test the effect of information that comes from non-linguistic context? Doyle and Frank found a truly cool way: by collecting tweets during the World Series with the hashtag #worldseries. The idea is that the World Series causes a meaningful chunk of Twitter users to be talking about the same things at the same time and thus provides a reasonable proxy for common ground.*
It turns out that, as the game progresses, the per-word context-independent linguistic entropy goes up. That is, as common ground increases, the linguistic entropy increases. As the tweet rate goes up, though (corresponding to interesting or important events in the game), the per-tweet information (approximated by the tweet length) goes down. Doyle & Frank explain this as another by-product of Uniform Information Density: as information in the world goes up (rare events are informative), the information per tweet goes down. Question I wish I’d asked at the poster: when the tweet rate goes up due to an Important Event, wouldn’t that increase common ground? When nothing important is happening in the game, it seems like there would be more diversity in what #worldseries tweeters are discussing.
*Sure maybe not as much common ground as there used to be given the supposed decline of baseball’s popularity (see here for a counterargument to that claim), but I still like baseball–especially now that the Marlins have a real shot at the playoffs. If I were in a waiting room and a baseball game was on TV, I definitely wouldn’t change the channel.
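As for what a context-independent per-word entropy measure looks like, here’s my own toy version over unigram counts (not Doyle & Frank’s actual pipeline, and the example tweets are invented):

```python
import math
from collections import Counter

def per_word_entropy(tweets):
    """Context-independent per-word entropy: estimate a unigram
    distribution from the pooled tweets and compute its Shannon
    entropy in bits."""
    words = [w for t in tweets for w in t.lower().split()]
    counts = Counter(words)
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Invented examples: repetitive cheering vs. varied game commentary.
early = ["go royals go", "go giants go go"]
late = ["what a catch unbelievable play", "bumgarner is dealing tonight"]

print(per_word_entropy(early))  # low: few word types, heavily reused
print(per_word_entropy(late))   # higher: more diverse vocabulary
```

A more diverse vocabulary spreads probability mass over more word types, which is what pushes the per-word entropy up as the conversation broadens.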
Titus van der Malsburg and Bernhard Angele
OK, this isn’t really about informativity or information theory, but let’s say it’s about the informativity of our methods and call it on topic. In eye-tracking studies, it’s not always clear what type of effect to expect: first fixation duration, gaze duration, go-past time, or total viewing time. This work used simulations to show that, if you just check for an effect on any one of those four measures, you’re going to have an inflated chance of a false positive: 12% instead of 5%. To make matters worse, I’m sure the false positive rate gets cranked up further if you start considering multiple regions, various exclusion criteria, and so on. More broadly, this poster is a nice example of how useful simulations are for checking whether the analyses we’re doing are the analyses we think we’re doing.
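Here’s a minimal simulation of the multiple-measures problem, in the same spirit but not the poster’s actual code; the correlation between measures and the |t| > 2 threshold are my assumptions:

```python
# Four correlated null "reading measures" per subject, each tested
# separately; count how often at least one crosses |t| > 2 (~p < .05).
import math
import random

random.seed(1)
n_sims, n_subj, n_measures, rho = 2000, 30, 4, 0.6

def correlated_null_subject():
    """One subject's four measures: a shared component (giving the
    correlation) plus measure-specific noise. No true effect anywhere."""
    shared = random.gauss(0, 1)
    return [math.sqrt(rho) * shared + math.sqrt(1 - rho) * random.gauss(0, 1)
            for _ in range(n_measures)]

def t_stat(xs, ys):
    """Two-sample t statistic (Welch)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

hits = 0
for _ in range(n_sims):
    a = [correlated_null_subject() for _ in range(n_subj)]
    b = [correlated_null_subject() for _ in range(n_subj)]
    ts = [t_stat([s[j] for s in a], [s[j] for s in b])
          for j in range(n_measures)]
    hits += any(abs(t) > 2.0 for t in ts)

rate = hits / n_sims
print(rate)  # well above the nominal .05
```

Because the measures are correlated, the family-wise rate lands between the fully-independent worst case (about 19% for four tests) and the nominal 5%, which is exactly the regime the poster’s 12% figure describes.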
Word forms–not just lengths–are optimized for efficient communication
Stephan Meylan & Tom Griffiths
Piantadosi, Tily, and Gibson (2011) showed that Zipf’s old finding about word length and frequency is actually better explained as a correlation between word length and predictability in context. The claim from Meylan & Griffiths is that it’s actually sublexical surprisal that is correlated with lexical surprisal. That is, phonotactic (or, here, orthographic) oddballs like “kvetch” occur in more surprising contexts than normal words like “cat.” Under this story, length is something of an epiphenomenon, since length and sublexical surprisal are necessarily highly correlated (as you add letters, there’s more to predict, so the overall probability of the string goes down).
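To see why phonotactic oddballs and longer words both rack up sublexical surprisal, here’s a toy character-bigram model. This is my own illustration with a made-up training set, not Meylan & Griffiths’ model:

```python
import math
from collections import Counter

# Tiny invented "lexicon" to train on.
train = ["cat", "can", "car", "care", "cart", "catch", "chat",
         "hat", "hatch", "mat", "match", "rat", "ratch"]

# Count letter bigrams over padded words ("#cat#" etc.).
bigrams = Counter()
unigrams = Counter()
for w in train:
    padded = "#" + w + "#"
    for a, b in zip(padded, padded[1:]):
        bigrams[(a, b)] += 1
        unigrams[a] += 1

alphabet = set("abcdefghijklmnopqrstuvwxyz#")

def surprisal(word):
    """Total -log2 probability of the word under the bigram model,
    with add-one smoothing over the alphabet."""
    padded = "#" + word + "#"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + len(alphabet))
        total += -math.log2(p)
    return total

print(surprisal("cat"), surprisal("kvetch"))  # "kvetch" is much higher
print(surprisal("cat"), surprisal("catch"))   # longer word: higher total
```

Rare bigrams like “kv” get tiny smoothed probabilities, and every extra letter adds another (always positive) surprisal term, so oddness and length both drive the total up, which is the sense in which length rides along with sublexical surprisal.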
So far the results are just for English, but they fit nicely with some cross-linguistic findings that Isabelle Dautriche, Steve Piantadosi, Ted Gibson, Anne Christophe, and I presented at AMLaP last fall (Lexical Clustering in Efficient Language Design). We showed that, across >100 languages from Wikipedia, there is a nice correlation between orthographic probability (trained on types) and token frequency.
It’s pretty awesome that one of the oldest observations in quantitative linguistics (the relationship between word length and frequency) continues to be a fruitful area for new research (see Steve Piantadosi’s summary of some of this work here).
Kyle Mahowald, Melissa Kline, Evelina Fedorenko, Ted Gibson
This is one of the posters I presented. First, we asked people to compress simple transitive sentences into 2 words. Sometimes the subject was predictable from the verb, as in (1) “The policeman arrested the shopkeeper”; sometimes the object was, as in (2) “The engineer shuffled the cards”; and sometimes neither was, as in (3) “The musician punched the plumber.” As we predicted, people almost always kept the verb. And they also tended to keep the *less predictable* argument. This is consistent with efficient compression: you want to keep the things that are hardest to recover. (See also Resnik (1996) for related work.)
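The keep-the-hardest-to-recover-words principle can be sketched like this; the recoverability numbers are invented for illustration, not estimates from the study:

```python
import math

# Invented guesses at how recoverable each word is from the rest of
# the compressed message (NOT values from the experiment).
recoverability = {
    "policeman": 0.6,   # police are who you expect to do the arresting
    "arrested": 0.05,   # the verb itself is very hard to recover
    "shopkeeper": 0.2,  # the arrestee is less guessable
}

def keep_two(words):
    """Keep the two hardest-to-recover words, i.e. the ones with the
    highest surprisal (-log2 recoverability)."""
    return sorted(words,
                  key=lambda w: -math.log2(recoverability[w]),
                  reverse=True)[:2]

print(keep_two(["policeman", "arrested", "shopkeeper"]))
# ['arrested', 'shopkeeper']: the verb and the less predictable argument
```

Under any such numbers where the predictable subject is easiest to reconstruct, the efficient 2-word message keeps the verb plus the less predictable argument, which is the pattern people actually produced.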
We found a similar result in a different but related paradigm. In work she’s also been doing with kids, Melissa Kline presented people with simple scenes consisting of M (1-6) agents and N (1-6) objects. People had to describe the scenes using only 2 words. As the number of agents (M) went up relative to the number of objects (N), people were more likely to use the agent in the description. That’s presumably because the more agents there are, the less predictable any one agent is.
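That intuition has a simple information-theoretic reading (my framing, not a calculation from the experiment): if any of the M agents is equally likely to be the one acting, identifying the actual agent takes log2(M) bits, so mentioning it becomes more valuable as M grows.

```python
import math

def agent_surprisal(m):
    """Bits needed to identify the actual agent when any of m agents
    is equally likely to act (uniform guess: p = 1/m)."""
    return math.log2(m)

for m in range(1, 7):
    print(f"{m} agent(s): {agent_surprisal(m):.2f} bits")
```

With one agent the word is fully redundant (0 bits); with six it carries log2(6) ≈ 2.58 bits, so a 2-word budget is better spent on it.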
The story, as it often is: people are pretty good at using language in all sorts of efficient ways.
In addition to this crop, see also a write-up of some cool CUNY posters out of Florian Jaeger’s lab.