Thursday, May 10, 2018

ngram Shenanigans

Despite growing concerns about the attention economy[1], I love two Google innovations with untainted passion: Google Maps and the ngram.[2]

You see, since Google has all the books (like, seriously, all of them), it is in a unique position for cultural data mining.  The beauty of the ngram is that it opens that capability up to the rest of us.  The ngram lets users test historical hypotheses about cultural trends by computing time series of word occurrence.

In other words, you can plot the relative frequency of terms or vocabulary through two centuries of texts and then make wild speculations[3] about the cultural trends behind those plots.
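For the curious, the arithmetic behind "relative frequency" is simple: divide a word's count in a given year by the total number of words in that year's books, so a word doesn't look like it's "rising" just because more books got printed.  Here's a minimal Python sketch of that idea.  The counts are made-up toy numbers (not Google's data), and `relative_frequency` is a hypothetical helper of my own; the real Viewer works over Google's corpus and offers extras like smoothing that this ignores.

```python
# Minimal sketch of an ngram-style relative-frequency time series.
# All counts below are made-up toy numbers for illustration, NOT Google Books data.

def relative_frequency(term_counts, total_counts):
    """Return {year: term count / total words that year}, skipping years with no text."""
    return {
        year: term_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }

# Hypothetical yearly counts for one word, and hypothetical total words printed per year.
vampire_counts = {1850: 12, 1900: 30, 2000: 900}
total_words = {1850: 1_000_000, 1900: 4_000_000, 2000: 60_000_000}

series = relative_frequency(vampire_counts, total_words)
for year, freq in series.items():
    print(year, f"{freq:.2e}")
```

Notice that the raw count grows 75-fold from 1850 to 2000 in this toy example, but the relative frequency barely moves, because the total amount of text grew even faster.  That normalization is the whole point of the plot.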

For example, vampires have always been more popular than zombies:

There was a brief, single-decade spike in vampire interest in the mid-19th century…and a recent phenomenon that blew them up…though I can’t imagine what that would be.

Zombies, on the other hand, are a twentieth-century monster, rising up in the wake of each world war.[4]  But the thing I like about the zombie plot is that it is slowly and steadily creeping up…you know…like a zombie.

In a future post[5] I will use ngrams to test some linguistic hypotheses that Robert Nisbet made in 1953.  But this is not that post…this post is where the many interests explored in this space meet this analytical tool.

So now that you know how these work, here are a few more.


First, I often use this one when I explain my work at conferences or public meetings:

Dam building in Europe and North America peaked in the early 20th century.[6]  But as the best sites were built out and our understanding of the ecological impacts grew, we stopped.  Dams also trap sediment, which fills their reservoirs.  So, about 80 years after the dam peak, interest in sediment follows a similar trend.


I know that the name “Stanford” peaked in the 1920s, that the railroad magnate is in play here, and that the time series is confounded by the university…but I can’t untangle the strands of those influences here:


I don’t feel like I need to add commentary here...


Also…without further comment.

…and rescaling contentment:


The ngram also weighs in on the religious illiteracy of our information institutions.  They use words like “evangelical” and “fundamentalist” as if they were interchangeable[7] and recent, with self-evident referents.  I’d suggest that any pundit who can’t explain the time series in this plot should probably educate themselves a little before they start throwing around religious labels:

Though, I suppose it would be fair to counter that many American Christians don’t know our heritage enough to parse these threads either.


All the major theological vocabulary is down, and “church” and “worship” follow the same trend on larger scales.  This should surprise no one, given secularizing trends and the fact that devotional and theological texts made up a higher proportion of literature in the nineteenth century.  But a couple of theological terms buck the trend, suggesting that certain emphases of twentieth-century American Christianity are novel[8], and maybe not native to our belief system.


Speaking of innovations, I’d be lying if I said I expected this:


But this was pretty predictable[9]

…as was this[10]


… lightning round…


And finally…

[1] I may do a post on this…and baseball…but for now: I’ve been influenced a lot by Simone Weil, who advocated for dull activities in our educational process for two reasons: 1) learning to sustain attention* is far more important than acquiring content for children and young adults.**  Everything good comes from the compounding interest on sustained attention.  But, more importantly, 2) attention is the raw material of love, which makes it the active ingredient in relationships and worship.

So, an industry that competes for attention is a particularly pernicious “tragedy of the commons,” monetizing our most important resource.  And yes, I have switched my phone to black and white permanently.  Try it for one day and see if it doesn’t divest that thing of some of its illicit power.

But this isn’t my technology lament…this is a celebration of some of the beauty that can also come from that ferment.

*my favorite Weil quote goes something like “attention is the purest form of generosity”
**and not-so-young adults if our own educational experience did not help us train our attention

[2] Both of these are underrated.  I realize it is difficult to believe that Google Maps is underrated…but adding historical maps makes it an unprecedented tool for fluvial geomorphologists and forensic contaminant engineers (fields I work or have worked in).  And I cannot believe I haven’t seen more ngrams in TED talks or tweets or other forms of multi-sensory speech.

[3] These trends are confounded by many data artifacts, mostly non-stationary sample sizes (far fewer surviving books per year early on).  Random preservation bias and directional preservation bias affect the early record disproportionately.  I’d expect both variability and uncertainty to be high earlier in the time series and potentially skewed by surviving texts…and also by the texts Googlers* deemed worthy of pursuing.

*I have so many thoughts about this word since I just finished Laszlo Bock’s book.  But I’ll leave them on the table.

[4] Born out of the ashes of the expectation of monotonic progress and the false hope of a pax technologica.  Zombies are dystopic in a way vampires aren’t, and their narratives often have a culture-scale morality tale built in, while vampire stories feature individual-scale morality tales.

[5] Which I can only promise because it started as part of this one and is complete.

[6] For the one or two folks who are hydro-minded…the joke that always works here in water crowds is something like…“the dam time series is shaped like a hydrograph, with a steep rising limb and a gradual receding limb…and the sediment time series is the same hydrograph, with an 80-year translation time.”  I know…I’m hilarious.

[7] And even religion non-specific.  “Fundamentalism” is a specific movement of early 20th-century American Christianity, but the religious categories of our information economy are such blunt instruments that it gets foisted on Muslims in the Middle East and Iran and on Hindus in India.

[8] Like jazz and baseball, dispensationalism is an American innovation and export.

[9] Happiness follows the same trend as joy, just steeper…but I find joy to be a much more useful idea, and, frankly, I was surprised at the happiness trend.

[10] “Being unable to cure death, wretchedness and ignorance, men have decided, in order to be happy, not to think about such things.”  Pascal, Pensées