5 January 2011
Google Ngrams – the use of the word “landslide”
Posted by Dave Petley
Google now has about a million books online dating from 1500 to 2008. An interesting tool that they have provided, and which is great fun to play with, is the Ngram Viewer, which allows the user to search for, and graph, the occurrence of specific words through time in these texts. Note that the data are presented as the use of the word as a percentage of the total words in the texts.
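To make that normalisation concrete, here is a minimal Python sketch of how a word's share of the total could be worked out for each year. The function name `relative_frequency` and the counts are made up for illustration; they are not Google's code or real Ngram data.

```python
def relative_frequency(word_counts, total_counts):
    """Return {year: percentage} giving a word's share of all words in each year.

    Both arguments are hypothetical {year: count} dictionaries.
    Years with no recorded words in total are skipped to avoid dividing by zero.
    """
    return {
        year: 100.0 * word_counts.get(year, 0) / total
        for year, total in total_counts.items()
        if total > 0
    }


# Invented counts, for illustration only.
landslide_counts = {1900: 12, 1950: 45, 2000: 120}
total_counts = {1900: 1_000_000, 1950: 1_500_000, 2000: 2_000_000}

for year, pct in sorted(relative_frequency(landslide_counts, total_counts).items()):
    print(f"{year}: {pct:.4f}%")
```

Working with percentages rather than raw counts matters because far more books were published in later years, so raw counts would rise even if a word's popularity stayed flat.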
It is quite fun to compare the use of different words describing natural hazards. This graph shows the occurrence of the words “earthquake”, “flood” and “volcano” from 1500 to 2008 in the “Google Million” set of English books:
The dataset starts very noisy and then becomes more stable, with a substantial increase in the use of the terms from about 1750 onwards. So let's focus on the period 1800 to 2008:
Notice how the use of the word “earthquake” has some large spikes, whereas “volcano” is smoother. I don’t know enough about the history of these terms and events to be able to interpret this, but compare the above with the same data for the use of the word landslide:
The pattern is completely different (and note that the axis scale is also rather different). The use of the word “landslide” started to increase from 1800 onwards, and appears to have risen essentially monotonically thereafter. There is a slight dip in the period from 1940 to 1950 (the effect of World War 2?), and the data for the term have become noisier in recent years. This could of course be a reflection of the use of the term in the context of elections (i.e. “landslide victory” and suchlike), but a search for that specific term suggests not:
Finally, compare the term “landslide” with the older term “landslip”:
It is clear that the latter term was much more common than landslide through to about 1900, and that as the use of the term “landslide” became more frequent the use of “landslip” reduced markedly.
I’ll come back to Google Ngrams in a subsequent post to look at the use of landslide terminology in more detail. It is a pretty cool tool.
The three most prominent “earthquake” spikes appear to occur at 1923, 1906 and 1883. I suspect these spikes are the Kanto earthquake, the San Francisco earthquake and the eruption of Krakatau, which was a “really big jolt” and might have been considered initially as an earthquake, rather than as a volcano. It would be interesting to go back and look at what the newspapers of the time were saying about it.
http://ngrams.googlelabs.com/graph?content=propinquity%2C+adjacency&year_start=1800&year_end=2000&corpus=0&smoothing=3
Yeah, it’s cool!