Fun with Google Books Ngram Viewer

When Google’s Ngram Viewer appeared on the scene last month, I (like half my Twitter feed) went a little crazy feeding search terms into it and looking at the results. For any of you who missed the news, the viewer lets you search language corpora generated from Google Books and charts the frequencies of your search terms over time. Dan Cohen has a good post on it, in which he concludes (and I agree): “Digital humanities needs gateway drugs. Kudos to the pushers on the Google Books team.”

Given Google Books’ well-known metadata issues and the inevitability of OCR errors in the data, I wouldn’t use the Ngram Viewer to make any substantial claims yet.* I don’t think it’s reliable enough to use as evidence. But I do suspect this is the first incarnation of the kind of tool that we’ll be seeing a lot more of, hopefully in more sophisticated versions backed up by corpora with more reliable date and publication metadata, and with more features for access to the corpora themselves.

And in the meantime, I still find it well-nigh irresistible. Consider the first Ngram I generated, a chart comparing the usage of “scrapbook” and “commonplace book” in the 19th and early 20th centuries:

"scrapbook" vs. "commonplace book"


Thanks to my background research on the commonplace book project, I already knew that “scrapbooking” (as we now call it) took off in the mid-19th century and eventually became a more popular pastime than the keeping of commonplace books; the graph suggests something of the rise of one format and the fall of the other.

Or here’s another one for Dante Gabriel Rossetti and Christina Rossetti:

Christina Rossetti vs. Dante Gabriel Rossetti

I like how you can see Christina Rossetti’s reputation start to overshadow her brother’s in the early years of the 20th century, but they still track together throughout the subsequent decades.

So: it’s not a perfect tool, it’s not as transparent as one would like, and ideally it should lead to questions we don’t already know the answers to (unlike my commonplace book vs. scrapbook search above, where I knew the answer going in). But I’ll be very curious to see what develops out of it in a few years’ time.

Incidentally, there’s now an Ngrams Tumblr blog, with an ever-growing collection of examples, some serious, some charmingly silly (like “new kids on the block” vs. “New Kids on the Block”: oh no, 80s flashback!).

* Geoffrey Nunberg, who wrote the piece on Google Books metadata that I linked above, also wrote a thoughtful article on the Ngram Viewer and the kinds of scholarship it might lead to. Well worth reading, especially for its cautions about the limitations of the tools as they currently stand.
