Three tips on using Google’s new ngrams word search.
- It is case sensitive. “beatles” is not the same as “Beatles”.
- There are multiple data sets. Are you using American English or British English?
- Check the data. See those links at the bottom (the ones with dates)? You can use those to see what your search is actually matching. Curious why “shaq” shows so many results in 1860? Look at the results for that time period. It turns out a lot of them are written “shaQ”, which should be a clue that Google’s OCR engine misread books that actually say “shall.” (Also, you should have searched for “Shaq”, not “shaq”; see #1.)
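If the case-sensitivity point feels abstract, here is a rough Python sketch of the idea. It does not talk to Google at all: the sample sentence, the `tokens` helper, and the two counting functions are all made up for illustration, and the real ngrams tokenizer surely differs. It only shows how an exact-case lookup can miss matches, and how an OCR slip like “shaQ” only turns up when you look at the underlying text.

```python
from collections import Counter

# Made-up sample text standing in for scanned pages (NOT real ngrams data).
SAMPLE = "The Beatles played. I like beatles and Beatles records. We shaQ see, we shall see."

def tokens(text):
    """Split on whitespace and strip surrounding punctuation (a crude, assumed tokenizer)."""
    return [t.strip(".,;:!?\"'") for t in text.split()]

def count_exact(text, word):
    """Count case-sensitive matches only."""
    return Counter(tokens(text))[word]

def count_any_case(text, word):
    """Count matches while ignoring case."""
    return Counter(t.lower() for t in tokens(text))[word.lower()]

print(count_exact(SAMPLE, "Beatles"))     # 2
print(count_exact(SAMPLE, "beatles"))     # 1
print(count_any_case(SAMPLE, "beatles"))  # 3
print(count_exact(SAMPLE, "shaq"))        # 0
print(count_exact(SAMPLE, "shaQ"))        # 1 (the OCR-style misreading of "shall")
```

The two spellings of “Beatles” are counted as separate words, which is why picking the right capitalization matters before you read anything into a chart.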
Finally, keep the following in mind when you interpret your data:
- Remember that this is a very incomplete data set. Google’s scanned a lot of books, but it is still only a small portion of what has been written in English. The other languages are likely even less reliable.
- Remember that this is an imperfect data set. I’ve found Latin books in the English data set, OCR errors, etc.
BONUS: For a little fun, try searching for “never gonna give you up” in #ngrams. (Doesn’t always work outside the US.)