Adam J. Calhoun has written a wonderful blog entry that illustrates, with some great data visualization, that it is possible to algorithmically distinguish different novelists based only on their punctuation habits.
The idea is simple: just remove all words from a corpus of text and look at the patterns of the punctuation. Here is an illustration.
I’ll now punctuate this blog entry with Calhoun’s description of the striking differences in this illustration:
In fact, they can be quite distinct. Take my all-time favorite book, Absalom, Absalom! by William Faulkner. It is dense prose stuffed with parentheticals. When placed next to a novel with more simplified prose — Blood Meridian, by Cormac McCarthy — it is a stark difference (see above).
Calhoun has made his code freely available, so you can try this for yourself. Hmmm… I wonder if we can gain some insights into the different writing styles of different scientists this way?
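If you just want a feel for the idea before digging into Calhoun's code, here is a minimal Python sketch of the "remove all the words" step. To be clear, this is not Calhoun's implementation: the function names `strip_words` and `punctuation_counts` and the particular set of marks in `PUNCTUATION` are my own assumptions, chosen only for illustration.

```python
from collections import Counter

# Assumed set of marks to keep; the original post does not specify one.
PUNCTUATION = set(".,;:!?'\"()-")


def strip_words(text: str) -> str:
    """Return only the punctuation characters of `text`, in their original order."""
    return "".join(ch for ch in text if ch in PUNCTUATION)


def punctuation_counts(text: str) -> Counter:
    """Count how often each punctuation mark occurs in `text`."""
    return Counter(strip_words(text))


sample = 'Well, then; who (if anyone) said "stop"? Nobody!'
print(strip_words(sample))         # prints ,;()""?!
print(punctuation_counts(sample))  # per-mark frequencies
```

Running the same two functions over two full novels (or two scientists' papers) and comparing the resulting strings and counts would be one rough way to start looking for the kind of differences Calhoun describes.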
(Tip of the hat to the folks at the Santa Fe Institute for bringing this story to our attention.)