US constitutional scholars spent the better part of 140 years bickering over a dozen anonymously written essays. The 18th-century papers argued in favour of ratifying the newly drafted US Constitution, and researchers knew each essay was written by either Alexander Hamilton or James Madison, two of the country’s founding fathers; but for all their deliberations, the boffins could not agree on who authored which. Finally, in 1964, Time magazine triumphantly featured on its cover the two men who had solved the riddle: statisticians Frederick Mosteller and David Wallace had developed a tool for forensic literary analysis, making it possible to identify an author’s elusive fingerprint and thus correctly attribute authorship. A somewhat unexpected guest, mathematics had entered the literary party.

Fifty years later, statistical literary analysis is grappling with an even trickier question: what are the qualities that make particular written works so popular? Writer Ben Blatt is not your typical literary critic. As an applied mathematics undergrad at Harvard, he started his career working for an American football team, conducting data analysis in search of hidden performance metrics to optimise. Curious about using statistics as a prism for popular culture, he began writing for Slate magazine, where he analysed scripts to determine which of the characters in the TV series Friends was the most popular (it was Rachel) and whether Chinese takeout really is more popular at New Year (evidence suggests that it is). His new book, Nabokov’s Favorite Word Is Mauve: What the Numbers Reveal about the Classics, Bestsellers, and Our Own Writing, applies statistical analysis to several of the greatest works of fiction, and comes up with insights on the making of a top-rated novel.

In a phone interview with Copylab, Blatt explained that what he found most appealing in this type of research is the sheer volume of data: a single novel will consist of about 100,000 words, and when considering an author’s entire body of work, the data truly stacks up. “Many believe that placing maths and writing in the same sentence has no value. But a lot of the questions I look at are not simple or easy, and many have been debated by editors and writers for years. It was pretty amazing to me to be able to offer evidence that was never presented before.”

Blatt is especially interested in testing writing advice, of which there is plenty on offer. Indeed, for many writers, it seems that no sooner have they completed a piece of writing than they feel compelled to suggest best practices. Ernest Hemingway dished out so much advice it was collected into a book. Stephen King didn’t wait for others to compile his tips, authoring his own manifesto.

Much of this advice makes for memorable quips (“There are three rules for writing a novel. Unfortunately, no one knows what they are.” – W. Somerset Maugham) and great chunks of it concern the mindset of the writer (“You never have to change anything you got up in the middle of the night to write” – Saul Bellow). Sometimes, though, the advice gets practical and highly particular. Mark Twain urged the writer to delete every instance of ‘very’. Kurt Vonnegut insisted on cutting out semicolons. F. Scott Fitzgerald said never to use an exclamation mark. But does any of this advice contribute to better writing? Does any of it even hold true for what we consider the literary canon? Blatt wanted to find out.

For one thing, his research shows we like it when authors keep their writing simple. Blatt analysed each of the books on the New York Times bestseller lists from the 1960s up to the present day. Testing for length of sentences and their complexity, he discovered that both measures showed a steady decline. “In fact”, he adds, “the most complex book on a recent bestsellers list features simpler writing when compared to the simplest-written work on a 1960s bestsellers list”. Blatt initially found this observation disheartening, but has come to adopt a different view: “There is value in shorter sentences and a lot to be said for taking a step back to weigh words, asking whether they add value”.
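For readers curious how sentence length might be measured, here is a minimal Python sketch. It is not Blatt’s actual method, which the interview does not describe; it assumes, crudely, that a sentence ends at a full stop, exclamation mark or question mark, and that words are separated by whitespace. The function name mean_sentence_length is made up for illustration.

```python
import re


def mean_sentence_length(text: str) -> float:
    """Return the average number of words per sentence in `text`.

    Assumption: sentences end at '.', '!' or '?'. Abbreviations such as
    'Mr.' will be miscounted, so treat the result as a rough estimate.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Count whitespace-separated words in each sentence.
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)


if __name__ == "__main__":
    sample = "Call me Ishmael. It was a dark and stormy night!"
    print(mean_sentence_length(sample))
```

Running the same function over the full text of books from different decades would give one rough way to compare eras, though a serious analysis would need a proper sentence tokeniser and a separate measure of complexity.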

But even as some aspects of writing evolve, others remain stubbornly in place. One of the quotes left out of the book was Charles Dickens’ note to George Eliot: having read a piece she’d written, Dickens informed her that even had he not known her, he would’ve believed the text to be written by a woman. One key quality differentiating male and female writers, Blatt discovered, is the role awarded to female characters. Counting ‘he’ and ‘she’ pronouns in the complete works of different authors, he noticed a heavy bias towards males. For swaths of the literary canon, female characters are little more than background fodder.

“99% of the singular pronouns are ‘he’ within Tolkien’s work”, says Blatt. “For Elmore Leonard, who often spoke on how important it was to feature more female characters, the percentage of female singular pronouns grew as his career progressed, but never rose above 25%”. When comparing this to female writers, Blatt found they tend to portray a far more balanced view; male characters retained 55%-65% of the references, but use of ‘she’ never fell below 40%.
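The pronoun count itself is simple enough to sketch in a few lines of Python. The version below is an illustration only, not Blatt’s own code: it assumes that only the words ‘he’ and ‘she’ are counted (ignoring ‘him’, ‘her’, ‘his’ and so on), and the function name pronoun_share is hypothetical.

```python
import re
from collections import Counter


def pronoun_share(text: str) -> dict[str, float]:
    """Return the share of 'he' and 'she' among all he/she pronouns.

    Assumption: only the subject pronouns 'he' and 'she' are counted,
    matched as whole words regardless of capitalisation.
    """
    # \b ensures whole-word matches, so 'the' or 'shed' are not counted.
    matches = re.findall(r"\b(he|she)\b", text, flags=re.IGNORECASE)
    counts = Counter(m.lower() for m in matches)
    total = counts["he"] + counts["she"]
    if total == 0:
        return {"he": 0.0, "she": 0.0}
    return {"he": counts["he"] / total, "she": counts["she"] / total}


if __name__ == "__main__":
    sample = "He said she would come. She did, and he left. Then he slept."
    print(pronoun_share(sample))
```

Applied to an author’s complete works rather than a toy sentence, the two shares give a rough sense of how often male and female characters are the subject of the action.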

Author Ben Blatt

Blatt considers this statistical measure akin to the Bechdel test, an indicator for the active presence of women in films. Blatt says this point struck him as important: “Because a lot of the literary curriculum consists of male writers, this is what girls, and boys, study in the classroom. I think it carries a negative implicit message.”

But for the teeth-gnashing author, will any of this advice make for better writing? The answer, as Blatt sees it, is ‘it depends’. He tests the advice against works of fiction, and cautions against extending results to cover all forms of writing: “There’s compelling evidence that the authors that hold best over time use the least amount of ‘ly’ adverbs. Hearing this, a friend told me she had stopped using them in her text messages and emails. This was definitely not the point I was trying to make! I was looking at a very specific question concerning novels, certainly not these types of text. So it’s very important to be clear about what this data can and cannot show you.”

He believes that what the data does show is that, above all, good writing is less a question of absolute quality than a question of readership. Take clichés, for example. Blatt tested against The Dictionary of Clichés, to see how often different writers use those hackneyed phrases. And one finding to emerge is that best-selling author James Patterson uses two to three times the number of clichés a Pulitzer Prize-winning novelist typically would. “It’s not really about me being an arbiter of what’s good and what’s bad,” he says. “Clichés could be good, if what you want is a bestseller.”

Vered Zimmerman


Vered is an investment writer in our London office. She holds an MBA from Cass Business School and an MSc in mathematics from the Hebrew University in Jerusalem.