Fans of the run-on sentence, rejoice! There is some serious academic validation coming your way. Scientists from Long Island’s very own Stony Brook University have found that the more you use connective words like but, when, and since, the more likely you are to produce a blockbuster best-seller. Sound counterintuitive? Yeah, I kinda thought so, too. But with an algorithm that can identify big hits 84% of the time based on textual analysis, it’s hard to argue with the facts.
The data comes from a new study (PDF) published by the Association for Computational Linguistics that looked at more than 800 books across all genres by big-name established authors and little guys no one has ever heard of. The research team didn’t rely on relatively subjective criteria of literary quality like character development. Instead, they looked at word frequency and sentence structure to develop a scale that can predict what is going to be a best seller and what is going to fill up the 60% off bargain bin at Walmart. It’s a skill that most agents and publishers would kill for.
It’s interesting to note that, in addition to using Pulitzer and Nobel winners to identify the literary successes, they also used Amazon’s sales rankings to pick out some unpopular books by their poor sales rankings. No word on how many self-published books garnered that dubious honor.
The results are pretty cool. The math was able to accurately predict success even for something as weirdly arbitrary as poetry, flagging the blockbusters 74% of the time. Fed Robinson Crusoe and A Tale of Two Cities, the algorithm correctly flagged both as books likely to see market success. Dan Brown’s The Lost Symbol bucked the trend a bit. The algorithm gave it a thumbs down (as have most critics), but it has seen wild commercial popularity.
The researchers also developed a list of words that were frequently used in each type of book. Not surprisingly, poorly received books use a lot of basic descriptive nouns: boat, door, beach, body, face. They also use a lot of extreme words: never, always, absolutely, perfectly. Best-sellers seem to have more flowing prose filled with conjunctions and prepositions: and, which, though, whom, since, whenever, into, within, after. They use “thinking verbs” like remember and recognize instead of emotional action verbs like want, went, and took.
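If you want to poke at your own draft, here is a minimal Python sketch of the word-frequency idea. It is not the researchers' algorithm: the word lists are only the handful of examples quoted above, and the `rate_per_thousand` helper and the sample sentence are made up for illustration. All it does is count how often words from a list show up per 1,000 words.

```python
import re
from collections import Counter

# Word lists copied from the examples quoted in this post. The study's full
# lists and its actual features are not reproduced here.
CONNECTIVES = {"and", "which", "though", "whom", "since",
               "whenever", "into", "within", "after"}
EXTREMES = {"never", "always", "absolutely", "perfectly"}


def rate_per_thousand(text, vocabulary):
    """Return how many words from `vocabulary` appear per 1,000 words of `text`.

    Hypothetical helper for illustration; tokenization is a crude
    lowercase letters-and-apostrophes split.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in vocabulary)
    return 1000.0 * hits / len(tokens)


# Made-up sample sentence, just to exercise the function.
sample = "She never looked back, and after the storm she walked into town."

connective_rate = rate_per_thousand(sample, CONNECTIVES)
extreme_rate = rate_per_thousand(sample, EXTREMES)
print(f"connectives per 1,000 words: {connective_rate:.1f}")
print(f"extreme words per 1,000 words: {extreme_rate:.1f}")
```

A higher connective rate here says nothing on its own about your odds of a best-seller. The study worked from many features across hundreds of books, and this only illustrates what one such count might look like.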
There are a few things to note about the work, though. Firstly, the study lumped non-fiction and fiction books together into the same algorithm. On one hand, the team found that “highly successful books tend to bear closer resemblance to informative articles,” a prediction that really only holds true for non-fiction. On the other, they found an inverse relationship between “readability” and commercial success. James Joyce fans have got the right idea about what makes an enduringly popular book, apparently.
For authors, the real-life lessons here are relatively slim pickings. Unless the researchers develop some plug-and-play software that lets you put in your first draft and come out with a million bucks or a big red REJECTED stamp, you’re just going to have to keep doing what you’ve always been doing: hoping that you don’t suck. Still, it’s fascinating to see what a little natural language processing and some hardcore analytics can come up with.