ANY list of the leading novelists of the 19th century, writing in English, would almost surely include Charles Dickens, Thomas Hardy, Herman Melville, Nathaniel Hawthorne and Mark Twain.
But they do not appear at the top of a list of the most influential writers of their time. Instead, a recent study has found, Jane Austen, author of “Pride and Prejudice,” and Sir Walter Scott, the creator of “Ivanhoe,” had the greatest effect on other authors, in terms of writing style and themes.
These two were “the literary equivalent of Homo erectus, or, if you prefer, Adam and Eve,” Matthew L. Jockers wrote in research published last year. He based his conclusion on an analysis of 3,592 works published from 1780 to 1900. It was a lot of digging, and a computer did it.
The study, which involved statistical parsing and aggregation of thousands of novels, made other striking observations. For example, Austen’s works cluster tightly together in style and theme, while those of George Eliot (a k a Mary Ann Evans) range more broadly, and more closely resemble the patterns of male writers. Measured by similar criteria, Harriet Beecher Stowe was 20 years ahead of her time, said Mr. Jockers, whose research will soon be published in a book, “Macroanalysis: Digital Methods and Literary History” (University of Illinois Press).
These findings are hardly the last word. At this stage, this kind of digital analysis is mostly an intriguing sign that Big Data technology is steadily pushing beyond the Internet industry and scientific research into seemingly foreign fields like the social sciences and the humanities. The new tools of discovery provide a fresh look at culture, much as the microscope gave us a closer look at the subtleties of life and the telescope opened the way to faraway galaxies.
It is this ability to collect, measure and analyze data for meaningful insights that is the promise of Big Data technology. In the humanities and social sciences, the flood of new data comes from many sources, including books scanned into digital form, Web sites, blog posts and social network communications.
Data-centric specialties are growing fast, giving rise to a new vocabulary. In political science, this quantitative analysis is called political methodology. In history, there is cliometrics, which applies econometrics to history. In literature, stylometry is the study of an author’s writing style, and these days it leans heavily on computing and statistical analysis. Culturomics is the umbrella term used to describe rigorous quantitative inquiries in the social sciences and humanities.
Quantitative tools in the humanities and the social sciences, as in other fields, are most powerful when they are controlled by an intelligent human. Experts with deep knowledge of a subject are needed to ask the right questions and to recognize the shortcomings of statistical models.
“You’ll always need both,” says Mr. Jockers, the literary quant. “But we’re at a moment now when there is much greater acceptance of these methods than in the past. There will come a time when this kind of analysis is just part of the tool kit in the humanities, as in every other discipline.”
Read the full article on NYTimes.