Cultural Evolution Could Be Studied in #Google #Books #Database via @wiredscience

Amplify’d from www.wired.com

Google’s massive trove of scanned books could be useful for researchers studying the evolution of culture.

In a paper published December 16 in Science, researchers turned part of that vast textual corpus into a 500-billion-word database in which the frequency of words can be measured over time and space.

Their initial subjects of analysis, including cultural trajectories of popular modern thinkers and the conjugation of irregular verbs, hint at what might be done.

“There are many more questions, that we could never think of, that this data makes possible,” said Harvard University evolutionary dynamicist Jean-Baptiste Michel. “What we present in the paper is our first explorations of what becomes possible when you have this dataset.”

The new research is part of an emerging effort to apply rigorous statistical analyses, long used in the study of biological evolution, to cultural evolution.

Unlike biological evolution, however, which can be studied through the fossil record and in genomic comparisons, cultural evolution has proved difficult to study.

Researchers have used archaeological documentation of Polynesian canoe shapes and records painstakingly assembled by comparative linguists, but rich and rigorously compiled datasets are rare.

One potential source is Google, which has scanned some 15 million books, or roughly 12 percent of all books ever published. Michel and his colleagues turned one-third of these, selected for legibility and fully documented origins, into a massive word database.

Patterns that can be queried from its cloud are not necessarily answers unto themselves, they say, but a way of illuminating subjects of further investigation.

“It’s not just an answer machine. It’s a question machine,” said study co-author Erez Lieberman-Aiden, a computational biologist at Harvard University. “Think of this as a hypothesis-generating machine.”

In the new study, the researchers restricted their queries to single words and names, as more sophisticated querying raised the possibility of copyright violation. (Google and book publishers are currently negotiating terms of access to copyrighted material, putting scientific accessibility and legal restrictions at odds.)
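
To give a rough sense of what a single-word query over such a database involves, here is a minimal Python sketch. It is not the researchers' code: the function name `yearly_frequency` and the toy corpus are invented for illustration, and the real database is vastly larger and more carefully tokenized. The idea is just to count how often one word appears among all the words published in a given year.

```python
from collections import Counter


def yearly_frequency(books, word):
    """Return {year: share of that year's words that equal `word`}.

    `books` is an iterable of (year, text) pairs. Tokenization here is a
    naive whitespace split, which is an assumption of this sketch.
    """
    totals = Counter()  # total words seen per year
    hits = Counter()    # occurrences of the target word per year
    target = word.lower()
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Made-up toy corpus, only to show the shape of the input.
toy_books = [
    (1900, "the dog burnt the toast"),
    (1900, "the cat sat"),
    (2000, "the dog burned the toast"),
]

freq = yearly_frequency(toy_books, "burned")
print(freq)
```

Dividing by the total number of words per year matters because far more books were published in some years than others, so raw counts alone would mostly track the size of the corpus.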

See more at www.wired.com

 
