If you watch TV news, you may have found yourself wondering why they’re going on and on and on about Paris or Britney or Lindsay or whoever, instead of, like, news that you care about.
Russell Glasser, a student at the University of Texas at Austin, is currently writing a thesis on this subject. His basic approach is to data-mine both Google News and Digg, the idea being that the former gives a representative sample of what the news media are talking about, and the latter gives insight into what people are actually interested in. You can read more about this, as well as a draft of the thesis, here.
The basic idea is fairly simple: let’s say that in a given month, there are 99 news stories about Britney Spears and one about Tiger Woods; but on Digg, a lot of people recommend the one Tiger Woods story and ignore the 99 Britney Spears stories. It seems reasonable to conclude that the news-reading public cares a lot more about Tiger than about Britney.
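To make the arithmetic concrete, here is a minimal sketch in Python. It is not the method from the thesis, just an illustration of the comparison described above. The function name `attention_shares` and the Digg counts (one digg for each Britney story, 500 for the Tiger story) are made up for the example.

```python
from collections import defaultdict

def attention_shares(stories):
    """Return {topic: (coverage_share, interest_share)}.

    `stories` is a list of (topic, diggs) pairs, one pair per news story.
    Coverage share is the fraction of stories about the topic; interest
    share is the fraction of all diggs that went to that topic's stories.
    """
    story_counts = defaultdict(int)
    digg_counts = defaultdict(int)
    for topic, diggs in stories:
        story_counts[topic] += 1
        digg_counts[topic] += diggs

    total_stories = sum(story_counts.values())
    total_diggs = sum(digg_counts.values())

    shares = {}
    for topic in story_counts:
        coverage = story_counts[topic] / total_stories
        interest = digg_counts[topic] / total_diggs if total_diggs else 0.0
        shares[topic] = (coverage, interest)
    return shares

# Hypothetical month: 99 Britney stories with 1 digg each (assumed),
# and 1 Tiger story with 500 diggs (assumed).
stories = [("Britney Spears", 1)] * 99 + [("Tiger Woods", 500)]
shares = attention_shares(stories)

for topic, (coverage, interest) in shares.items():
    print(f"{topic}: {coverage:.0%} of stories, {interest:.0%} of diggs")
```

With these invented numbers, Britney gets 99% of the coverage but only about 17% of the diggs, while Tiger gets 1% of the coverage and about 83% of the diggs. A large gap between the two shares is the kind of mismatch the thesis is looking for.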
And, in fact, his conclusion is that there’s a definite bias towards fluff in the media.
The whole thing is also an interesting exercise in teasing information out of noisy data. Any given datum could easily be wrong: someone might Digg every story he reads, or a newspaper might run two Paris Hilton stories in one day because the news editor and the entertainment editor each didn’t realize that the other was already covering the story. But if you have a lot of such noisy data (and nowadays, thanks to the Spew, we do), then you can still extract a signal from it.
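Here is a toy simulation of that last point, again in Python and again purely illustrative. Everything in it is an assumption: the "true" interest rates (0.6 and 0.2), the 30% chance that any one reaction is pure noise, and the function names. The only claim is the general one, that an average over many unreliable observations settles down even though any handful of them can mislead.

```python
import random

def noisy_reaction(true_rate, noise_rate, rng):
    """One reader's reaction to one story: 1 = dugg it, 0 = ignored it."""
    if rng.random() < noise_rate:
        # Noise: the reaction has nothing to do with real interest
        # (e.g. a reader who diggs at random).
        return rng.randint(0, 1)
    return 1 if rng.random() < true_rate else 0

def estimate(true_rate, noise_rate, n, rng):
    """Average of n noisy reactions."""
    return sum(noisy_reaction(true_rate, noise_rate, rng) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so runs are repeatable

# Topic A truly interests 60% of readers, topic B 20% (both assumed).
few = (estimate(0.6, 0.3, 5, rng), estimate(0.2, 0.3, 5, rng))
many = (estimate(0.6, 0.3, 10_000, rng), estimate(0.2, 0.3, 10_000, rng))

print("5 reactions each:     ", few)
print("10,000 reactions each:", many)
```

With only five reactions per topic, the two averages can land almost anywhere. With ten thousand, they sit close to 0.57 and 0.29, which is what you get by mixing 70% real signal with 30% coin flips. The noise squashes the gap between the topics, but it doesn't hide which one people care about more.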