Wednesday, May 04, 2005

The Wikipedia Reliability Index

I had a brainstorm in the shower this morning. I'd been playing around with adding information to Wikipedia. Wikipedia, as you may know, is a collaborative encyclopedia that literally anyone can update. The idea behind this "open-source" information model is that anyone can add or correct information, and that eventually bad information will be weeded out through a process of natural selection.

The problem with this, at least according to lots of folks, is that the instability of the information and the systemic bias in the encyclopedia make it unsuitable for use as a serious reference work.

This is a complex problem, and not one that can be easily solved. Some have even proposed a formal peer review process for articles. But there might be a way to increase awareness of potential problems without the organizational overhead of trying to coordinate formal reviews of a constantly changing work.

Here's my idea in a nutshell: every article in Wikipedia should include a mathematically calculated "reliability score" that gives the user a general idea of how likely the article is to be correct.

It would take some work to figure out the perfect formula for this. My initial thinking is that more page views and more edits should generally increase the score, since this indicates that the article has been subjected to a high level of scrutiny. A large number of recent edits (say, within the last 6 months), however, should reduce the score, since this tends to indicate that the article is in flux. Long edits (measured as a percentage of total article length) should, in general, also reduce the score more than "minor" edits, since they fundamentally change the content of the article.
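To make the idea a bit more concrete, here is a rough sketch of how those factors might be combined. Everything in it is an assumption for illustration: the weights (W_VIEWS, W_EDITS, W_RECENT, W_CHURN), the 182-day "recent" window, the use of logarithms to dampen huge counts, and the final squash onto a 0-100 scale are all hypothetical choices, not a worked-out formula. Views and total edits push the score up; the number of recent edits, and especially large recent edits relative to article length, pull it down.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Edit:
    timestamp: datetime
    chars_changed: int  # size of the edit in characters


@dataclass
class ArticleStats:
    page_views: int
    article_length: int  # current article length in characters
    edits: List[Edit]


# Hypothetical weights; the real formula would need tuning.
W_VIEWS = 0.5
W_EDITS = 0.8
W_RECENT = 0.3
W_CHURN = 2.0
RECENT_WINDOW = timedelta(days=182)  # roughly 6 months
MIDPOINT = 5.0  # arbitrary raw value that maps to a score of 50


def _logistic(x: float) -> float:
    """Numerically safe logistic function mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reliability_score(stats: ArticleStats, now: datetime) -> float:
    """Return a hypothetical 0-100 reliability score for one article."""
    recent = [e for e in stats.edits if now - e.timestamp <= RECENT_WINDOW]

    # Scrutiny: more views and more edits raise the score (log dampens big counts).
    scrutiny = (W_VIEWS * math.log1p(stats.page_views)
                + W_EDITS * math.log1p(len(stats.edits)))

    # Flux: every recent edit lowers the score a little...
    flux = W_RECENT * len(recent)

    # ...and long recent edits (as a fraction of article length) lower it more.
    length = max(stats.article_length, 1)
    churn = W_CHURN * sum(min(e.chars_changed / length, 1.0) for e in recent)

    raw = scrutiny - flux - churn
    return 100.0 * _logistic(raw - MIDPOINT)
```

For example, a heavily viewed article whose edits are all old should outscore the same article after a burst of large recent rewrites:

```python
now = datetime(2005, 5, 4)
old_edits = [Edit(now - timedelta(days=400 + i), 50) for i in range(200)]
stable = ArticleStats(100_000, 10_000, old_edits)
busy = ArticleStats(100_000, 10_000,
                    old_edits + [Edit(now - timedelta(days=i), 5_000) for i in range(20)])
print(reliability_score(stable, now) > reliability_score(busy, now))
```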

It would also be possible to incorporate external and subjective criteria: for example, the number of sites linking to the article according to Google, and perhaps a rating system where users can vote on the accuracy of the information (along the lines of the "did you find this article helpful?" links you often see in tech support databases).

These factors would need to be incorporated in such a way that one person could not unduly influence the ranking.
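One simple way to keep a single person from stacking the accuracy votes might be to count only each user's most recent vote and to smooth the result so that a handful of votes can't drive it all the way to 0 or 1. The sketch below assumes votes arrive as hypothetical (user_id, is_accurate) pairs in chronological order; the smoothing used here (add one "accurate" and one "inaccurate" pseudo-vote) is just one common choice, not a settled design.

```python
from typing import Dict, Iterable, Tuple


def vote_score(votes: Iterable[Tuple[str, bool]]) -> float:
    """Return a smoothed fraction of users who rated the article accurate.

    Each user counts once: a later vote from the same user replaces the
    earlier one, so repeated voting gains nothing.
    """
    latest: Dict[str, bool] = {}
    for user_id, is_accurate in votes:
        latest[user_id] = is_accurate  # overwrite any earlier vote by this user

    up = sum(1 for v in latest.values() if v)
    n = len(latest)
    # Add-one smoothing: with no votes the score sits at a neutral 0.5.
    return (up + 1) / (n + 2)
```

With this rule, fifty "inaccurate" votes from one user count the same as one:

```python
votes = [("alice", False)] * 50 + [("bob", True), ("carol", True)]
print(vote_score(votes))  # 0.6
```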

It is possible that someone has already proposed this idea (if so, please point me in the right direction). If not, consider it proposed!


