Two of my hobbies are ballroom dancing and collecting motion picture and television soundtracks. I've taken lessons on the former for years, resulting in more enthusiasm than skill on my part. As for the latter, I have around 2000 CDs and albums cluttering up the palatial Overclock estate, along with all the computers, books, firearms, and notes on conspiracy theories.
Not too long ago, I was cruising around on Amazon.com looking for classical music CDs that included the tango Por Una Cabeza, which was featured in the movie True Lies. Thirty seconds of Googling for "true lies tango" had gotten me the title. When I typed that title into Amazon's search engine, I got back several classical music CDs, and the soundtrack to True Lies.
On the surface, this might not seem unusual, except that Por Una Cabeza does not appear on the soundtrack album by Brad Fiedel. I knew this already because I owned the soundtrack CD. There was absolutely no reference to the tango on the CD, in the liner notes, or anywhere else.
How did Amazon.com know that the tango was used in the movie? I don't think it did. I think the search engine merely noticed that people who searched for "Por Una Cabeza" also purchased the soundtrack to True Lies. So, in a marketing move that seemed almost omniscient, and without really knowing why, it suggested that CD too. This is not the first time Amazon.com has startled me. (Actually, it startles me every month when my Visa bill arrives. One-click ordering may work a little too well.)
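Here is a minimal Python sketch of the kind of correlation I suspect is at work. It is not Amazon.com's actual algorithm, which I know nothing about.

- It assumes a hypothetical log of (search query, items purchased) pairs.
- The data and the names `sessions`, `build_index`, and `suggest` are invented for illustration.

The point is that the code never knows *why* an item goes with a query. It just counts.

```python
from collections import Counter, defaultdict

# Hypothetical log: each entry pairs one shopper's search query with the
# set of items that same shopper went on to buy. The data is invented.
sessions = [
    ("por una cabeza", {"True Lies soundtrack", "Tango compilation"}),
    ("por una cabeza", {"True Lies soundtrack"}),
    ("por una cabeza", {"Classical guitar album"}),
    ("true lies", {"True Lies soundtrack"}),
]


def build_index(sessions):
    """Map each query to a Counter of how often each item was bought after it."""
    index = defaultdict(Counter)
    for query, purchases in sessions:
        # Normalize case so "Por Una Cabeza" and "por una cabeza" match.
        index[query.lower()].update(purchases)
    return index


def suggest(index, query, n=3):
    """Return up to n items most often bought by people who searched for query.

    Nothing here knows why an item is related to the query; the ranking
    comes purely from co-occurrence counts.
    """
    # .get avoids inserting an empty entry for queries never seen before.
    counts = index.get(query.lower(), Counter())
    return [item for item, _ in counts.most_common(n)]


index = build_index(sessions)
print(suggest(index, "Por Una Cabeza"))
```

Against this toy log, the True Lies soundtrack comes out on top. That is simply because two of the three shoppers who searched for the tango bought it. Nobody ever told the software the tango is in the movie.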
This is a great example of what James Surowiecki talks about in his book The Wisdom of Crowds: leveraging the combined knowledge of a very large sample of people, including both experts and amateurs. A particular individual's knowledge may not add up to much. But gather enough data points, start doing some correlations, or apply it all in a market-based system, and suddenly you have not only a lot of data but some real information as well.
If you want some examples of the really counterintuitive results you may get when you start looking at statistical correlations in large data sets, see Steven Levitt and Stephen Dubner's book Freakonomics. The controversial result mentioned in all the reviews is that the drop in the crime rate correlates with the legalization of abortion: unwanted babies who are never born do not grow up to become criminals. But this is merely one among many compelling chapters.
Peter Morville is a librarian, a librarian's librarian. He's the kind of librarian who might be (and probably has been) hired as a consultant by the Library of Congress or maybe Google. Morville is one of the founders of the field of information architecture. His book Ambient Findability is about life in a world in which virtually any fact might be at your fingertips, if only you knew where to look for it and how to ask for it, and if only it were organized to maximize its findability.
One of the topics Morville covers is the difference between organizing information in formal taxonomies and in the emerging folksonomies.

Formal taxonomies are very expensive to create because they are labor intensive, but you are leveraging the wisdom, experience, and knowledge of highly trained experts in a field. The Library of Congress system of classifying books is a formal taxonomy. All those Latin words you had to learn in high-school biology were probably part of a formal taxonomy as well.

Folksonomies may be very expensive to create as well. But the cost is widely distributed and the labor frequently donated. The catch is that you're leveraging a lot of people whose credentials are more or less unknown. Folksonomies abound on the web, ranging from del.icio.us to Slashdot to Amazon.com:

- del.icio.us is driven entirely by users adding their own bookmarks, categorized by tags they apply themselves.
- Slashdot's value-add is its editorial moderation, applied to the firehose of contributed news items.
- At Amazon.com, software extracts patterns of behavior from a huge sample with little or no human intervention.
It is this difference in economics between the formal taxonomy and the folksonomy that killed the original Yahoo, which the old coots in the audience may recall was once not a search engine but a formal taxonomy of web sites. It is also what made a star of Google, which is likely to become the emerging artificial intelligence that eventually conquers the planet.
I loved Malcolm Gladwell's book The Tipping Point, which applied theories of how an outbreak of disease becomes an epidemic to everything from fashion to open source software development. His more recent book Blink is about how snap judgements are sometimes more accurate than opinions formed through long, studious research. The weakness in Blink is that Gladwell does not stress nearly enough who made the amazingly successful snap judgements in his case studies. They were all made by the kinds of people you would want drawing up your formal taxonomies: experts in the field with decades of experience, whose brains contain finely honed neural networks capable of delivering useful snap judgements.
When the software refactoring folks talk about code smells, this is exactly what they are talking about. It is the sensation you get when you look at a piece of code and something in the back of your mind tells you it looks funny. This is not an intuition that occurs to the average Joe off the street. It is the result of many, many hours spent finding that frackin' bug, and swearing, as God is my witness, that you will never make that mistake again! There is nothing like a weekend spent at work to train your neural network to be more careful next time.
As useful as folksonomies obviously are, we need to keep a few experts around too, for the value-add that their neural networks bring to the table. On the flip side, Surowiecki would tell us that having nothing but experts is just as bad. True wisdom comes from having a wealth of experience, a breadth of knowledge, and a diversity of opinions.
Malcolm Gladwell, Blink, Little, Brown and Company, 2005
Malcolm Gladwell, The Tipping Point, Little, Brown and Company, 2002
Steven Levitt and Stephen Dubner, Freakonomics, William Morrow, 2006
Peter Morville, Ambient Findability, O'Reilly, 2005
James Surowiecki, The Wisdom of Crowds, Doubleday, 2004