Sunday, April 6, 2014

Know Everything, Understand Nothing

That, Tim Harford suggests in the Financial Times weekend magazine, is what so-called "Big Data" often amounts to, as he interviews Cambridge statistician David Spiegelhalter.

He notes the case of the Google Flu Trends project, which sought to predict flu outbreaks faster and more accurately than the Centers for Disease Control. At first, it did that, using the patterns of folks searching for information on flu and flu symptoms. Until it didn't: in 2013 Flu Trends predicted an outbreak nearly twice as severe as actually happened. The problem should be obvious. Google can easily count how many people search for information about the flu. But Google has no clue about why those people search for the information.

Of course, the company can guess how many people meant to type "flue" because they wanted their chimneys cleaned, or how many might be getting information for a not particularly well-documented report for class. It can even survey sample audiences to build a fudge factor into its algorithm and make better guesses. But they will always be guesses, because on the other end of the search window is a human being with unknown and, to the tabulators at Google, unknowable motivations.

Huge assemblies of information are only that. Without organizing principles or questions asked of them, they remain columns of numbers without significance. But different questions can elicit different patterns. We can see this most clearly (and pretty repulsively) when we watch politicians comment on events or new information. What to people of character may be a tragedy or disaster is to others ammunition for agenda advancement. In less disgusting circumstances, those with distinct opinions will comment on the exact same statistic in ways that mean entirely different things.

Human decisions introduce random elements into any body of data, leaving a gap between the data and its meaning. Physicist Alan Lightman's essay "Smile," collected in 1996's Dance for Two, illustrates this by spending about a thousand words describing both the audio and visual processing done by ear, eye and brain when a man sees a particular woman standing on a dock. "All of this is known," Lightman writes after this detailed description. "What is not known is why, after about a minute, the man walks over to the woman and smiles."

The man's reason may be guessed, of course. He recognizes the woman as a friend. She may be a former classmate long unseen. She wears some collegiate or school garb he knows. She has spinach in her teeth. She's hot. We can guess, but we can't know without asking the fellow. Or checking to see if he is me, in which case our number of potential answers shrinks enough to make the guesswork a lot easier: She's Angie Harmon or Sutton Foster, my eyes are open and the part of my forebrain that asks me if this is really a good idea has been hit with a tranquilizer dart and put to bed.

People talking about issues use a phrase to warn against trying to use a single phenomenon as proof of their point: "The plural of 'anecdote' is not 'data.'" In a similar vein, the plural of "fact" is not "wisdom." Google it and see.
