Thursday, January 28, 2016

Regressing Multiply?

"Multiple regression analysis" is a tool that statisticians use to analyze how one outcome relates to several pieces of data at once. It's supposed to account for the fact that the world is a pretty complex place and that studying it involves trying to juggle not just one variable, but several.

Solving an equation with one variable in it is relatively simple. Even though we usually don't think of it this way, even plain old arithmetic can be expressed with a variable: 2 + 2 = X, we know, provides us with 4. Algebra complicates matters -- of course -- by switching where the variable is in the relationship: 2 + X = 4. Still, it's not too difficult to solve these kinds of problems, and I'll thank you not to look at my report card all that closely when I say that.

Add a second variable to the equation, and now you have serious algebra, as well as the possibility of multiple answers. If X + Y = 4, then we have a lot of possible solutions. Both of the terms can still be 2, of course, but now one of them could be 3 while the other is 1. One of them could also be 4, while the other is 0. And that's just with non-negative whole numbers. Add negative numbers and fractions into the mix, and you can see that our two-variable problem will literally never run out of solutions.
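For anyone who wants to see that counted out, here is a small Python sketch of the X + Y = 4 example. The function name `whole_number_solutions` is made up for this illustration; it lists only the non-negative whole-number pairs, which is the finite part of the problem.

```python
def whole_number_solutions(total):
    """Return every pair (x, y) of non-negative whole numbers with x + y == total."""
    # Once x is chosen, y is forced to be total - x.
    return [(x, total - x) for x in range(total + 1)]


pairs = whole_number_solutions(4)
print(pairs)  # [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```

Five pairs, and that is with the friendliest numbers available. Let x be any negative number or fraction and y = 4 - x still works, which is why the list never ends.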

Real-world problems, while they may not have infinite solutions, tend more towards that end of things than they do towards simple arithmetic. Which car is safest to drive, you may wonder. How will you find out? You could simply tally the number of fatal crashes by model of car and then compare the totals. The car with the fewest fatalities must be the safest.

But is it? By simply sorting and counting fatalities, you have decided to ignore lots of other variables that may play a role, and according to psychology professor Richard Nisbett that means your analysis may be so flawed as to be useless. He uses the car safety study as his own example, pointing out that drivers with unsafe driving habits may gravitate towards certain automobile types and thus skew the results. If all the leadfoots (leadfeet?) suddenly switched to Volvos, that vehicle model's safety record might be quite different than it is. And if little old ladies started buying Dodge Challengers, that model's record might improve. Although you might have to select out the ones from Pasadena, at least when they are driving on Colorado Boulevard.
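A toy version of the leadfoot problem can be sketched in Python. Every number below is invented purely for illustration, and the names `model_a`, `model_b`, `raw_rate` and `rate_within` are hypothetical. This is not multiple regression itself, just a simple split by driver type, which is the kind of adjustment regression tries to make across many variables at once. The counts are rigged so that the car makes no difference at all; only the mix of drivers does.

```python
# Invented counts, for illustration only: (drivers, fatal_crashes)
counts = {
    ("model_a", "cautious"): (900, 9),
    ("model_a", "aggressive"): (100, 5),
    ("model_b", "cautious"): (100, 1),
    ("model_b", "aggressive"): (900, 45),
}


def raw_rate(model):
    """Crash rate for a model, ignoring who is behind the wheel."""
    drivers = sum(d for (m, _), (d, _c) in counts.items() if m == model)
    crashes = sum(c for (m, _), (_d, c) in counts.items() if m == model)
    return crashes / drivers


def rate_within(model, driver_type):
    """Crash rate for a model among one type of driver only."""
    drivers, crashes = counts[(model, driver_type)]
    return crashes / drivers


for model in ("model_a", "model_b"):
    print(model, "raw:", raw_rate(model))
    for driver_type in ("cautious", "aggressive"):
        print("  ", driver_type, rate_within(model, driver_type))
```

Counted naively, `model_b` looks more than three times as deadly as `model_a`. Split by driver type, the two models come out identical, because the whole gap came from who chose to buy which car.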

Professor Nisbett goes on at length, but if you've no desire to read his interview, the sum-up is that life is complicated and while we can guess what might happen based on what has already happened, it's still a guess and thinking otherwise is likely to turn out screwy. Even I can solve that one.
