Thursday, October 1, 2009

Excel or Regress

A truly disruptive technology changes not only the way we behave but also the way we think. I like to imagine technologies of the past that meet this standard, and ones that don’t. My guess is that the automobile itself was less of a disruptive technology than good roads. Being able to move in a car changes the way we behave, but not necessarily the way we think. By the time cars were fast enough and cheap enough, and roads were good enough, cars made it possible to go places we would never, or only rarely, go on our own. Once that was possible, people must have thought about home, time and space very differently. I think the motion picture was disruptive, as were the airplane, the phonograph and countless other technologies. In my lifetime I have seen many such changes, from the PC to the internet to MP3 players and e-books. Still, the one I want to talk about here is more often accepted as mundane than thought of as disruptive: Microsoft Excel.

Excel was not the first computer spreadsheet, but it is generally considered a very good one. Still, most people do not think of it as disruptive in the same way as the inventions I mentioned above. Nearly everyone in nearly every discipline uses Excel; however, it is not merely its ubiquity that makes it important. It is disruptive because it changes the way we think. This may or may not be a positive societal shift. I had always considered Excel to be as close to a perfect program as was possible. It is not self-organizing, but it is self-solving. It has always been a highly functional tool, or even a toolbox, that didn’t tell me what to build but helped me build it. This is still true, of course, and I am glad that Excel exists. I don’t know how I would do anything in science or business without it (though others certainly have).

This brings me to its ability to change the way we think. Excel gives solutions, not proofs. I have grown up with this tool, and my mathematical instincts have suffered for it. For instance, one of the most important things scientists do every day is compile data points, for example the results of a series of experiments. When a series of experiments is run there is generally some noise (also known as scatter), and this scatter can be interpreted in thousands of ways. The first thing the scientist needs to consider is whether it is noise at all, or a real difference between data points rather than just a statistical anomaly. Traditionally this was done by plotting the points by hand, choosing the type of graph that seemed most valid from past experience, and then calculating the statistical fit that was most representative. If you had no knowledge of regression analysis, you would plot the points and connect them all. In essence, this gives every point equal importance, when in all likelihood some are outliers and should not be given equal weight. There are a number of ways to fit the data instead, from the simplest linear fit to a very complicated polynomial.
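
To make the contrast between fitting and connecting every point concrete, here is a minimal sketch in plain Python. It is not Excel, and it makes no claim about how Excel computes its trendlines. The data are invented for illustration (the line y = 2x + 1 plus hand-picked scatter), and the function names are hypothetical. It compares an ordinary least-squares line with a degree-6 polynomial forced through all seven points, which is the smooth-curve version of connecting the dots.

```python
# Hypothetical measurements: an underlying line y = 2x + 1 plus hand-picked "scatter".
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1.3, 2.6, 5.2, 6.9, 9.5, 10.7, 13.1]


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def interpolating_polynomial(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 that passes through every point."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p


def r_squared(xs, ys, model):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


a, b = linear_fit(xs, ys)


def line(x):
    """The fitted straight line."""
    return a + b * x


poly = interpolating_polynomial(xs, ys)

for name, model in [("line", line), ("degree-6 polynomial", poly)]:
    print(f"{name}: R^2 = {r_squared(xs, ys, model):.3f}, "
          f"predicts {model(7):.2f} at x = 7")
# line: R^2 = 0.994, predicts 15.03 at x = 7
# degree-6 polynomial: R^2 = 1.000, predicts 50.30 at x = 7
```

The polynomial "wins" on R² because it hits every point exactly, yet one step beyond the data it predicts about 50 where the underlying line gives 15. A perfect fit statistic says nothing on its own about whether the curve represents the experiment.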

All of this can be done with a click of the mouse. This inevitably makes those of us who are not math geniuses comfortable not only with math, but also with something far more complicated. It makes us think we understand the probability of the calculated results. It makes us believe that we can take complicated noise that seems random to us and fit it onto a graph where, no matter what chaos is inherent in the results, it becomes a smooth, easy-to-read curve.

The problem with this is that Excel is a language in itself, and it is a language none of us truly speaks, whether we have a mathematical background or not. Even for mathematicians, the way formulas are written in Excel is not the way a mathematician formulates theorems and proofs. It is a special code that requires help menus or memorization. This is in contrast to the last 2,000 years of mathematics, which progressed organically from Euclidean geometry to Cartesian coordinates, through Newtonian calculus, to Born’s matrix mechanics. Each of these was different and unique, but each followed from logic and necessity, and from an instinctive desire to create an understandable code for solving the next questions nature presented. Excel is not a new form of mathematics. It is not even a simplification of existing mathematics. Instead, it is a set of tools which, in the process of helping us speed up complicated regression analysis, also leads to a different type of regression: a regression of critical thinking. It forces on us a deliberately bizarre way of following a formula, only to give us the solution we want in the end, whether or not it best represents reality.

These may seem rather harsh words for one of the best computer programs ever written, and as a standalone criticism they are. But Excel is not the problem; it is a symptom. I am not the kind of scientist who can work through extremely complicated proofs and equations on my own. I need tools, and Excel has been a constant companion. I am only raising a challenge to myself, my colleagues and my students to step back into the dangerous waters of nature and scientific experiment, and to see whether we really have the answers.
