In this spoon-fed Internutopia, there will always be reasons to prefer working things out for yourself: personal satisfaction, and the confidence and ability to function without an umbilical cord. Even if it's just high school algebra...

A colleague recently asked me what formula is used by Excel to calculate the parameters of a trend line. Without checking the documentation, I answered that Gauss' method of least squares might be an obvious choice. "So, how do I work out the coefficients used?" Well, on this occasion, we can just use a search engine, and harvest the necessary expressions from any one of approximately half a million suitable websites. Wolfram, perhaps. Or Wikipedia. Or http://www.curvefit.com/linear_regression.htm.

It wasn't always so easy. A mere quarter of a century ago, you might have found yourself alone in a Portakabin™ sitting by the exhaust of a running aero engine - in the middle of a field full of cows! - taking readings from an array of temperature, pressure and flow instruments that you'd chained in series to the user port on a Commodore Pet computer, and trying to get the attached HP pen plotter to show the associated trend curve. Which, the local aero boffins would have assured you, was theoretically likely to take the form of a quintic, or fifth-order polynomial.

These guys were serious about their fluid mechanics. Any of them could stir a cup of hot coffee vigorously with a teaspoon at 600 rpm, without spilling a drop. When they said the data would follow a quintic curve, you could bet the house on it.

But where to start? It seems reasonable to suppose that the straight-line fit can be generalised to this case, so let's look at that first. Plot all of the n data points (x,y) on a graph, and add a straight line approximation, y = mx + c, just wherever it looks good to you. Now for each point, measure how far away it is from your line. To simplify the sums, we usually measure these discrepancies vertically rather than perpendicular to the line, so we arrive at a list of ∆y values.

If you've fitted the line at all well, then some of these deltas will be positive, some negative. We do not want that! Positive and negative errors would cancel each other out! So, we square each delta, forcing all positive; then sum them all to get something we'll call ∑:

∑(∆y)² = ∑(mx + c - y)².
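To make that concrete, here is a tiny illustrative snippet (the data and the guessed line are invented for the example, not from the original exercise) computing the sum of squared vertical deltas for one hand-fitted line:

```python
# Three made-up data points and one hand-guessed line, y = 2x + 0.
xs, ys = [1, 2, 3], [3, 5, 7]
m, c = 2, 0

# Sum of squared vertical deltas, the quantity called ∑ in the text.
S = sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))
print(S)  # (2-3)² + (4-5)² + (6-7)² = 3
```

A different guess for m and c gives a different S; the whole game is finding the pair that makes S smallest.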

Finally, we do the mathematical equivalent of wiggling the line clockwise and anticlockwise (i.e. varying its gradient, m), while also shifting it up and down (varying its y-intercept, c), until we find the particular line that minimises ∑, the sum of all those squared deltas. To do this, we partially differentiate ∑ first with respect to m, then again with respect to c:

∂∑/∂m = 2∑(mx + c - y)x, which equals zero when

(∑x²)m + (∑x)c = (∑xy).

∂∑/∂c = 2∑(mx + c - y), which equals zero when

(∑x)m + (n)c = (∑y).

Notice that the bracketed terms ∑x, ∑y, ∑x², ∑xy, and of course n, are all known or readily calculated from our original data. So we have two simple simultaneous equations in our trend line coefficients m and c, from which we can readily obtain our linear regression, erm, line.
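The two simultaneous equations above solve in a few lines of code. Here is a minimal sketch (the function name and sample data are mine, not from the original):

```python
def fit_line(xs, ys):
    """Least-squares straight line y = mx + c through the points."""
    n = len(xs)
    Sx = sum(xs)
    Sy = sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve the pair of simultaneous equations from the text:
    #   Sxx*m + Sx*c = Sxy
    #   Sx*m  + n*c  = Sy
    # by Cramer's rule.
    det = Sxx * n - Sx * Sx
    m = (Sxy * n - Sx * Sy) / det
    c = (Sxx * Sy - Sx * Sxy) / det
    return m, c

# Points lying exactly on y = 2x + 1 should recover m = 2, c = 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```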

And that's all there is to it. Generalising the method to any alternative mathematical model, other than the first-order linear one, is usually a simple matter of varying this last step to "Adjust all available parameters of the curve simultaneously, to find the optimum fit." Whatever the mathematical model, the chances are excellent that we will be able to differentiate it as much as we need to, obtaining one additional simultaneous equation for each extra (independently adjustable) model parameter.

Polynomials like quadratics, cubics, and quartics are particularly straightforward. And as I happily discovered one August day in 1984, alone in a Portakabin™ on East Rogerton Farm: it works equally well for quintics!
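The polynomial generalisation can be sketched in a few lines too (an illustrative sketch in modern Python, emphatically not the original Pet BASIC). Differentiating ∑ with respect to each coefficient aₖ of y = a₀ + a₁x + … + a_d x^d gives one simultaneous equation per coefficient, and solving that small linear system yields the fit:

```python
def fit_poly(xs, ys, d):
    """Least-squares polynomial of degree d; returns [a0, a1, ..., ad]."""
    n = d + 1
    # Setting each partial derivative of the squared-error sum to zero
    # gives:  sum_j (∑ x^(j+k)) a_j = ∑ x^k y,  for k = 0..d.
    A = [[sum(x ** (j + k) for x in xs) for j in range(n)]
         for k in range(n)]
    b = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(n)]
    # Plain Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, n))) / A[r][r]
    return a

# Points on y = 1 + 2x + 3x² should recover (approximately) [1, 2, 3];
# the same routine with d = 5 handles the quintic case.
print(fit_poly([0, 1, 2, 3], [1, 6, 17, 34], 2))
```

Setting d = 1 reproduces the straight-line case exactly; the two normal equations it builds are the pair derived above.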

"If you've fitted the line at all well, then some of these deltas will be positive, some negative. We do not want that! Positive and negative errors would cancel each other out! So, we square each delta, forcing all positive; ..."

It's worth noting that squaring not only forces all errors to be positive: it also exaggerates the larger errors, which may or may not be desirable. L-1 regression, which minimizes the sum of absolute errors, is less straightforward to compute, but is today widely available and quickly calculated by computer. See my article for further explanation:

L-1 Linear Regression
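For a small data set you can even brute-force the L-1 line, exploiting the fact that an optimal L-1 fit passes through at least two of the data points. (A crude illustrative sketch of mine, not the method from the linked article.)

```python
from itertools import combinations

def l1_fit(xs, ys):
    """Brute-force L-1 line: try every pair of points, keep the line
    with the smallest sum of absolute vertical errors."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, skip
        m = (ys[j] - ys[i]) / (xs[j] - xs[i])
        c = ys[i] - m * xs[i]
        err = sum(abs(m * x + c - y) for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, m, c)
    return best[1], best[2]

# One wild outlier barely moves the L-1 line, unlike least squares:
# four points on y = x plus an outlier still give m = 1, c = 0.
print(l1_fit([0, 1, 2, 3, 4], [0, 1, 2, 3, 100]))
```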

Will Dwinnell - That certainly is worth noting.

In my defence, I did mention that vertical deltas are used "To simplify the sums," but of course that's also the historical reason for the squaring, which might in fact be quite an inappropriate weighting strategy!

Good article, btw.