Wednesday 14 April 2010

Algebra of Functions, Part 1: Monoids

Composability

Our flagship product is SQL Server backed, and initially used a text-based SQL builder. It was robust, sturdy and reliable, though sometimes quite difficult to extend when a new query building requirement materialised. Other times, it seemed virtually impossible!

So we developed another, object-based SQL builder. This could manipulate queries at the level of individual terms, expressions, clauses, or entire composite superqueries. Essentially, we anticipated many features of LINQ to objects, and a small subset of LINQ to SQL.

The nontrivial overheads of maintaining such a code base, in the face of finite development resources, as well as new and unforeseen requirements, were soon apparent. More seriously, as it was in a sense retrofitted to an already extensive application architecture, we didn't quite derive all the anticipated benefits of composable queries, closure, and the other great things that a well formed algebra brings.

Those benefits are considerable; the move to so-called functional programming technique is comparable to the jump from a simple imperative procedural language to one using objects. Working in C#, the hero of functional programming style is the humble Lambda Expression.

Composition of Functions

As C# 3.0 developers we are by now familiar with lambda expressions; fairly arbitrary fragments of code that takes any number of input parameters (including none), and may return a value. They are related to C# 2.0 anonymous methods and C# 1.0 delegates in obvious ways.

Lambdas allow us to represent the notion of functional composition succinctly and generally. For example, suppose we wish to work out the Root Mean Square, or RMS, value of a sequence. This involves three steps:
  1. square each value in the sequence;
  2. compute the mean of those squares;
  3. extract the square root of that mean.
These three "atomic" functions can easily be composed to obtain the answer, the root of the mean of the squares:
static IEnumerable<double> Square(IEnumerable<double> values)
{
return from x in values select x*x;
}

static double Mean(IEnumerable<double> values)
{
return values.Average();
}

static double Root(double value)
{
return Math.Sqrt(value);
}

static double RMS(IEnumerable<double> values)
{
return Root(Mean(Square(values)));
}

static void Main()
{
Console.WriteLine(RMS(new[] {1.0, 2.0, 3.0}));
// Displays 2.16024689946929
Console.ReadKey();
}
In functional programming, we generalise the construction of composite expressions, like Root(Mean(Square(values))), using an operation that takes functions pairwise, and returns their composition as a new function. This lets us build new functions like RMS at run time, invoke them directly, pass them around to other functions, and so on.

With type safety in mind, we must accommodate functions with arbitrary parameter signatures and return types. Notice that my "atomic" functions Square, Mean and Root above all differ in this regard: Square projects a sequence to a sequence; Mean aggregates a sequence to a double; Root converts double to double.

The magic is done in the After extension method below:
static Func<T, V> After<T, U, V>(this Func<U, V> f, Func<T, U> g)
{
return x => f(g(x));
}

static double RMS(IEnumerable<double> values)
{
Func<IEnumerable<double>, IEnumerable<double>>
square = x => x.Select(y => y*y);
Func<IEnumerable<double>, double> mean = x => x.Average();
Func<double, double> root = Math.Sqrt;
var rms = root.After(mean.After(square));
return rms(values);
}

static void Main()
{
Console.WriteLine(RMS(new[] {1.0, 2.0, 3.0}));
// Displays 2.16024689946929
Console.ReadKey();
}
After takes two delegates f and g representing functions (i.e., value-returning methods accepting one input parameter), and combines these into a single delegate, representing the function whose effect is the same as applying the original functions in the sequence "f after g" (i.e., first apply g to the input parameter, and then apply f to that result). It is used twice in the above code; first to compose square with mean, then to compose that result with root.

What if I want to vary the order of composition? Suppose I need to know the square of the mean of the roots of my sequence. For simplicity - literally, to avoid complex numbers! - let's assume the input sequence comprises only nonnegative values. Can I use the After method to compose my atomic functions in the new order, generating the Square Mean Root (or "SMR") function?
static IEnumerable<double> SMR(IEnumerable<double> values)
{
Func<IEnumerable<double>, IEnumerable<double>>
square = x => x.Select(y => y*y);
Func<IEnumerable<double>, double> mean = x => x.Average();
Func<double, double> root = Math.Sqrt;
var smr = square.After(mean.After(root));
return smr(values);
}
Sadly no; all the joints are mismatched. Root expects a single input value, not a sequence. It also delivers a single output value, but Mean expects a sequence. And so on.

Maybe it would help if all the inputs and outputs were of the same type. We're only dealing with double precision floating point values, and sequences of these. So, let's replace all single value contexts with sequences containing just a single value:
static Func<T, T> After<T>(this Func<T, T> f, Func<T, T> g)
{
return x => f(g(x));
}

static IEnumerable<double> SMR(IEnumerable<double> values)
{
Func<IEnumerable<double>, IEnumerable<double>>
square = x => x.Select(y => y*y),
mean = x => new[] {x.Average()},
root = x => x.Select(Math.Sqrt);
var smr = square.After(mean.After(root));
return smr(values);
}

static void Main()
{
Console.WriteLine(SMR(new[] {1.0, 2.0, 3.0}).First());
// Displays 1.91016758060559
Console.ReadKey();
}
That's one possible approach. Note the simplification achieved in the After method. Now that all the atomic functions have the same input and output type, so will all compositions of these; we have a closed algebra.

In fact we have invented a Monoid.

Actually it's a bit too restricting to insist that all composable functions have the same input and output type. We would still be able to compose our atomic functional operations if their result type was different from the input, but we knew how to convert from one to the other. Next time we'll take a look at the Monads, and see how very surprisingly far this chaining idea can take us.

Future Builder

For any (purely hypothetical) future projects, we are unlikely to need a SQL builder design at all. Whatever our architectural object basis might be, let's say it uses SkinnyXmlObjects, we can derive SQL using some combination of LINQ to XML, LINQ to objects, and LINQ to SQL, with perhaps just a custom Skinny provider to handle Skinny specific details. But in fact we will probably bypass custom code entirely, and use a higher level, third party, persistence framework. Either way, the inherent complexity of the underlying mapping operations is certain to be managed by functional techniques.

No comments:

Post a Comment