Tuesday 1 November 2011

Algebra of Functions, Part 4: Parsers

A Series of Tubes

Note: this 2010 series of articles (part one, two, three, ) was abandoned incomplete when overtaken by events at the Microsoft C# compiler factory. My utimate target, actually intended to be the article right after this one, was the Continuation Monad; but Eric Lippert got there before me with a surprise sprint (Continuation Passing Style Revisited, part one, two, three, four, five) leading up to his introduction to async/await marathon (Asynchrony in C# 5, part one, two, three, four, five, six, seven, eight). The only reason to publish this short excerpt now is to have a convenient index into the work of Brian McNamara and Luke Hoban, as linked at the end.

Today we take a look at another popular exhibit from the monad menagerie: the Parse monad.

Parsing is such a popular example of the application of monads, because it so neatly fits the characteristic template of a data processing pipeline with side effects. When parsing a string into any given language, each successive step
  1. consumes part of the input string, and then
  2. passes the remainder on to the next step in the sequence.
That constitutes the main process. But as a side effect, the just-parsed value is also made available in some way. This might be a keyword of the language, the name of a variable, or perhaps a binary value representing a data value of some particular type; but it could equally well be discardable whitespace, or any character, etc.

In terms of our monad then, the unamplified type is the input type string. The amplified type, which must be capable of holding all of the output from an operation of parsing, needs to store both a string, representing the remainder of the unconsumed input, and another, parsed value, of some arbitrary type.

Language geeks love all monads and parser combinators in equal measure, and there is a wealth of information about all this stuff on the web. As for C#, one of the best and most detailed I've seen is Brian McNamara's December 2007 series, "Monadic Parser Combinators", which gets as far as Part Eight before surrendering to F#:

Part: One, Two, Three, Four, Five, Six, Seven, Eight.
Source: Part 4, Part 5/6.

Very succinct and readable is Luke Hoban's contribution, also from 2007:

Monadic Parser Combinators using C# 3.0

No comments:

Post a Comment