## Thursday, 18 April 2019

### Expression Parser

Modus Operandi

Previously:
Differentiating f(x)^g(x)
The Differentiator
Graphing Derivatives
The Complexities Of The Simplifier
Formula Translation: The Error Function
The Differentiator's fluent expression-building syntax is great when you have source code access, but it's easy to build a standard, recursive-descent parser for everyone else to enjoy. As you'll see below, the best thing about that is the sheer pleasure of creating your own precedence and associativity rules.

Our little language of arithmetic expressions in a single real variable x has quite the simple grammar, which in mostly-EBNF, looks like this:
``````
Parameter       = "x" ;

Number          = ? \d*\.?\d*([eE][+-]?\d+)? ? ;

NamedConstant   = "e" | "π" | "pi" | "ϕ" | "phi" ;

UnaryOperator   = "+" | "-" | "!" | "~" | "√" ;

BinaryOperator  = "||" | "&&" | "|" | "&" | "=" | "==" | "≠" | "<>" | "!=" | "<" | ">" | "≮ " | "≯ " | "<=" | ">="
| "+" | "-" | "*" | "/" | "^" ;

Function        = "Abs" | "Acos" | "Acosh" | "Acot" | "Acoth" | "Acsc" | "Acsch" | "Asec" | "Asech" | "Asin"
| "Asinh" | "Atan" | "Atanh" | "Ceiling" | "Cos" | "Cosh" | "Cot" | "Coth" | "Csc" | "Csch"
| "Erf" | "Exp" | "Floor" | "Ln" | "Log10" | "Round" | "Sec" | "Sech" | "Sign" | "Sin"
| "Sinh" | "Sqrt" | "Step" | "Tan" | "Tanh" ;

Operand         = Parameter
| Number
| NamedConstant
| Operand, "'" (* Form the derivative *)
| "(", Expression, ")"
| Function, Operand
| UnaryOperator, Operand ;

Expression      = Operand, { BinaryOperator, Operand }
| Expression, "?", Expression, ":", Expression ;
``````

The mutual recursion visible between the above Expression and Operand nonterminals is the hallmark of a classic infix-notation, parentheses-laden syntax.

Notes on the Grammar
• The Parameter 'x' is case insensitive, but will be converted to lower case during the parsing process.
• The Number element is defined above using a regular expression pattern. This pattern overreaches, since for simplicity's sake, none of its character groups are mandatory. So the empty string is regarded as a valid number; as is, a single dot; or for example, the sequence '.E-'. But that's OK, as long as the tokeniser is just greedy enough, but doesn't steal characters from subsequent tokens. The actual parsing of constants will be done later by the double.Parse() method, so if the tokeniser passes through any such malformed number, the error will be detected.
• A Number starts with either a digit or a decimal point; any preceding '+' or '-' signs will be treated as unary operators (although the parser will later collapse all such operators into a single Constant node).
• The NamedConstant class does allow you to type the names of (some of) the few supported values as Greek letters (π, ϕ)... but you're wasting your time! This code collapses (combines) constants eagerly, and everything soon ends up as a double-precision, floating point number. NamedConstant elements are case insensitive.
• The UnaryOperator elements '!' (nothing to do with factorials) and '~' both represent a logical not operation. The number zero is regarded as false, anything else as true. So if x=0, then !x (or equivalently ~x) will return 1; otherwise 0.
• UnaryOperator '√' performs the same function as 'Sqrt'.
• The BinaryOperator elements '&' and '&&' both perform a logical And operation, likewise '|' and '||' a logical Or. They differ only in their levels of precedence: as in C#, the order of evaluation is '&', '|', '&&', '||' (obviously you can just stick to one preferred style, as the same effect can be achieved using parentheses). Similarly, '=' and '==' perform the same equality test; '≠', '<>' and '!=' the same inequality test; and so on for the comparison operators.
• The available Function terminal elements are just the names of the 35 functions made available in the previous article.
• Function names are case insensitive, but will be converted to Title Case (initial capital) during the parsing process. Notice that the Function syntax doesn't require parentheses; abs x works just as well as Abs(x).
• The apostrophe "'" is the only postfix operator in the grammar. It has the effect of differentiating the preceding Operand, so for example, (Sin(x))' evaluates to Cos(x).
Turning Japanese

It's always at this point, turning from the syntax diagram to the semantics of coding, that I inevitably get seduced by this cool feature, first encountered in my first couple of pocket computers: the Sharp PC-1211 and PC-1500. The cool feature is implied products. TL;DR: every memory category was in such short supply back then (see: millennium bug), even BASIC variables were meted out individually, and their names were A..Z. The unexpected benefit of this rationing was that products of variables (variables which individually could only have single-character names like A or B) could now be expressed unambiguously in the form AB, saving a whole byte by omitting the multiplication symbol.

At first glimpse this implied multiplication seems no big deal, but wait. Does it necessarily have the same associativity and precedence as normal multiplication? If so, then
A/BC would be parsed as (A/B)*C = (A*C)/B.
But this seems wrong! The expression looks so much more like a quantity A being divided by a quantity BC, in other words,
A/BC = A/(B*C).
Now the C has landed below the line; before, it was above. Which is correct?

Utility is monarch, and the solution turns out to be the introduction of a new kind of implied multiplication with a higher precedence than the usual one. By analogy: the associativity of '^' was once settled to be right, unlike its additive and multiplicative colleagues which all tended left.
So, while subtraction goes like this: A-B-C = (A-B)-C = A-(B+C),

by contrast exponentiation does this: A^B^C = A^(B^C)

rather than the much less useful (A^B)^C,
because the real action's in the exponent. In the case of the implicit multiplier, the new operator was given a higher precedence than its peers, so for example
A^BC = A^(B*C), rather than (A^B)*C.
In this way multiplication, if it just remain hidden, can leapfrog other more powerful operators, perhaps even unary operators, perhaps even function calls:
sin AB = sin (A * B), compared to sin A * B = (sin A) * B.

White Feather

So why did I hesitate to implement such an operator in Parser?

A recurring subject in the C# language designers' blogs and comment responses is cost. Nothing comes for free! There's always at least a test budget to consider, and almost always, myriad further hidden design and development costs. Implied products are almost too attractive a feature. Sure, (x+2)(x-2) and its ilk are compelling, but to get the full story, you must go back to the language's grammar definition and look at all newly emerging cases.

2 sin x cos x
for example. The familiar sine of a double angle formula, obviously and unambiguously intended to be read as
2*sin(x)*cos(x).
But should the precedence of "implicit multiply" exceed even unary and function calls, the interpretation becomes
2*sin(x*cos(x)),
which, simply, ouch.

Then again, why all this detail, if I never intended to implement such an operator? Because I knew I would. I knew that about myself, and so wanted all this background material to be available for review, when I finally revisited the code to smoosh it up a bit. Also, the very inevitability of that revisit is why I made this code as clean and readily understandable as I could. Part of this process is of course eliminating all but the most vital of comments from the source; hence their home in this article.

That Precedence enumeration was a proper godsend when smooshing time arrived! Anyway, here comes the parser code:
``````
namespace FormulaBuilder
{
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text.RegularExpressions;

public class Parser
{
private enum Precedence
{
RightParenthesis,
Multiplicative,
Exponential,
Functional,
ImpliedProduct,
SuperscriptPower
}

const char
SquareRoot = '√';

const string
ImpliedProduct = "i*",
SuperscriptPower = "s^",
UnaryMinus = "u-",
UnaryPlus = "u+";

private string Formula;
private int Index;
private Stack<Expression> Operands;
private Stack<string> Operators;

public Expression Parse(string formula)
{
Formula = formula;
Index = 0;
Operands = new Stack<Expression>();
Operators = new Stack<string>(new[] { "(" });
ParseExpression();
if (Operators.Any())
throw new FormatException(
\$"Unexpected end of expression, input='{Formula}'");
return Operands.Peek();
}

public bool TryParse(string formula, out object result)
{
try
{
result = Parse(formula);
return true;
}
catch (Exception e)
{
result = e.Message;
return false;
}
}

private static double GetNamedConstantValue(string constant)
{
switch (constant.ToLower())
{
case "e":
return Math.E;
case "π":
case "pi":
return Math.PI;
case "ϕ":
case "phi":
return (1 + Math.Sqrt(5)) / 2;
}
return 0;
}

private static ExpressionType GetExpressionType(string op)
{
switch (op)
{
case "+":
case "-":
return ExpressionType.Subtract;
case "*":
case "i*":
return ExpressionType.Multiply;
case "/":
return ExpressionType.Divide;
case "^":
case "s^":
return ExpressionType.Power;
case UnaryPlus:
return ExpressionType.UnaryPlus;
case UnaryMinus:
return ExpressionType.Negate;
}
throw new FormatException();
}

private static Precedence GetPrecedence(string op)
{
switch (op)
{
case ")":
return Precedence.RightParenthesis;
case "+":
case "-":
case "*":
case "/":
return Precedence.Multiplicative;
case "^":
return Precedence.Exponential;
case ImpliedProduct:
return Precedence.ImpliedProduct;
case SuperscriptPower:
return Precedence.SuperscriptPower;
}
return Precedence.Functional;
}

private Expression MakeBinary(string op, Expression lhs, Expression rhs) =>
Expression.MakeBinary(GetExpressionType(op), lhs, rhs);

private Expression MakeFunction(string f, Expression operand)
{
f = \$"{char.ToUpper(f)}{f.ToLower().Substring(1)}";
var result = Expressions.Function(f, operand);
if (operand is ConstantExpression c)
return result.AsDouble((double)c.Value).Constant();
return result;
}

private Expression MakeUnary(string op, Expression operand)
{
if (operand is ConstantExpression c)
switch (op)
{
case UnaryPlus: return operand;
case UnaryMinus: return (-(double)c.Value).Constant();
}
return Expression.MakeUnary(GetExpressionType(op), operand, null);
}

private string MatchFunction() => MatchRegex(@"^[\p{Lu}\p{Ll}\d]+").ToLower();

private string MatchNumber() => MatchRegex(@"^\d*\.?\d*([eE][+-]?\d+)?");

private string MatchRegex(string pattern)
{
var match = Regex.Match(Formula.Substring(Index), pattern);
return Formula.Substring(Index + match.Index, match.Length);
}

private string MatchSubscript() => MatchRegex(\$"^[{StringUtilities.Subscripts}]+");
private string MatchSuperscript() => MatchRegex(\$"^[{StringUtilities.Superscripts}]+");

private char NextChar()
{
var count = Formula.Length;
while (Index < count && Formula[Index] == ' ')
Index++;
return Index < count ? Formula[Index] : Index == count ? ')' : '\$';
}

private string NextToken()
{
var nextChar = NextChar();
switch (nextChar)
{
case '+':
case '-':
case '*':
case '/':
case '^':
case '(':
case ')':
case SquareRoot:
return nextChar.ToString();
case char c when char.IsDigit(c):
case '.':
return MatchNumber();
case char c when c.IsSuperscript():
return MatchSuperscript();
case char c when char.IsLetter(c):
return MatchFunction();
}
throw new FormatException(
\$"Unexpected character '{nextChar}', input='{Formula}', index={Index}");
}

private void ParseExpression()
{
do
{
ParseOperand();
var op = NextChar();
switch (op)
{
case '+':
case '-':
case '*':
case '/':
case '^':
case ')':
ParseOperator(op.ToString());
if (op == ')')
return;
break;
case '\$' when Index == Formula.Length + 2: // End of input (normal)
return;
case '\$' when Index < Formula.Length + 2: // End of input (unexpected)
throw new FormatException(
\$"Unexpected end of text, input='{Formula}', index={Index}");
case char c when c.IsSuperscript():
ParseOperator(SuperscriptPower); //
break;
default:
if (Operands.Peek() is ConstantExpression)
{
ParseOperator(ImpliedProduct); // Implied multiplication
break;
}
throw new FormatException(
\$"Unexpected character '{op}', input='{Formula}', index={Index}");
}
}
while (true);
}

private void ParseFunction(string function)
{
Operators.Push(function);
}

private bool ParseNamedConstant(string constant)
{
var value = GetNamedConstantValue(constant);
if (value == 0)
return false;
Operands.Push(value.Constant());
return true;
}

private void ParseNumber(string number)
{
try
{
Operands.Push(double.Parse(number).Constant());
}
catch (FormatException)
{
throw new FormatException(
\$"Invalid number format '{number}', input='{Formula}', index={Index}");
}
catch (OverflowException)
{
throw new FormatException(
\$"Numerical overflow '{number}', input='{Formula}', index={Index}");
}
}

private void ParseOperand()
{
var token = NextToken();
switch (char.ToLower(token))
{
case 'x' when token.Length == 1:
ParseParameter(token);
break;
case char c when char.IsDigit(c):
case '.':
ParseNumber(token);
break;
case char c when c.IsSuperscript():
ParseSuperscript(token);
break;
case '(':
Operators.Push(token);
ParseExpression();
break;
case '+':
case '-':
ParseUnary(token);
ParseOperand();
break;
case char c when char.IsLetter(c):
if (!ParseNamedConstant(token))
{
ParseFunction(token);
ParseOperand();
}
break;
case SquareRoot:
ParseSquareRoot();
ParseOperand();
break;
default:
throw new FormatException(
\$"Missing operand, input='{Formula}', index={Index}");
}
}

private void ParseOperator(string op)
{
do
{
var ours = GetPrecedence(op);
var pending = Operators.Peek();
if (pending == "(")
break;
var theirs = GetPrecedence(pending);
// Operator '^' is right associative: a^b^c = a^(b^c).
if (theirs > ours || theirs == ours && op != "^")
{
Operators.Pop();
var operand = Operands.Pop();
if (theirs == Precedence.Functional)
switch (pending)
{
case UnaryPlus:
break;
case UnaryMinus:
operand = MakeUnary(pending, operand);
break;
default:
try
{
operand = MakeFunction(pending, operand);
}
catch (ArgumentNullException)
{
throw new FormatException(
\$"Unrecognised function '{pending}', input='{Formula}'");
};
break;
}
else
{
if (pending == ImpliedProduct)
pending = "*";
operand = MakeBinary(pending, Operands.Pop(), operand);
}
Operands.Push(operand);
}
else
break;
}
while (true);
if (op == ")")
Operators.Pop();
else
Operators.Push(op);
if (op != ImpliedProduct && op != SuperscriptPower)
}

private void ParseParameter(string token)
{
Operands.Push(Expressions.x);
}

private void ParseSquareRoot()
{
Operators.Push("Sqrt");
}

private void ParseSuperscript(string superscript)
{
Operands.Push(new Parser().Parse(superscript.SuperscriptToNormal()));
}

private void ParseUnary(string unary)
{
Operators.Push(\$"u{unary}");
}

private void ReadPast(string token) => Index += token.Length;
}
}
``````

Going Wild!

It's a fair cop. If you've read through the source code above, you'll have spotted vestiges of not only implied multiplication, but also (implied) superscript exponentiation. Like a lot of subsequent developments, they're not visible in the grammar spec at the head of this article. But to start with, you can omit the multiplication sign after any constant number, and before the next operand, which may be "x", or a function, or "(" introducing a nested expression, or even another number (separated from the first by at least one space).

And what of the over-eager precedence problem? This new implicit multiplication operator needs its own, new precedence level, Implied, above Unary, so stuff like sin 2x works properly. But that's also higher than Exponential precedence, so 2x^3 becomes (2*x)^3, whereas in the arena of polynomials we'd rather see 2*(x^3). This can be fixed, luckily, quickly, and quirkily: introduce Unicode superscripts, and give them their own implicit exponentiation operator! This has the highest precedence of all, and causes 2x³ to get dressed as the wanted 2*(x^3).

Just how far can we go with superscripts? Unicode characters include numbers and common mathematical symbols ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾, a full superscript Latin lowercase alphabet ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖʳˢᵗᵘᵛʷˣʸᶻ (well, full except for 'q'), a limited uppercase Latin alphabet ᴬᴮᴰᴱᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴿᵀᵁⱽᵂ, and some Greek letters ᵅᵝᵞᵟᵋᶿᶥᶲᵠᵡ. These limitations already suggest we should use some other type of markup language - remember, this series started with a look at LaTeX - and the fact that even these available glyphs come from different ranges, so depending on your typeface can vary in size and position, surely cements that opinion. But until we make that switch, we can enjoy the novelty of chucking expressions like x⁴-4x³+6x²-4x+1, or eᶜᵒˢ⁽ˣ⁾, into our mathematical stockpot.

You don't have superscripts on your keyboard? Sorry, I'm just a code monkey. That's definitely a hardware problem.
``````
namespace FormulaBuilder
{
using System;
using System.Text;

public static class StringUtilities
{
public const string
Subscripts = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₕᵢⱼₖₗₘₙₒₚᵣₛₜᵤᵥₓᵦᵧᵨᵩᵪ",
Transcripts = "0123456789+-=()aehijklmnoprstuvxβγρψχbcdfgwyzABDEGHIJKLMNOPRTUVWαδεθιφ",
Superscripts = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ᵃᵉʰⁱʲᵏˡᵐⁿᵒᵖʳˢᵗᵘᵛˣᵝᵞρᵠᵡᵇᶜᵈᶠᵍʷʸᶻᴬᴮᴰᴱᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴿᵀᵁⱽᵂᵅᵟᵋᶿᶥᶲ";

public static bool IsSubscript(this char c) => Subscripts.IndexOf(c) >= 0;
public static bool IsSuperscript(this char c) => Superscripts.IndexOf(c) >= 0;

public static string NormalToSubscript(this string number) => Transcribe(number, Transcripts, Subscripts);
public static string NormalToSuperscript(this string number) => Transcribe(number, Transcripts, Superscripts);
public static string SubscriptToNormal(this string number) => Transcribe(number, Subscripts, Transcripts);
public static string SubscriptToSuperscript(this string number) => Transcribe(number, Subscripts, Superscripts);
public static string SuperscriptToNormal(this string number) => Transcribe(number, Superscripts, Transcripts);
public static string SuperscriptToSubscript(this string number) => Transcribe(number, Superscripts, Subscripts);

public static string Transcribe(this string number, string source, string target)
{
var stringBuilder = new StringBuilder(number);
for (var index = 0; index < Math.Min(source.Length, target.Length); index++)
stringBuilder.Replace(source[index], target[index]);
return stringBuilder.ToString();
}
}
}
``````

But Does It Parse The Test?

Obviously we need some tests for all this, and in fact the following lines have just appeared in Expressions.Tests.cs. Despite giving the visual appearance of comparing one string with another one rather like it, these tests actually embody quite the round trip. The left string is pumped into the parser, generating an expression tree as its output. This is then fed through a stage 2 amplifier, namely the AsString() extension method, where we hope to see a similarly structured string emerge, but sporting at the most binary operations, and all the other minor effects and transformations expected.
``````
public static void TestParse(string input, string output) =>
Check(output, new Parser().Parse(input).AsString());

public static void TestParseFail(string input, string error)
{
try
{
new Parser().Parse(input);
Check("Exception thrown", "no Exception thrown");
}
catch (Exception ex)
{
Check(error, ex.Message);
}
}

public static void TestParser()
{
TestParserFailure();
TestParserSuccess();
}

public static void TestParserFailure()
{
TestParseFail("x~2", "Unexpected character '~', input='x~2', index=1");
TestParseFail("x+123,456", "Unexpected character ',', input='x+123,456', index=5");
TestParseFail("x+1\$2", "Unexpected end of text, input='x+1\$2', index=3");
TestParseFail("x+1e999", "Numerical overflow '1e999', input='x+1e999', index=2");
TestParseFail("x+.", "Invalid number format '.', input='x+.', index=2");
TestParseFail("x+.E+1", "Invalid number format '.E+1', input='x+.E+1', index=2");
TestParseFail("x+", "Missing operand, input='x+', index=2");
TestParseFail("(x+(2*(x+(3)))", "Unexpected end of text, input='(x+(2*(x+(3)))', index=15");
}

public static void TestParserSuccess()
{
TestParse("0", "0");
TestParse("e", "2.71828182845905");
TestParse("π", "3.14159265358979");
TestParse("pi", "3.14159265358979");
TestParse("ϕ", "1.61803398874989");
TestParse("phi", "1.61803398874989");
TestParse("X+1", "(x+1)");
TestParse("((x+1))", "(x+1)");
TestParse("x+x*x^x/x-x", "((x+((x*(x^x))/x))-x)");
TestParse("3*x+x/5", "((3*x)+(x/5))");
TestParse("x-2-x", "((x-2)-x)");                             // Subtraction is left associative
TestParse("x^2^x", "(x^(2^x))");                             // Exponentiation is right associative
TestParse("(x-3)*(5-x)/10", "(((x-3)*(5-x))/10)");
TestParse("sin x * cos x", "(Sin(x)*Cos(x))");
TestParse("Ln(sin x - tanh(x)) - 1", "(Ln((Sin(x)-Tanh(x)))-1)");
TestParse("Abs Cos Sin Tan 1.5", "0.540839774154307");
TestParse("Abs Cos Sin Tan (x/2)", "Abs(Cos(Sin(Tan((x/2)))))");
TestParse("2*(sin x + cos x ^ 3 - tan(x^3))/3", "((2*((Sin(x)+(Cos(x)^3))-Tan((x^3))))/3)");
TestParse("2*(x+3*(x-4^x)-5)/6", "((2*((x+(3*(x-(4^x))))-5))/6)");
TestParse("1/5x", "(1/(5*x))");                              // Implied products have higher precedence
TestParse("1/2sqrt(x)", "(1/(2*Sqrt(x)))");
TestParse("2 sin 3x", "(2*Sin((3*x)))");
TestParse("2 sin 3x * 5 cos 7x", "((2*Sin((3*x)))*(5*Cos((7*x))))");
TestParse("2 3", "(2*3)");
TestParse("2(x+3)", "(2*(x+3))");
TestParse("2x^3)", "((2*x)^3)");
TestParse("2(x^3)", "(2*(x^3))");
TestParse("x⁴-4x³+6x²-4x+1", "(((((x^4)-(4*(x^3)))+(6*(x^2)))-(4*x))+1)");
TestParse("1/2√(1-x²)", "(1/(2*Sqrt((1-(x^2)))))");
TestParse("eˣ", "(2.71828182845905^x)");
TestParse("eᶜᵒˢ⁽ˣ⁾", "(2.71828182845905^Cos(x))");
}
``````

## Wednesday, 17 April 2019

### Formula Translation: The Error Function Not At All Elementary, Watson

Previously:
Differentiating f(x)^g(x)
The Differentiator
Graphing Derivatives
The Complexities Of The Simplifier
Project Differentiator is growing rapidly (!) into a customizable calculus teaching tool. Such resources are sometimes more valuable than any generally available mathematics code library, utility, or even online suite like Wolfram Alpha or Mathematica. Just ask Grant Sanderson, whose magnificent YouTube channel 3blue1brown - in addition to its inestimable educational payload - delivers and promotes beautiful mathematics as an art form, rolling along atop its very own powerful, handcrafted, dynamic, mathematical graphics engine. When you want to explain things in great detailed graphics, and especially in animations, nothing beats running your own source code. Waiting for the dust to settle on the latest UI iteration, I thought I'd mention this short function here: a three-line translation, based on some old 1992 code found in a Fortran77 book, of the error function, y=erf(x), as defined in the integral equation alongside. That FORmula TRANslation provenance remains visible in my choice of variable names in the code below: well of course n is the integer loop variable.

Incidentally, you should never use the Fortran-recommended I as your loop variable. If a single pin breaks in your dot matrix printer, the resultant printout on your music ruled paper might be mistaken for the digit 1, crashing your lander on the moon.

The interesting thing about erf(x) is that it's non-elementary, making it actually one of the uninteresting ones, globally speaking.

``````
double t = 1 / (1 + Math.Abs(x) / 2), u = 1, v = 0;
for (var n = 0; n < 10; v += erf_a[n] * u, n++, u *= t) ;
return (1 - t * Math.Exp(v - x * x)) * Math.Sign(x);
``````

Yeah, absolutely nothing. Non-elementary functions might appear curious to engineers learning calculus, but the proportion of all possible continuous functions that are elementary is zero (assuming that's possible for the ratio of two uncountable infinities). An elementary function, as originally introduced by Joseph Liouville starting in 1833, is one built from certain basic functions (constant, algebraic, exponential and logarithmic, along with their inverses if they exist), by recursive composition and application of arithmetic operators, including root extraction.

On the other hand, despite its non-elementary nature, erf(x) is in great demand, mostly because it's literally the area under the normal distribution graph in statistics. And when you hear there's no way to integrate y=exp(x²), well, a constant multiple of erf(x) is the answer to that too. These reasons, together with its eminent differentiability (it's defined as an integral, after all), are why I wanted it in the Lego™ box.

Speaking of which, the list of elementary functions available in Sid, as the project is now dubbed, has more than doubled since you last looked. And if you ever needed an inverse trigonometric or hyperbolic function, but discovered it missing from the supplied System.Math set, you'll find the entire lot implemented here. Anyway, the erf code is the bottom third of this file:
``````
namespace FormulaBuilder
{
using System;

/// <summary>
/// Several math functions are either absent from System.Math (Asec, Sech, Asech,
/// Step) or unsuitable (Sign, which has an integer output), so they are provided
/// in this class. Might as well include the whole System.Math lot here too; then
/// we never need perform two reflection operations to find any function by name.
/// </summary>
public static class Functions
{
#region Elementary Functions

public static double Abs(double x) => Math.Abs(x);
public static double Acos(double x) => Math.Acos(x);
public static double Acosh(double x) => Math.Log(x + Math.Sqrt(Math.Pow(x, 2) - 1));
public static double Acot(double x) => Math.Atan(1 / x);
public static double Acoth(double x) => Math.Log((x + 1) / (x - 1)) / 2;
public static double Acsc(double x) => Math.Asin(1 / x);
public static double Acsch(double x) => Math.Log((1 + Math.Sqrt(1 + Math.Pow(x, 2))) / x);
public static double Asec(double x) => Math.Acos(1 / x);
public static double Asech(double x) => Math.Log((1 + Math.Sqrt(1 - Math.Pow(x, 2))) / x);
public static double Asin(double x) => Math.Asin(x);
public static double Asinh(double x) => Math.Log(x + Math.Sqrt(Math.Pow(x, 2) + 1));
public static double Atan(double x) => Math.Atan(x);
public static double Atanh(double x) => Math.Log((1 + x) / (1 - x)) / 2;
public static double Ceiling(double x) => Math.Ceiling(x);
public static double Cos(double x) => Math.Cos(x);
public static double Cosh(double x) => Math.Cosh(x);
public static double Cot(double x) => 1 / Math.Tan(x);
public static double Coth(double x) => 1 / Math.Tanh(x);
public static double Csc(double x) => 1 / Math.Sin(x);
public static double Csch(double x) => 1 / Math.Sinh(x);
public static double Exp(double x) => Math.Exp(x);
public static double Floor(double x) => Math.Floor(x);
public static double Ln(double x) => Math.Log(x);
public static double Log10(double x) => Math.Log10(x);
public static double Round(double x) => Math.Round(x);
public static double Sec(double x) => 1 / Math.Cos(x);
public static double Sech(double x) => 1 / Math.Cosh(x);
public static double Sign(double x) => Math.Sign(x);
public static double Sin(double x) => Math.Sin(x);
public static double Sinh(double x) => Math.Sinh(x);
public static double Sqrt(double x) => Math.Sqrt(x);
public static double Step(double x) => x < 0 ? 0 : 1;
public static double Tan(double x) => Math.Tan(x);
public static double Tanh(double x) => Math.Tanh(x);

#endregion

#region Non-elementary Functions

private static readonly double[] erf_a =
{
-1.26551223, +1.00002368, +0.37409196, +0.09678418, -0.18628806,
+0.27886807, -1.13520398, +1.48851587, -0.82215223, +0.17087277
};

/// <summary>
/// An approximation, with a worst case error of 1.2e-7, from the book
/// "Numerical Recipes in Fortran 77: The Art of Scientific Computing"
/// (ISBN 0-521-43064-X), 1992, page 214, Cambridge University Press.
/// </summary>
/// <param name="x">The input value to the function Erf(x).</param>
/// <returns>An approximation to the value of erf(x).</returns>
public static double Erf(double x)
{
double t = 1 / (1 + Math.Abs(x) / 2), u = 1, v = 0;
for (var n = 0; n < 10; v += erf_a[n] * u, n++, u *= t) ;
return (1 - t * Math.Exp(v - x * x)) * Math.Sign(x);
}

#endregion
}
}
``````

I'll confess to considering the elision of that redundant final u *= t in the for loop, before remembering the pipeline exception would certainly waste more picoseconds than a blast through the floating point silicon. Old optimisation rabbits die hard.

The algorithm first sets t=1/(1+|x|/2). Then in the for loop, it forms a ninth degree polynomial v(t), using ten supplied coefficients erf_a[0..9]. Finally, according to the original sign of x, your answer is either 1-t*e^(v(t)-x²), or its negative.

Next time: let's get that parser up and running...

Image credits: https://en.wikipedia.org/wiki/Error_function.

## Monday, 8 April 2019

### The Complexities Of The Simplifier

The Simplifier The Better

Previously:
Differentiating f(x)^g(x)
The Differentiator
Graphing Derivatives
Simplification of algebraic expressions was never the intended destination of this series, but it's where we've landed. The rules of differentiation guarantee an answer for any input function, subject to certain reasonable conditions of continuity. They just don't promise a pretty answer; that much is up to us! And generally, it requires more work.

Reduction of algebraic formulae is a big subject. Sometimes, to be fair, that's due to subjective constraints we place on answers. Take polynomials for example. It's natural to expect every power of x to be expressed as anxⁿ, rather than say (xⁿ)*an, albeit the algebra is indifferent to such distinctions. On the other hand, nobody ever ordered xⁿ+xⁿ when they actually wanted 2xⁿ.

Our Simplify() method is a fairly arbitrary piece of ad-hoc coding, which applies a specific set of transformation rules to the expression tree. These rules are motivated by the particular output from certain popular examples of differentiation, and are all applied during a single traversal of the tree. Here we will consider an alternative, where rules are applied one by one to the entire tree, starting at the root and proceeding depth-first recursively. These steps successively transform the tree such that new invariants become true upon each pass, resulting in an overall directed process of reduction.

Let's continue using our informal rule notation, where
• s, t represent any subtrees of arbitrary size and complexity;
• +, -, *, /, ^ are binary operators;
• u+, u- are the unary plus (identity) and minus (negate) operators respectively;
• c, d are numeric constants;
• 0, 1, -1 are the particular numeric constants 0, ±1.
Then using the right arrow '→' to indicate the application of a transformation rule to its left operand, resulting in the new tree in its right operand, we have the following examples. First, the unary u+ operator represents the identity transformation, and can just be dropped from wherever it occurs:
```u+(s) → s                            (Rule 1)
```
The unary u- operator is equivalent to the binary '-' with (0) on its left hand side. This is not strictly a reductive transformation, but it might be useful if we want to rid the tree of all unary operators, so that some other rule requiring this precondition can then be applied (note: the converse rule, which is reductive, was actually used in the original code):
```u-(s) → 0 - s                        (Rule 2)
```
One And One Is One

Any binary operator with constants on both sides (constant siblings) can be replaced by a single, computed constant value. Let @ be any one of the five binary operators. Then we have this rule template:
```c @ d → c@d                          (Rule 3)
```
where the notation c@d written without intervening spaces indicates a single constant node, comprising an immediate numerical calculation. Obviously checks against division by zero, and other indeterminate forms such as 0⁰, are assumed to raise the appropriate exceptions (DivideByZeroException, InvalidOperationException) when poked.

Similarly, certain constant niblings (nieces or nephews) can be combined, when their operators "match" - i.e., they are either both additive, or both multiplicative. For ease of legibility and comprehension in what follows, I'm going to let '+' represent the commutative operator in such a pair (which will actually be either '+' or '*'), and let '-' represent its noncommutative inverse (actually '-' or '/'). Then we have these four templates:
```(s + c) + d → s + c+d                (Rule 4a)
(s + c) - d → s + c-d                (Rule 4b)
(s - c) + d → s - c-d                (Rule 4c)
(s - c) - d → s - c+d                (Rule 4d)
```
Notice the pattern of operators in these templates. The first operator, following the s, is unchanged; while the second is set to either the commutative '+' if the original two operators were the same, or the noncommutative '-' if they were different. This logic was handled by local variables same & opps in the original SimplifyBinary() method.

Constant Craving

Everything up to this point, with the singular exception of the unary u- noted above, was included in the earlier single traversal code. However these four latest example templates assume all the constants are on the RHS of their parent binaries. That suggests another twelve such patterns exist, where either one or both constants are actually on the LHS:
```(c + s) + d → s + c+d                (Rule 5a)
(c + s) - d → s + c-d                (Rule 5b)
(c - s) + d → c+d - s                (Rule 5c)
(c - s) - d → c-d - s                (Rule 5d)

c + (s + d) → s + c+d                (Rule 6a)
c + (s - d) → s + c-d                (Rule 6b)
c - (s + d) → c+d - s                (Rule 6c)
c - (s - d) → c-d - s                (Rule 6d)

c + (d + s) → s + c+d                (Rule 7a)
c + (d - s) → c+d - s                (Rule 7b)
c - (d + s) → c-d - s                (Rule 7c)
c - (d - s) → s + c-d                (Rule 7d)
```
Almost half of these can be handled by preprocessing the tree to ensure that whenever possible, i.e. when a binary operator is commutative, its constant term appears on the RHS, using this supplementary rule:
```c + s → s + c                        (Rule 8)
```
A single application of Rule 8 carries Rules 5a and 6a into 4a, 5b into 4b, and 6b into 4c, while two successive applications carry 7a into 4a. That's five out of these new twelve cases dealt with. We are frustrated in claiming a full 50% by Rule 7d, which requires recognising that s is governed by two '-' signs (or '/' signs in the multiplicative version).

Constant Cousins

Another important transformation which could have been included in the original sample code, but was omitted for purposes of clarity, is the combination of cousin constants. This is where a binary operation has two binary children, each of which has a constant operand. Since we are not handling distributivity here, the three binaries involved must be either all additive, or all multiplicative. This yields another 8 templates which can be expressed, reusing the previous operator definitions, like this:
```(s + c) + (t + d) → (s + t) + c+d    (Rule 9a)
(s + c) + (t - d) → (s + t) + c-d    (Rule 9b)
(s + c) - (t + d) → (s - t) + c-d    (Rule 9c)
(s + c) - (t - d) → (s - t) + c+d    (Rule 9d)
(s - c) + (t + d) → (s + t) - c-d    (Rule 9e)
(s - c) + (t - d) → (s + t) - c+d    (Rule 9f)
(s - c) - (t + d) → (s - t) - c+d    (Rule 9g)
(s - c) - (t - d) → (s - t) - c-d    (Rule 9h)
```
Again there's a pattern to the operators, before and after. This time the first and second operators swap places, while the third is set either to the commutative '+' if the original set of three operators contained an even number (0 or 2) of '-'s, or to the noncommutative '-' if it contained an odd number (1 or 3). And just as before, the possibility of constants appearing in the LHS leads to another proliferation of additional cases (initially 24), around half of which are handled by our commutativity code, with the rest requiring further special handling.

Finally, Semantix Rears Its Monstrous Head

Merged constants aside, all of the foregoing transformations have been syntactical, in that they have not relied upon any particular constant values, function calls, powers of x, etc. Now we've combined as many constants into single values as we feel inclined to do, we can take advantage of a few simple semantic rules, such as those pertaining to additive and multiplicative identities and inverses. Assuming we're on the final stretch, we presumably no longer need maintain the unary-free invariant property of our expression tree, and can also reintroduce negative versions of the multiplicative rules. So, we may have 0 or ±1 on the RHS:
```s + 0 → s                            (Rule 10a)
s - 0 → s                            (Rule 10b)
s * 0 → 0                            (Rule 10c)
s * 1 → s                            (Rule 10d)
s * -1 → -s                          (Rule 10e)
s / 1 → s                            (Rule 10f)
s / -1 → -s                          (Rule 10g)
s ^ 0 → 1                            (Rule 10h)
s ^ 1 → s                            (Rule 10i)
```
Rules 10e & 10g are the negatives mentioned above. Finally, we may have 0 or 1 on the LHS, in these few cases which have not already been handled by commutativity:
```0 / s → 0                            (Rule 11a)
0 ^ s → 0                            (Rule 11b)
1 ^ s → 1                            (Rule 11c)
```
Next time: work continues on the above optimisation system, and a new UI is simultaneously being developed to allow user entry of formulae as plain text. While all that rages on, I'll report on several interesting design snippets just to keep the post count ticking over. The first of these will be the addition of the non-elementary error function Erf(x) to the library.

## Thursday, 4 April 2019

### Graphing Derivatives

A Series Of Points

Previously:
Differentiating f(x)^g(x)
The Differentiator
Note: The latest code can be found at https://github.com/dogbiscuituk/Sid/.
I've been inundated by a request from one reader, for a concrete example illustrating the plotting of graphs and their derivatives using my Differentiator class. This here quick and filthy Winforms application is my terse response. It uses no modern graphics library, just oldschool GDI+. Starting with a new Winforms app containing the FormulaBuilder class library from last time, add a new FormulaGrapher library to contain the two classes I'm about to describe.

Visual graphing controls normally provide a capability to plot multiple traces on the same canvas, so the first thing we need is a class to represent just one of these traces. Again following tradition, we'll call this the Series class. Its main responsibility will be to store the Expression used to calculate the points on its curve. This is passed into the constructor as the Formula parameter, and immediately lambda-compiled into the callable function Func.

For demo purposes, we'll also have this class store the resolution StepCount (the number of tiny line segments in the output, normally one less than the number of points computed); the PenColour used to draw the trace; and a cache containing the current plot limits, together with all of the computed point coordinates. After all, you might have any number of nested trigonometric functions to compute per point, and as long as the logical plot limits don't change, there's no need to recalculate these in response to e.g. resizing the output canvas in terms of physical pixels, or switching between screen and printer output.
``````
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq.Expressions;
using FormulaBuilder;

namespace FormulaGrapher
{
public class Series
{
public Series(Expression formula, int stepCount)
{
Func = formula.AsFunction();
StepCount = stepCount;
}

public Color PenColour { get; set; }

public void Draw(Graphics g, RectangleF limits, float penWidth)
{
if (Limits != limits)
{
Limits = limits;
ComputePoints();
}
using (var pen = new Pen(PenColour, penWidth))
PointLists.ForEach(p => g.DrawLines(pen, p.ToArray()));
}

private Func<double, double> Func;
private int StepCount;
private RectangleF Limits;
private List<List<PointF>> PointLists = new List<List<PointF>>();

private void ComputePoints()
{
PointLists.Clear();
List<PointF> points = null;
float
x1 = Limits.Left, y1 = Limits.Top, y2 = Limits.Bottom,
w = Limits.Width, h = 8 * Limits.Height;
var skip = true;
for (var step = 0; step <= StepCount; step++)
{
float x = x1 + step * w / StepCount, y = (float)Func(x);
if (float.IsInfinity(y) || float.IsNaN(y) || y < y1 - h || y > y2 + h)
skip = true;
else
{
if (skip)
{
skip = false;
points = new List<PointF>();
}
}
}
// Every segment of the trace must include at least 2 points.
PointLists.RemoveAll(p => p.Count < 2);
}
}
}
``````

So our first class essentially consists of just this Draw method, which first checks whether the supplied logical limits are the same as the previous call, if any. When it decides recomputation is required, it iterates over x values between the supplied limits, splitting the interval into StepCount equal segments, and computing and storing the corresponding y values.

One interesting aspect of the structure used to store these points is its type: a List of Lists of PointFs. This particular contortion is needed because some graphs have discontinuities, or worse still, whole interior intervals on which they are entirely undefined. Hence the need to break the list into segments, whenever a y value is found to be infinite, or not a number, or just too damned big for our paltry coordinate system.

A Group Of Series

Next up, we have our supervisor class, which I've called Graph. This will take full control of its multiple individual Series, so we will only need to interact directly with the Graph class itself. Let's start by examining its constructor parameters.
• limits is a RectangleF structure specifying the logical window for the plots. Suppose we want the x domain to go from -9 to +9, and y to range from -7 to +7. The two minimum (negative) values can be supplied as-is, but the GDI+ convention requires width and height values to specify the window size, rather than the corresponding maximum (positive) values of x and y. So in our example, we would pass the window bounds as new RectangleF(-9, -7, 18, 14).
• stepCount has already been described; it is simply passed through to the Series objects for use when drawing.
As with the previous Series class, the interesting details are in the Draw method and its minions:
``````
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Linq.Expressions;
using FormulaBuilder;

namespace FormulaGrapher
{
public class Graph
{
public Graph(RectangleF limits, int stepCount)
{
Limits = limits;
StepCount = stepCount;
}

{
Debug.WriteLine(formula.AsString());
var series = new Series(formula, StepCount);
return series;
}

public void Clear()
{
Series.Clear();
}

public void Draw(Graphics g, Rectangle r)
{
if (r.Width == 0 || r.Height == 0)
return; // Nothing to draw!
g.SmoothingMode = SmoothingMode.AntiAlias;
g.Transform = GetMatrix(r);
g.FillRectangle(Brushes.LightYellow, Limits);
var penWidth = (Limits.Width / r.Width + Limits.Height / r.Height);
DrawGrid(g, penWidth);
Series.ForEach(s => s.Draw(g, Limits, penWidth));
}

public PointF ScreenToGraph(Point p, Rectangle r)
{
var points = new[] { new PointF(p.X, p.Y) };
var matrix = GetMatrix(r);
matrix.Invert();
matrix.TransformPoints(points);
return points;
}

private RectangleF Limits;
private int StepCount;
private List<Series> Series = new List<Series>();

private void DrawGrid(Graphics g, float penWidth)
{
float x1 = Limits.Left, y1 = Limits.Bottom, x2 = Limits.Right, y2 = Limits.Top;
using (Pen gridPen = new Pen(Color.LightGray, penWidth) { DashStyle = DashStyle.Dot },
axisPen = new Pen(Color.DarkGray, penWidth))
using (var font = new Font("Arial", 5 * penWidth))
using (var format = new StringFormat(StringFormat.GenericTypographic) { Alignment = StringAlignment.Far })
{
var brush = Brushes.DarkGray;
for (int phase = 0; phase < 4; phase++)
{
var vertical = (phase & 1) != 0;
if (phase < 2)
{
var size = Math.Log10(Math.Abs(y2 - y1));
var step = Math.Floor(size);
size -= step;
step = (size < 0.3 ? 2 : size < 0.7 ? 5 : 10) * Math.Pow(10, step - 1);
for (var y = step; y <= Math.Max(Math.Abs(y1), Math.Abs(y2)); y += step)
{
DrawGridLine(g, gridPen, font, brush, x1, x2, (float)y, format, vertical);
DrawGridLine(g, gridPen, font, brush, x1, x2, -(float)y, format, vertical);
}
}
else
DrawGridLine(g, axisPen, font, brush, x1, x2, 0, format, vertical);
var t = x1; x1 = y1; y1 = t;
t = x2; x2 = y2; y2 = t;
}
}
}

private void DrawGridLine(Graphics g, Pen pen, Font font, Brush brush,
float x1, float x2, float y, StringFormat format, bool vertical)
{
float x = 0, y1 = y, y2 = y, z = -y;
if (vertical)
{
var t = x; x = y; y = t;
t = x1; x1 = y1; y1 = t;
t = x2; x2 = y2; y2 = t;
z = -z;
}
g.DrawLine(pen, x1, y1, x2, y2);
g.ScaleTransform(1, -1);
g.DrawString(z.ToString(), font, brush, x - pen.Width, y + pen.Width, format);
g.ScaleTransform(1, -1);
}

private Matrix GetMatrix(Rectangle r)
{
return new Matrix(Limits,
new[] { new PointF(r.Left, r.Bottom), new PointF(r.Right, r.Bottom), r.Location });
}
}
}
``````

The GDI+ Transform property of a Graphics context is a Matrix, which handles all conversions between our logical units, e.g. the 'one' that's the sine of half a pie (π/2), and physical units, e.g. the 'one' that's the size of a pixel on the display device. Here we basically set the Transform to equal the ratio between these coordinate systems, and forget it...

...until, that is, we come to specify a pen width for our plots. Here we can't just say 1, hoping for a single pixel width line. What we would get instead is a line as thick as the height of the sine graph we're trying to plot, making it look as if drawn with a giant crayon. Go ahead, try it, it's comical! The solution is just to scale the desired pixel pen width by the inverse of the coordinate system ratio, so undoing the effect of the Transform on our pen size.

DrawGrid() prepares the canvas, giving it the appearance of a musty old sheet of graph paper with x-y axes, and grid lines with numerical labels. Regardless of whether you scale your limits up or down by a factor of 100, each axis is divided into between 4 and 10 sensibly sized strips, by the magic of logarithms (initiates will quickly spot the usual four lines of code doing this essential work). The strips are then labelled in ones, twos or fives, as appropriate. A total of four phases are used to draw the x grid, y grid, x axis, and y axis in turn, ensuring that the grid is always behind the axes (and the axes behind the traces, which will be drawn next).

Note the inversions of the y axis near the end of DrawGridLine(). Since our graph's y values increase from bottom up, while device y values increase from top down, there is necessarily a flip in the transform used to map one to the other. If ignored, this flip would cause all text to be printed upside down! One of several simple solutions is to invert the y axis before printing text, and then again after, to restore the original coordinate mapping.

GetMatrix() has been factored out from the original Draw() method, to let external clients access logical coordinates via ScreenToGraph(). The main form below will use this feature to show on a ToolTip the updated logical position, i.e. graph world coordinates, of the moving mouse.
A Form Of Pencils

Finally it's time to add a main form to our Winforms app. Give it a context menu with an item for each formula you want to draw, following the pattern shown here. The code in the DrawGraphs() method will draw your selected function in black, then on the same canvas it will draw the first, second and third derivatives in red, green and blue respectively.
``````
using System.Drawing;
using System.Linq.Expressions;
using System.Windows.Forms;
using FormulaBuilder;
using FormulaGrapher;

namespace Sid
{
public partial class MainForm : Form
{
public MainForm()
{
InitializeComponent();
}

private void MainForm_Paint(object sender, PaintEventArgs e)
{
DrawGraph(e.Graphics);
}

private void MainForm_Resize(object sender, System.EventArgs e)
{
Invalidate();
}

private void MainForm_MouseMove(object sender, MouseEventArgs e)
{
var p = Graph.ScreenToGraph(e.Location, ClientRectangle);
ToolTip.SetToolTip(this, string.Format("({0}, {1})", p.X, p.Y));
}

private void MenuSinX_Click(object sender, System.EventArgs e)
{
DrawGraphs(sender, x.Sin());
}

private void MenuSinhXover2_Click(object sender, System.EventArgs e)
{
DrawGraphs(sender, x.Over(2).Sinh());
}

private void Menu1overX_Click(object sender, System.EventArgs e)
{
DrawGraphs(sender, x.Power(-1));
}

private void MenuLnXsquaredMinus1_Click(object sender, System.EventArgs e)
{
DrawGraphs(sender, x.Squared().Minus(1).Log());
}

private Graph Graph = new Graph(new RectangleF(-9, -7, 18, 14), 16000);
private Expression x = Differentiator.x;

private void DrawGraph(Graphics g)
{
Graph.Draw(g, ClientRectangle);
}

private void DrawGraphs(object Sender, Expression y)
{
Graph.Clear();
y = y.Differentiate();
y = y.Differentiate();
y = y.Differentiate();