**Modus Operandi**

*Previously:*

The Differentiator's fluent expression-building syntax is great when you have source code access, but it's easy to build a standard, recursive-descent parser for everyone else to enjoy. As you'll see below, the best thing about that is the sheer pleasure of creating your own precedence and associativity rules.Differentiating f(x)^g(x)

The Differentiator

Graphing Derivatives

The Complexities Of The Simplifier

Formula Translation: The Error Function

Our little language of arithmetic expressions in a single real variable

*x*has quite the simple grammar, which in mostly-EBNF, looks like this:

```
```**Parameter** = "x" ;
**Number** = ? \d*\.?\d*([eE][+-]?\d+)? ? ;
**NamedConstant** = "e" | "π" | "pi" | "ϕ" | "phi" ;
**UnaryOperator** = "+" | "-" | "!" | "~" | "√" ;
**BinaryOperator** = "||" | "&&" | "|" | "&" | "=" | "==" | "≠" | "<>" | "!=" | "<" | ">" | "≮ " | "≯ " | "<=" | ">="
| "+" | "-" | "*" | "/" | "^" ;
**Function** = "Abs" | "Acos" | "Acosh" | "Acot" | "Acoth" | "Acsc" | "Acsch" | "Asec" | "Asech" | "Asin"
| "Asinh" | "Atan" | "Atanh" | "Ceiling" | "Cos" | "Cosh" | "Cot" | "Coth" | "Csc" | "Csch"
| "Erf" | "Exp" | "Floor" | "Ln" | "Log10" | "Round" | "Sec" | "Sech" | "Sign" | "Sin"
| "Sinh" | "Sqrt" | "Step" | "Tan" | "Tanh" ;
**Operand** = **Parameter**
| **Number**
| **NamedConstant**
| **Operand**, "'" *(* Form the derivative *)*
| "(", **Expression**, ")"
| **Function**, **Operand**
| **UnaryOperator**, **Operand** ;
**Expression** = **Operand**, { **BinaryOperator**, **Operand** }
| **Expression**, "?", **Expression**, ":", **Expression** ;

The mutual recursion visible between the above

*Expression*and

*Operand*nonterminals is the hallmark of a classic infix-notation, parentheses-laden syntax.

**Notes on the Grammar**

- The Parameter 'x' is case insensitive, but will be converted to lower case during the parsing process.
- The
*Number*element is defined above using a regular expression pattern. This pattern overreaches, since for simplicity's sake, none of its character groups are mandatory. So the empty string is regarded as a valid number; as is, a single dot; or for example, the sequence '.E-'. But that's OK, as long as the tokeniser is just greedy enough, but doesn't steal characters from subsequent tokens. The actual parsing of constants will be done later by the*double.Parse()*method, so if the tokeniser passes through any such malformed number, the error will be detected. - A
*Number*starts with either a digit or a decimal point; any preceding '+' or '-' signs will be treated as unary operators (although the parser will later collapse all such operators into a single*Constant*node). - The
*NamedConstant*class does allow you to type the names of (some of) the few supported values as Greek letters (π, ϕ)... but you're wasting your time! This code collapses (combines) constants eagerly, and everything soon ends up as a double-precision, floating point number. NamedConstant elements are case insensitive. - The
*UnaryOperator*elements '!' (nothing to do with factorials) and '~' both represent a*logical not*operation. The number zero is regarded as*false*, anything else as*true*. So if*x=0*, then*!x*(or equivalently*~x*) will return*1*; otherwise*0*. - UnaryOperator '√' performs the same function as 'Sqrt'.
- The
*BinaryOperator*elements '&' and '&&' both perform a*logical And*operation, likewise '|' and '||' a*logical Or*. They differ only in their levels of precedence: as in C#, the order of evaluation is '&', '|', '&&', '||' (obviously you can just stick to one preferred style, as the same effect can be achieved using parentheses). Similarly, '=' and '==' perform the same equality test; '≠', '<>' and '!=' the same inequality test; and so on for the comparison operators. - The available
*Function*terminal elements are just the names of the 35 functions made available in the previous article. *Function*names are case insensitive, but will be converted to Title Case (initial capital) during the parsing process. Notice that the*Function*syntax doesn't require parentheses;*abs x*works just as well as*Abs(x)*.- The apostrophe "'" is the only postfix operator in the grammar. It has the effect of differentiating the preceding
*Operand*, so for example,*(Sin(x))'*evaluates to*Cos(x)*.

**Turning Japanese**

It's always at this point, turning from the syntax diagram to the semantics of coding, that I inevitably get seduced by this cool feature, first encountered in my first couple of pocket computers: the Sharp PC-1211 and PC-1500. The cool feature is

*implied products*. TL;DR: every memory category was in such short supply back then

*(see: millennium bug)*, even BASIC variables were meted out individually, and their names were

*A..Z*. The unexpected benefit of this rationing was that products of variables (variables which individually could only have single-character names like

*A*or

*B*) could now be expressed unambiguously in the form

*AB*, saving a whole byte by omitting the multiplication symbol.

At first glimpse this implied multiplication seems no big deal, but wait. Does it necessarily have the same associativity and precedence as normal multiplication? If so, then

But this seems wrong! The expression looks so much more like a quantityA/BCwould be parsed as(A/B)*C = (A*C)/B.

*A*being divided by a quantity

*BC*, in other words,

Now theA/BC=A/(B*C).

*C*has landed below the line; before, it was above. Which is correct?

Utility is monarch, and the solution turns out to be the introduction of a new kind of

*implied multiplication*with a higher precedence than the usual one. By analogy: the associativity of '^' was once settled to be

*right*, unlike its additive and multiplicative colleagues which all tended

*left*.

So, while subtraction goes like this:because the real action's in the exponent. In the case of the implicit multiplier, the new operator was given a higher precedence than its peers, so for exampleA-B-C = (A-B)-C = A-(B+C),

by contrast exponentiation does this:A^B^C = A^(B^C)

rather than the much less useful(A^B)^C,

In this way multiplication, if it just remain hidden, can leapfrog other more powerful operators, perhaps even unary operators, perhaps even function calls:A^BC = A^(B*C), rather than(A^B)*C.

sin AB = sin (A * B), compared tosin A * B = (sin A) * B.

**White Feather**

So why did I hesitate to implement such an operator in

*Parser*?

A recurring subject in the C# language designers' blogs and comment responses is

*cost*. Nothing comes for free! There's always at least a test budget to consider, and almost always, myriad further hidden design and development costs. Implied products are almost

*too*attractive a feature. Sure,

*(x+2)(x-2)*and its ilk are compelling, but to get the full story, you must go back to the language's grammar definition and look at all newly emerging cases.

What about

for example. The familiar2 sin x cos x

*sine of a double angle*formula, obviously and unambiguously intended to be read as

But should the precedence of "implicit multiply" exceed even unary and function calls, the interpretation becomes2*sin(x)*cos(x).

which, simply, ouch.2*sin(x*cos(x)),

Then again, why all this detail, if I never intended to implement such an operator?

*Because I knew I would*. I knew that about myself, and so wanted all this background material to be available for review, when I finally revisited the code to smoosh it up a bit. Also, the very inevitability of that revisit is why I made this code as

*clean*and

*readily understandable*as I could. Part of this process is of course eliminating all but the most vital of comments from the source; hence their home in this article.

That

*Precedence*enumeration was a proper godsend when smooshing time arrived! Anyway, here comes the parser code:

```
namespace FormulaBuilder
{
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text.RegularExpressions;
public class Parser
{
private enum Precedence
{
RightParenthesis,
Additive,
Multiplicative,
Exponential,
Functional,
ImpliedProduct,
SuperscriptPower
}
const char
SquareRoot = '√';
const string
ImpliedProduct = "i*",
SuperscriptPower = "s^",
UnaryMinus = "u-",
UnaryPlus = "u+";
private string Formula;
private int Index;
private Stack<Expression> Operands;
private Stack<string> Operators;
public Expression Parse(string formula)
{
Formula = formula;
Index = 0;
Operands = new Stack<Expression>();
Operators = new Stack<string>(new[] { "(" });
ParseExpression();
if (Operators.Any())
throw new FormatException(
$"Unexpected end of expression, input='{Formula}'");
return Operands.Peek();
}
public bool TryParse(string formula, out object result)
{
try
{
result = Parse(formula);
return true;
}
catch (Exception e)
{
result = e.Message;
return false;
}
}
private static double GetNamedConstantValue(string constant)
{
switch (constant.ToLower())
{
case "e":
return Math.E;
case "π":
case "pi":
return Math.PI;
case "ϕ":
case "phi":
return (1 + Math.Sqrt(5)) / 2;
}
return 0;
}
private static ExpressionType GetExpressionType(string op)
{
switch (op)
{
case "+":
return ExpressionType.Add;
case "-":
return ExpressionType.Subtract;
case "*":
case "i*":
return ExpressionType.Multiply;
case "/":
return ExpressionType.Divide;
case "^":
case "s^":
return ExpressionType.Power;
case UnaryPlus:
return ExpressionType.UnaryPlus;
case UnaryMinus:
return ExpressionType.Negate;
}
throw new FormatException();
}
private static Precedence GetPrecedence(string op)
{
switch (op)
{
case ")":
return Precedence.RightParenthesis;
case "+":
case "-":
return Precedence.Additive;
case "*":
case "/":
return Precedence.Multiplicative;
case "^":
return Precedence.Exponential;
case ImpliedProduct:
return Precedence.ImpliedProduct;
case SuperscriptPower:
return Precedence.SuperscriptPower;
}
return Precedence.Functional;
}
private Expression MakeBinary(string op, Expression lhs, Expression rhs) =>
Expression.MakeBinary(GetExpressionType(op), lhs, rhs);
private Expression MakeFunction(string f, Expression operand)
{
f = $"{char.ToUpper(f[0])}{f.ToLower().Substring(1)}";
var result = Expressions.Function(f, operand);
if (operand is ConstantExpression c)
return result.AsDouble((double)c.Value).Constant();
return result;
}
private Expression MakeUnary(string op, Expression operand)
{
if (operand is ConstantExpression c)
switch (op)
{
case UnaryPlus: return operand;
case UnaryMinus: return (-(double)c.Value).Constant();
}
return Expression.MakeUnary(GetExpressionType(op), operand, null);
}
private string MatchFunction() => MatchRegex(@"^[\p{Lu}\p{Ll}\d]+").ToLower();
private string MatchNumber() => MatchRegex(@"^\d*\.?\d*([eE][+-]?\d+)?");
private string MatchRegex(string pattern)
{
var match = Regex.Match(Formula.Substring(Index), pattern);
return Formula.Substring(Index + match.Index, match.Length);
}
private string MatchSubscript() => MatchRegex($"^[{StringUtilities.Subscripts}]+");
private string MatchSuperscript() => MatchRegex($"^[{StringUtilities.Superscripts}]+");
private char NextChar()
{
var count = Formula.Length;
while (Index < count && Formula[Index] == ' ')
Index++;
return Index < count ? Formula[Index] : Index == count ? ')' : '$';
}
private string NextToken()
{
var nextChar = NextChar();
switch (nextChar)
{
case '+':
case '-':
case '*':
case '/':
case '^':
case '(':
case ')':
case SquareRoot:
return nextChar.ToString();
case char c when char.IsDigit(c):
case '.':
return MatchNumber();
case char c when c.IsSuperscript():
return MatchSuperscript();
case char c when char.IsLetter(c):
return MatchFunction();
}
throw new FormatException(
$"Unexpected character '{nextChar}', input='{Formula}', index={Index}");
}
private void ParseExpression()
{
do
{
ParseOperand();
var op = NextChar();
switch (op)
{
case '+':
case '-':
case '*':
case '/':
case '^':
case ')':
ParseOperator(op.ToString());
if (op == ')')
return;
break;
case '$' when Index == Formula.Length + 2: // End of input (normal)
return;
case '$' when Index < Formula.Length + 2: // End of input (unexpected)
throw new FormatException(
$"Unexpected end of text, input='{Formula}', index={Index}");
case char c when c.IsSuperscript():
ParseOperator(SuperscriptPower); //
break;
default:
if (Operands.Peek() is ConstantExpression)
{
ParseOperator(ImpliedProduct); // Implied multiplication
break;
}
throw new FormatException(
$"Unexpected character '{op}', input='{Formula}', index={Index}");
}
}
while (true);
}
private void ParseFunction(string function)
{
Operators.Push(function);
ReadPast(function);
}
private bool ParseNamedConstant(string constant)
{
var value = GetNamedConstantValue(constant);
if (value == 0)
return false;
Operands.Push(value.Constant());
ReadPast(constant);
return true;
}
private void ParseNumber(string number)
{
try
{
Operands.Push(double.Parse(number).Constant());
}
catch (FormatException)
{
throw new FormatException(
$"Invalid number format '{number}', input='{Formula}', index={Index}");
}
catch (OverflowException)
{
throw new FormatException(
$"Numerical overflow '{number}', input='{Formula}', index={Index}");
}
ReadPast(number);
}
private void ParseOperand()
{
var token = NextToken();
switch (char.ToLower(token[0]))
{
case 'x' when token.Length == 1:
ParseParameter(token);
break;
case char c when char.IsDigit(c):
case '.':
ParseNumber(token);
break;
case char c when c.IsSuperscript():
ParseSuperscript(token);
break;
case '(':
Operators.Push(token);
ReadPast(token);
ParseExpression();
break;
case '+':
case '-':
ParseUnary(token);
ParseOperand();
break;
case char c when char.IsLetter(c):
if (!ParseNamedConstant(token))
{
ParseFunction(token);
ParseOperand();
}
break;
case SquareRoot:
ParseSquareRoot();
ParseOperand();
break;
default:
throw new FormatException(
$"Missing operand, input='{Formula}', index={Index}");
}
}
private void ParseOperator(string op)
{
do
{
var ours = GetPrecedence(op);
var pending = Operators.Peek();
if (pending == "(")
break;
var theirs = GetPrecedence(pending);
// Operator '^' is right associative: a^b^c = a^(b^c).
if (theirs > ours || theirs == ours && op != "^")
{
Operators.Pop();
var operand = Operands.Pop();
if (theirs == Precedence.Functional)
switch (pending)
{
case UnaryPlus:
break;
case UnaryMinus:
operand = MakeUnary(pending, operand);
break;
default:
try
{
operand = MakeFunction(pending, operand);
}
catch (ArgumentNullException)
{
throw new FormatException(
$"Unrecognised function '{pending}', input='{Formula}'");
};
break;
}
else
{
if (pending == ImpliedProduct)
pending = "*";
operand = MakeBinary(pending, Operands.Pop(), operand);
}
Operands.Push(operand);
}
else
break;
}
while (true);
if (op == ")")
Operators.Pop();
else
Operators.Push(op);
if (op != ImpliedProduct && op != SuperscriptPower)
ReadPast(op);
}
private void ParseParameter(string token)
{
Operands.Push(Expressions.x);
ReadPast(token);
}
private void ParseSquareRoot()
{
Operators.Push("Sqrt");
ReadPast(SquareRoot.ToString());
}
private void ParseSuperscript(string superscript)
{
Operands.Push(new Parser().Parse(superscript.SuperscriptToNormal()));
ReadPast(superscript);
}
private void ParseUnary(string unary)
{
Operators.Push($"u{unary}");
ReadPast(unary);
}
private void ReadPast(string token) => Index += token.Length;
}
}
```

**Going Wild!**

It's a fair cop. If you've read through the source code above, you'll have spotted vestiges of not only implied multiplication, but also (implied) superscript exponentiation. Like a lot of subsequent developments, they're not visible in the grammar spec at the head of this article. But to start with, you can omit the multiplication sign after any constant number, and before the next operand, which may be

*"x"*, or a function, or "(" introducing a nested expression, or even another number (separated from the first by at least one space).

And what of the over-eager precedence problem? This new

*implicit multiplication*operator needs its own, new precedence level,

*Implied*, above

*Unary*, so stuff like

*sin 2x*works properly. But that's also higher than

*Exponential*precedence, so

*2x^3*becomes

*(2*x)^3*, whereas in the arena of polynomials we'd rather see

*2*(x^3)*. This can be fixed, luckily, quickly, and quirkily: introduce Unicode superscripts, and give them their own

*implicit exponentiation*operator! This has the highest precedence of all, and causes

*2x³*to get dressed as the wanted

*2*(x^3)*.

Just how far can we go with superscripts? Unicode characters include numbers and common mathematical symbols ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾, a full superscript Latin lowercase alphabet ᵃᵇᶜᵈᵉᶠᵍʰⁱʲᵏˡᵐⁿᵒᵖʳˢᵗᵘᵛʷˣʸᶻ (well, full except for

*'q'*), a limited uppercase Latin alphabet ᴬᴮᴰᴱᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴿᵀᵁⱽᵂ, and some Greek letters ᵅᵝᵞᵟᵋᶿᶥᶲᵠᵡ. These limitations already suggest we should use some other type of markup language - remember, this series started with a look at LaTeX - and the fact that even these available glyphs come from different ranges, so depending on your typeface can vary in size and position, surely cements that opinion. But until we make that switch, we can enjoy the novelty of chucking expressions like

*x⁴-4x³+6x²-4x+1*, or

*eᶜᵒˢ⁽ˣ⁾*, into our mathematical stockpot.

You don't have superscripts on your keyboard? Sorry, I'm just a code monkey. That's definitely a hardware problem.

```
namespace FormulaBuilder
{
using System;
using System.Text;
public static class StringUtilities
{
public const string
Subscripts = "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₕᵢⱼₖₗₘₙₒₚᵣₛₜᵤᵥₓᵦᵧᵨᵩᵪ",
Transcripts = "0123456789+-=()aehijklmnoprstuvxβγρψχbcdfgwyzABDEGHIJKLMNOPRTUVWαδεθιφ",
Superscripts = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ᵃᵉʰⁱʲᵏˡᵐⁿᵒᵖʳˢᵗᵘᵛˣᵝᵞρᵠᵡᵇᶜᵈᶠᵍʷʸᶻᴬᴮᴰᴱᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴿᵀᵁⱽᵂᵅᵟᵋᶿᶥᶲ";
public static bool IsSubscript(this char c) => Subscripts.IndexOf(c) >= 0;
public static bool IsSuperscript(this char c) => Superscripts.IndexOf(c) >= 0;
public static string NormalToSubscript(this string number) => Transcribe(number, Transcripts, Subscripts);
public static string NormalToSuperscript(this string number) => Transcribe(number, Transcripts, Superscripts);
public static string SubscriptToNormal(this string number) => Transcribe(number, Subscripts, Transcripts);
public static string SubscriptToSuperscript(this string number) => Transcribe(number, Subscripts, Superscripts);
public static string SuperscriptToNormal(this string number) => Transcribe(number, Superscripts, Transcripts);
public static string SuperscriptToSubscript(this string number) => Transcribe(number, Superscripts, Subscripts);
public static string Transcribe(this string number, string source, string target)
{
var stringBuilder = new StringBuilder(number);
for (var index = 0; index < Math.Min(source.Length, target.Length); index++)
stringBuilder.Replace(source[index], target[index]);
return stringBuilder.ToString();
}
}
}
```

**But Does It Parse The Test?**

Obviously we need some tests for all this, and in fact the following lines have just appeared in

*Expressions.Tests.cs*. Despite giving the visual appearance of comparing one string with another one rather like it, these tests actually embody quite the round trip. The left string is pumped into the parser, generating an expression tree as its output. This is then fed through a stage 2 amplifier, namely the

*AsString()*extension method, where we hope to see a similarly structured string emerge, but sporting at the most

*binary*operations, and all the other minor effects and transformations expected.

```
public static void TestParse(string input, string output) =>
Check(output, new Parser().Parse(input).AsString());
public static void TestParseFail(string input, string error)
{
try
{
new Parser().Parse(input);
Check("Exception thrown", "no Exception thrown");
}
catch (Exception ex)
{
Check(error, ex.Message);
}
}
public static void TestParser()
{
TestParserFailure();
TestParserSuccess();
}
public static void TestParserFailure()
{
TestParseFail("x~2", "Unexpected character '~', input='x~2', index=1");
TestParseFail("x+123,456", "Unexpected character ',', input='x+123,456', index=5");
TestParseFail("x+1$2", "Unexpected end of text, input='x+1$2', index=3");
TestParseFail("x+1e999", "Numerical overflow '1e999', input='x+1e999', index=2");
TestParseFail("x+.", "Invalid number format '.', input='x+.', index=2");
TestParseFail("x+.E+1", "Invalid number format '.E+1', input='x+.E+1', index=2");
TestParseFail("x+", "Missing operand, input='x+', index=2");
TestParseFail("(x+(2*(x+(3)))", "Unexpected end of text, input='(x+(2*(x+(3)))', index=15");
}
public static void TestParserSuccess()
{
TestParse("0", "0");
TestParse("e", "2.71828182845905");
TestParse("π", "3.14159265358979");
TestParse("pi", "3.14159265358979");
TestParse("ϕ", "1.61803398874989");
TestParse("phi", "1.61803398874989");
TestParse("X+1", "(x+1)");
TestParse("((x+1))", "(x+1)");
TestParse("x+x*x^x/x-x", "((x+((x*(x^x))/x))-x)");
TestParse("3*x+x/5", "((3*x)+(x/5))");
TestParse("x-2-x", "((x-2)-x)"); // Subtraction is left associative
TestParse("x^2^x", "(x^(2^x))"); // Exponentiation is right associative
TestParse("(x-3)*(5-x)/10", "(((x-3)*(5-x))/10)");
TestParse("sin x * cos x", "(Sin(x)*Cos(x))");
TestParse("Ln(sin x - tanh(x)) - 1", "(Ln((Sin(x)-Tanh(x)))-1)");
TestParse("Abs Cos Sin Tan 1.5", "0.540839774154307");
TestParse("Abs Cos Sin Tan (x/2)", "Abs(Cos(Sin(Tan((x/2)))))");
TestParse("2*(sin x + cos x ^ 3 - tan(x^3))/3", "((2*((Sin(x)+(Cos(x)^3))-Tan((x^3))))/3)");
TestParse("2*(x+3*(x-4^x)-5)/6", "((2*((x+(3*(x-(4^x))))-5))/6)");
TestParse("1/5x", "(1/(5*x))"); // Implied products have higher precedence
TestParse("1/2sqrt(x)", "(1/(2*Sqrt(x)))");
TestParse("2 sin 3x", "(2*Sin((3*x)))");
TestParse("2 sin 3x * 5 cos 7x", "((2*Sin((3*x)))*(5*Cos((7*x))))");
TestParse("2 3", "(2*3)");
TestParse("2(x+3)", "(2*(x+3))");
TestParse("2x^3)", "((2*x)^3)");
TestParse("2(x^3)", "(2*(x^3))");
TestParse("x⁴-4x³+6x²-4x+1", "(((((x^4)-(4*(x^3)))+(6*(x^2)))-(4*x))+1)");
TestParse("1/2√(1-x²)", "(1/(2*Sqrt((1-(x^2)))))");
TestParse("eˣ", "(2.71828182845905^x)");
TestParse("eᶜᵒˢ⁽ˣ⁾", "(2.71828182845905^Cos(x))");
}
```

## No comments:

## Post a Comment