Monday, 11 May 2015

Wizards and Warriors

What Could Possibly Go Wrong?
Players, such as wizards and warriors, have weapons; such as staffs and swords.
So innocuously begins Eric Lippert's latest forage into object-oriented design. Like all of his best work, this adventure soon takes us into various intuitive dead ends and surprising ambush back-alleys. You will nod sagely as he overcomes each successive objection, excises each fly from the ointment, with a new design principle or approach; THEN CURSE as your newest paradigm rears up and bites you in the assets. Finally, you will marvel at the astonishing and enlightened conclusion of it all, by which time you will have become a better human being in every sense that matters (to an object-oriented software designer).

Enjoy it here:
Part One
Part Two
Part Three
Part Four
Part Five
Note: the "DSLs" that emerge toward the end of Part Five are of course Domain Specific Languages.

Tuesday, 10 March 2015

The Uroboros and Other Quines

The 50-Language Uroboros
Eleven, Fifty, One Hundred Languages...

Something over five years ago, I described Yusuke Endoh's cycle of eleven programs - each of which, "written" in a particular computer programming language, prints out the source code - in a completely different language - for the next program in the cycle. Here, I have put the verb "written" into "so-called quotes", because in fact every program in such a cycle is auto-generated by the preceding one. So in fact, nothing at all need be "written", in the sense of "consciously composed" - half a dozen decent monkeys banging on a few old typewriters for a fortnight ought to suffice!

Just kidding, of course. Yusuke-san's achievement was and remains a very clever and unspeakably elegant construction. That its method of discovery can so readily be explained, and thence extended and generalised in various ways, detracts in no part from this.

The 100-Language Uroboros
Cyclic Cod Cosmologies

At the time, this extension of the Quine concept made me muse on the subject of Sims. The notion that our universe is "almost certainly" a simulation was trending highly at the time, and this cycle of languages seemed to lend the air of a Kekulé, benzene-ring "snake anecdote" to the previously hierarchical model of Sims creating Sims, creating Sims... and turtles all the way down.

For what if the laws of thermodynamics, working their random magic on the primordial "box of smoke" bestowed on us by an unstable vacuum, at some particular moment allowed the appearance of a configuration matching one of these Quines? Why then, wouldn't the subsequent evolution of the system be driven deterministically through the remaining phases, and eventually, back to that same starting configuration? In other words, the prime mover may be the ultimate effect, and we may well be products of creations we have authored.

Obviously this cosmology glosses over the fact that the laws of physics do not, in fact, appear to comprise a particular sequence of discrete, high-level computer language compilers and interpreters. Pah! Don't distract me with such trivial details!

Subsequent Developments

The original sequence of eleven languages went like this: Ruby, Python, Perl, Lua, OCaml, Haskell, C, Java, Brainfuck, Whitespace, Unlambda (and then back to Ruby). Then I missed a landmark in the development of this program cycle, because in 2012, Yusuke-san produced a 50-language variant demonstrating his complete mastery of the technique. Not so much by the sheer number of languages, as by the arbitrary character of his choice of "50" as a temporary stopping point, and the fact that all 50 languages are sequenced in alphanumeric order of language name!

Now there's a 100-language version, announced to the world on December 23rd, 2014 with the comment, "the 100th language: Lazy K. Merry Quine-mas!" This new dragon stretches all the way from A+ to Zoem and back, taking in all the Cs, sundry Lisps, Makefile, Visual Basic and LOLCODE among the many others along the way.

When you remember exactly what these programs do, the functionality disclaimer at the foot of the GitHub page is its own peculiar brand of comedy gold.

In Other News

Yusuke-san (@mametter) is also responsible - in collaboration with his wife Hiren (@hirekoke), whom he credits with teaching him Gingold, Monaghan and Lucy's SPH (smoothed-particle hydrodynamics) methods - for this almost-viral ASCII 2-D fluid mechanics simulator, currently doing the rounds:

Friday, 20 February 2015

Microsoft Achieves Privacy Standard

Microsoft has achieved the globally recognized ISO/IEC 27018 privacy standard for Azure, Office 365, and Dynamics CRM Online. And here is the BSI certificate to prove it! These services are now aligned with the standard’s code of practice for the protection of Personally Identifiable Information (PII) in the public cloud.

Further Information

Announcement last Monday by Microsoft's General Counsel & Executive Vice President, Legal and Corporate Affairs, Brad Smith:
"With the Microsoft Cloud, you’re in control", he concludes, and it's hard to deny that's a step in a good direction.

Thursday, 17 July 2014

Female Symphonists

While Wikipedia currently lists more than 900 entries in the category Women classical composers, a diligent search reveals that comparatively few of these have written a symphony - less than one in five, according to my very unscientific sampling. And while considering them as a group might make exactly as much sense as clumping together all male symphonists, to wit no sense at all, still it might advocate just a little for positive discrimination. Here then are seven of the best.

Louise Farrenc
Louise Farrenc (1804 - 1875, France)

In 19th century France, opera was king. The public turned away from anything else, indeed from any unfamiliar or new name. Symphonically speaking, they tended to shun anything that was not Beethoven, Haydn, Mendelssohn, Mozart, Schumann, or otherwise German and Great.

Furthermore, the resources necessary to mount a symphonic performance were beyond almost anyone's means, as lamented by Camille Saint-Saëns and others. If the Société des Concerts du Conservatoire didn't chose your work for performance, there were no jobbing orchestras for hire; you had to pay for and assemble your own band personally.

It was against this backdrop that Louise Farrenc sought to buck the trend for exclusively male composers, not to mention symphonists. And with a great measure of success; admired by Berlioz and Schumann, this teacher and scholar composed three superb symphonies.

Farrenc's other accomplishments included the co-founding, with her husband Aristide, of the publishing house Éditions Farrenc in Paris, which remained one of France’s leading music publishers for nearly 40 years. She spent 30 years as Professor of the Piano at the Paris Conservatory, where her excellent instruction led to many of her students graduating with Premier Prix.

On two occasions, in 1861 and then again in 1869, she received the Prix Chartier of the Académie des Beaux-Arts.

Selected works for full orchestra:

Emilie Mayer
Emilie Mayer (1812 - 1883, Germany)

The encouragement of a mentor proved invaluable to this 19th century romantic composer. The conductor, baritone, and fellow symphonist Carl Loewe, known in his day as "the Schubert of North Germany", once said of Mayer: "Such a God-given talent as hers had not been bestowed upon any other person he knew." Such accolade gave her much inspiration and motivation throughout the rest of her very prolific composing career.

Mayer wrote a documented total of eight symphonies during the decade 1847-1857, in addition to numerous chamber works for strings and/or piano... not to mention an opera (Die Fischerin - The Fisherwoman), and a piano concerto! Her works garnered much critical and popular acclaim, and she would travel Europe in the 1850s to attend public performances of her works.

Selected works for full orchestra:

Amy Beach
Amy Beach (1867 - 1944, America)

Amy Beach was a child prodigy who could sing and harmonise accurately by age two. At five, she was composing waltzes. At six she began piano lessons, and by age seven, was giving public recitals of Beethoven, Chopin, Handel, and her own work. As a composer, Amy was almost entirely self-taught, with the exception of a year spent at age fourteen, learning harmony and counterpoint from Junius W. Hill. She made her professional debut in Boston in 1883, and soon after became a soloist with the Boston Symphony Orchestra.

Beach's early writing is mainly Romantic, often compared to Brahms or Rachmaninoff. Later she would move away from tonality and into whole tone scales, using more exotic harmonies and techniques.

The Boston Pops paid tribute to Beach in 2000, when her name was added to the granite wall on Boston's famous Hatch Shell - the only woman ever to have received this honour.

Beach wrote many songs for solo voice with piano accompaniment, together with many more of the sacred and secular choral kinds. Her works also include much piano and chamber music, a mass, and like Emilie Mayer above, a piano concerto and an opera (Cabildo). Today she is probably best remembered for her singular symphonic composition, the celebrated ''Gaelic'' Symphony, written in response to Dvorak's criticism of American composers.

Selected works for full orchestra:

Ruth Gipps
Ruth Gipps (1921 - 1999, England)

Another child prodigy, Ruth Gipps performed her first composition at age 8 in a music festival, when the work was bought by a publishing house. Soon afterwards she began her performance career in earnest by winning a concerto competition with the Hastings Municipal Orchestra.

In 1936 she entered the Royal College of Music to study theory, composition, piano, and eventually oboe. An accomplished all-round oboe and piano soloist, she was also a prolific composer. A hand injury at age 33 ended her performance career; she decided to focus on conducting and composition.

Her music often shows the influence of her teacher Vaughan Williams. She rejected serialism, twelve-tone music, and other such trends in the avant-garde, considering her five symphonies as her greatest works.

She founded the London Repertoire Orchestra in 1955 as an opportunity for young professional musicians to become exposed to a wide range of music, and the Chanticleer Orchestra in 1961, a professional ensemble which included a work by a living composer in each of its programs, often a premiere performance. Later she would take faculty posts at Trinity College, London (1959 to 1966) and the Royal College of Music (1967 to 1977), and then the Kingston Polytechnic.

Selected works for full orchestra:

Gloria Coates
Gloria Coates (b. 1938, Wisconsin USA)

American by birth and education, Gloria Coates has lived in Munich, Germany since 1969. Having already written some half-dozen extended, multi-movement orchestral works, she decided while writing the seventh ("Dedicated to those who brought down the Wall in PEACE") in 1990, to revisit them all and relabel them as symphonies.

In so doing, she set herself on course some fourteen years and eight more symphonies later, to become known as the most prolific female symphonist of all time.

To a certain extent this distinction is arbitrary. As Kyle Gann writes in his 1999 New York Times article A Symphonist Stakes Her Claim, "Symphony", after all, is a word open to wide interpretation. It does not, for Ms. Coates, refer to a work in several movements, the outer ones allegro and the second one adagio. He also reports Ms Coates as saying, "It has to do with the intensity of what I'm trying to say and the fact that it took 48 different instrumental lines to say it, and that the structures I was using had evolved over many years. I couldn't call it a little name."

Selected works for full orchestra:

Ellen Taaffe Zwilich
Photo ©
Ellen Taaffe Zwilich (b. 1939, Florida USA)

After graduating from Florida State University in 1960, Ellen Taffee Zwilich moved from to New York in order to play with the American Symphony Orchestra under Leopold Stokowski.

She then joined the Juilliard School, where upon her 1975 graduation, as the first woman to achieve their Doctorate of Musical Arts in Composition, she gained some prominence by having her Symposium for Orchestra programmed, with the Juilliard Symphony Orchestra, by Pierre Boulez.

In 1983, with her first symphony, Zwilich became the first woman to win the Pulitzer Prize for music.

Selected works for full orchestra:

Libby Larsen
Photo © Ann Marsden
Libby Larsen (b. 1950, Delaware USA)

A graduate of the University of Minnesota, Libby Larsen was in 1983 appointed one of the Minnesota Orchestra's two composers-in-residence, making her the first woman to serve as a resident composer with a major orchestra.

Her works exhibit a great mixture of influences, from the Gregorian Chant sung by the St. Joseph of Carondelet nuns at Christ the King School as a young child, through her mother's boogie-woogie records, her father's Dixieland band (was an amateur clarinetist), and the many different styles of repertoire introduced to her by Sister Colette, her first piano teacher, to the eclectic direct influences of her college teachers.

Asked about her teachers and influences, she has said "To tell the truth, my teachers have come to me from unexpected places in my musical life. They have been poets, architects, painters and philosophers. The other way I really learn is by reading scores voraciously, from Chuck Berry to Witold Lutoslawski."

Her awards include a Grammy, a Clarion, two Honorary Doctorates, and a George Peabody Medal, among many others.

Selected works for full orchestra:

All information and pictures courtesy of Wikipedia, unless otherwise noted.

Friday, 20 June 2014

Unit Testing Is Dead

Haskell logo (Wikipedia)
Don't Bury The Lede!

Fair enough. Here's the power point takeaway:
  • The future present is multicore!
  • Only functional programming languages (Haskell and Erlang for the purist, but also Scala, Ocaml, F#) can scale adequately to cope with this future present.
  • Functional software design eschews mutable state, being purely procedural and "static".
  • Objects and interfaces (and O-O generally) are obsolete.
  • Unit testing, as we used to know it, is dead! Yeah! And TDD/BDD too! Yeah!
  • But we still have to support our legacy O-O systems with unit tests...
  • Here's how to do that without jettisoning statics.

LINQ The Hero

The introduction of Language-INtegrated Query (LINQ) into the C# language, with C# 3.0 in November 2007, headlined an impressive list of new features. But in truth, there was only one major, new feature delivered in that compiler release. Virtually everything else, with the possible exception of Automatic Properties, was introduced simply to underpin and enable the great LINQ:
  • Anonymous Types
  • Local Variable Type Inference
  • Object and Collection Initializers
  • Lambda Expressions and Trees
  • Extension and Partial Methods
Some examples of these dependencies:
  1. Local Variable Type Inference is essential when returning values of an Anonymous Type from a query.
  2. Lambda Expressions are required to enable the writing of sufficiently general SQL WHERE clause predicates.
  3. Extension Methods provide the backbone of the "fluent" (method chaining) syntax, upon which the Query Comprehension (using SQL-like keywords) is just compiler syntactic sugar.
Naturally, most of these supporting features have found immediate application in multiple other areas. Extension Methods in particular have spawned an entire vocabulary of Fluent APIs (of which my favourite has always been Bertrand Le Roy's FluentPath). These are popular with developers and library code consumers alike, being in the words of TechPro's Khalid Abuhakmeha fun and discoverable way to allow fellow developers to access functionality.

Villain Of The Piece

But with great power comes, as they say, great heatsinks. And coolest in their response to the proliferation of these extensions, implemented as they are throughout C# using static methods, are the unit test evangelistas. Their point is simple and well-made:
  • Unit testing involves rewiring your dependencies using mocks or "friendlies" which replace those real dependencies for test purposes.
  • Static methods lead to an essentially procedural programming environment, with code and data separated, and without clear objects or interfaces available to be swapped out and substituted.
So much the worse for static methods, they say. To which I rejoin, so much the worse for your unit testing framework! Not all such tools have intractable bother with statics.


Microsoft's Pex and Moles VS2010 power tools, and their VS2012 replacement Fakes Framework (via Shims, though not Stubs), can handle statics reasonably well.


The Typemock Isolator can control the behavior of static methods, just like any other method:
  .WhenCalled(() => MessageBox.Show("ignored arg"))
So, your test might look like this:
public void TestStaticClass()
  Isolate.WhenCalled(() => UserNotification.SomeMethod()).WillReturn(5);
  Assert.AreEqual(5, UserNotification.SomeMethod());
Telerik JustMock

JustMock provides for unrestricted mocking of dependent objects, including non-virtual methods, sealed classes, static methods and classes, as well as non-public members and types. Mocking of properties like get calls, indexers and set operations is also supported. JustMock also supports mocking of all classes and methods included in the MSCorlib assembly.

Don't Meddle With The IL?

Some of these solutions engender suspicion because of their under-the-hood behaviour. Specifically, there is concern that anything rewriting the actual Intermediate Language (IL) generated by the compiler, for consumption by the jitter, must result in something other than the official written code being tested. But this is an unjustified worry for several reasons.
  • By its very nature, IL is not the code that's finally executed on the end user's processor. What does the jitter do, but transform that IL into something entirely new?
  • Several .NET components cause new IL to be generated at run time. For example, Regex patterns which are not precompiled cause their own custom assemblies to be generated each time they are evaluated.
  • Visual Studio design mode is the biggest IL simulator of them all. Just ask yourself, how does it run the constructor for your new user control in design mode, when you haven't even finished typing it in yet, never mind compiling it?!
In short, these Shimmying frameworks are thoughtfully designed and quite serviceable, and aren't doing anything outlandish that you're not already relying on to a great extent.

Further Reading

Statics and Testability

Miško Hevery, Russ Ruffer and Jonathan Wolter's Guide to Writing Testable Code (November 2008) lists warning signs related to the four most popular flaws in O-O Design. (Google)

Miško Hevery, Static Methods are Death to Testability (December 2008) goes into more detail and answers commentators' concerns with the previous document.

Introductions to Functional Programming

Learn You a Haskell for Great Good!

This is the world's best tutorial introduction to the world's best programming language.

Learn You Some Erlang for Great Good!

This is the world's best tutorial introduction to the world's second best programming language.

Tuesday, 4 March 2014

Book Review: "Threat Modeling: Designing For Security"

Ben Rothke reviews Adam Shostack's new book:
"When it comes to measuring and communicating threats, perhaps the most ineffective example in recent memory was the Homeland Security Advisory System; which was a color-coded terrorism threat advisory scale. The system was rushed into use and its output of colors was not clear or intuitive. What exactly was the difference between levels such as high, guarded and elevated? From a threat perspective, which color was more severe — yellow or orange? Former DHS chairman Janet Napolitano even admitted that the color-coded system presented 'little practical information' to the public. While the DHS has never really provided meaningful threat levels, in Threat Modeling: Designing for Security, author Adam Shostack has done a remarkable job in detailing an approach that is both achievable and functional. More importantly, he details a system where organizations can obtain meaningful and actionable information, rather than vague color charts."
Full review:
Adam Shostack
Threat Modeling: Designing for Security
John Wiley & Sons
17 February 2014
ISBN-10: 1118809998
ISBN-13: 978-1118809990

Sunday, 29 September 2013

Natural Sort Order

Ten Before Two But 10 After 2

A recent update to our flagship product takes advantage of the server-side paging capabilities of the DevExpress Grid control. Naturally, this change has involved the migration of much client-side C# code into server-side table changes, triggers, and stored procedures, all written in SQL. One of the casualties was a particularly hirsute C# method which used to sort table contents into what's sometimes called "Natural Order", so that e.g. DC10 would come after DC9, rather than the naive collation or ASCII order, which would have them reversed.

For reasons unknown it fell to me to implement our Natural Sort. Easy I thought, I'll ask Google for the answer. That's when I discovered there's really no such thing as The Natutral Sort Order. It depends on your particular data, and your specific requirements, to a frankly surprising degree. Of course, there is plenty of material to choose from on the web. Jeff Atwood's Coding Horror on the subject is quite a good central station for your exploration of the matter. But this plenitude is also a bit of a problem. After an hour or two of research, I'd decided my best plan was to design and implement my own, newly reinvented wheel.


The basic approach is to partition or "stripe" the input (unsorted) data into a sequence of fields, alternating alphabetical content with numerical, and then treat each field as its own Sort By column - sorting the alpha fields as normal, and the numeric fields numerically. In the above DC9 / DC10 example, this results in an initial alpha field containing "DC" in both cases, followed by a numeric field containing the integers 9 and 10, which are then subjected to a numerical sort.

Some of the examples I'd read performed this latter sort by actually converting the input data field into an integer, then using the language's numeric comparison capabilities. I didn't want to use that approach, because a 32-bit signed integer can only be used for field sizes up to 9, a 64-bit one 18, and so on. I had no specification to assure me customer data wouldn't exceed such an arbitrary bound, so I fell back on the old workaround of keeping the numeric fields in string form, but left-padding them with zeros until all the rows were the same length, allowing an alpha sort to do the job of a numeric one. This is essentially part of what we do when we adopt the ANSI date format, e.g. 2013-09-29, to get correctly ranked dates.


Notice that it doesn't matter which padding character you use to right-align the numeric fields, just as long as it doesn't come after 0 (zero) in the collation order. This is important later.

One fun fact I found while researching collations was that aa comes after ab in the Danish / Norwegian collation order. For historical-typographical reasons, aa is treated (while sorting) as identical to the letter å, which just happens to be the last letter of their shared alphabet. Never mind; we'll just have to assume anyone using a particular collation order knows what they're doing, and won't be surprised by results which after all should in all cases appear perfectly non-anomalous to them.

Field Sizes

Okay, so now we have this requirement to right-justify our numeric fields by left-padding them with e.g. zeros. What fixed field size should we pad them out to? Well our input data, being stored in a traditional, relational database, has some particular maximum string length. In my case that length was 200. There's nothing to stop our customers filling every single character space with a decimal digit. So we could be looking at a numeric field of width 200.

What about the alpha fields? These don't require padding, since standard left-to-right string comparisons work fine for them. But note that every alpha character "uses up" one position in the input data, rendering that location unavailable for storing a subsequent digit. Long story short, we can stuff any alpha prefix into the left edge of our 200-character field, and still have enough room to fit and pad out the remaining, right-justified, numeric content.

For performance reasons, the ultimate destination for this calculation was a "live" Sort Order column in the relevant database table, rather than a UDF call at SQL SELECT time. That's why my buffer size had to be allocated statically, rather than optimised with reference to the worst case requirements of the data actually in the table; we didn't want a new row of data invalidating the precomputed sort orders of the old data.

Islands and Seas

You might worry about non-digit characters encroaching on the padding space of our numeric fields, and you'd be right to. Actually everything works okay as long as we stick to letters and digits. Anomalies can start to appear when we introduce punctuation characters, especially when using ASCII collation. The numeric digits and the upper- and lower-case letters can be seen to form three disconnected "islands" of codes, surrounded by four seas of punctuation characters.

In practice, these anomalies are mitigated by our customers' sparse use of such punctuation, and tendency to apply it consistently whenever it is used at all. As a further mitigation, I changed the padding character from a zero digit to a space, ensuring that padded-out numeric fields are essentially guaranteed to sort lower than any alpha or other character found in the same region.


The following, correctly sorted data can be seen to illustrate these adjustments. Notice, in the Natural Sort column, the use of the space character as filler, and the fact that the 'h' of 'Coach' occupies the same character column position as the '1' of 'Van 1234' without causing any problem:

Simple Sort   Natural Sort

 Coach 12      Coach  2
 Coach 2       Coach 12
 Van 1234      Van  234
 Van 234       Van 1234

Field Count

Obviously the input data might contain further alternating fields of non-numeric and numeric data. What should we do with subsequent fields? Well, we just package them up in pairs using exactly the same algorithm as the first, and when we have done this enough times to consume all of the input, return the concatenation of all these field blocks as the sortable key.

There is another minor optimisation available here. Obviously the first field pair must have consumed at least one character from the input. This means that the field "buffer" for the second pair can be made one character shorter than the first - say, 199 characters instead of 200. Likewise, if input characters remain after the second or subsequent field pair have been extracted, then that pair must have consumed at least two characters, so the next buffer size can be 197, or 195, or 193, or...

Yes, quite. The law of diminishing returns cuts in quite promptly here, especially since we decided that a total of 3 field pairs would be more than adequate for our customers' requirements (actually there is an auto-generation element in our product, designed to nudge them into using just a single field pair: an alpha prefix, followed by a numerical index). So I just left all my buffers at width 200. You should obviously make use of this optimisation if your input size limit is much lower than 200, or if you decide to use a lot more significant field pairs.

Coding and Testing

This works adequately and was ultimately accepted for production, but I must acknowledge here the excellent work done by our Test Department, both in testing my new UDF out-of-sequence when I asked - quite unreasonably - for an early comparison with the old client-side C# function (come to think of it, how the hell did they even do that?), and also in uncovering pretty quickly all of the corner cases - punctuation, padding characters - mentioned above.

Oh, the code? Yeah sure, here ya go. Should be quite easy to hunt down the few rogue "200"s to adapt it for your use. You might also want to limit the number of iterations to prevent DOS attacks (we use only 3). As I said at the outset, it's still surprisingly specific and might not work for you. For example, it doesn't recognise decimal points / numeric separators. The truth is, there simply does not exist one Natural Sort Algorithm suitable for all data.

CREATE FUNCTION [dbo].[NaturalSort](@input NVARCHAR(200))
    @count INT = LEN(@input),
    @result NVARCHAR(MAX) = '',
    @p INT = 1, @q INT, @x INT, @y INT
  WHILE @p <= @count
    SELECT @x = PATINDEX('%[0-9]%', SUBSTRING(@input, @p, @count) + '0') - 1
    SELECT @q = @p + @x
    SELECT @y = PATINDEX('%[^0-9]%', SUBSTRING(@input, @q, @count) + '!') - 1
    SELECT @result = @result + SUBSTRING(@input, @p, @x) +
      REPLICATE(' ', 200 - @x - @y) + SUBSTRING(@input, @q, @y)
    SELECT @p = @q + @y
  RETURN @result