Thursday 27 September 2012

Prokofiev, Stravinsky, Shostakovich

Prokofiev (Soviet stamp, 1991 centenary)
Building a Collection

If the key to building up a coherent collection of classical music is to have some kind of structure, some skeleton of dry bones on which to hang the meat, then Project Listen To A Crapload Of Symphonies, which started out as my Symphonic listening project before swelling to incorporate concerti, symphonic (tone) poems and ballets, surely qualifies. From an origin point provided by The 100 Greatest Classical Symphonies at Digital Dream Door (which I'll call D3), this bonework has mushroomed, like a badly mixed metaphor, to include at last count over 350 works of enormous merit. Having now completed its genesis of hyper inflation, the list has moved into an era of sustained, steady, but perceptibly accelerating growth.

Of course it still has boundaries. I'm pretty sure I've included nearly all the great symphonists who will ever make it into the list. Though many gifted, modern composers still till the croft, these practitioners are no longer symphonic specialists¹. Similar remarks apply to the concerto. Tone (or symphonic) poems, despite thriving more in modern music, are still limited by the general unpopularity of the genre², ballets even more so. And I don't intend to add any opera to it; my primary interest is instrumental music, not songs.

Stravinsky (Ukrainian stamp, 2007)
The Three Russians Expansion Policy

No, the main area of feature creep is completism - the gradual incorporation of a composer's entire repertoire. Mozart wrote over 50 symphonies, Haydn more than 100; quite often, it is instructive to consider relationships, similarities and differences between the individual opera (in the sense: plural of opus) of a given maestro. Particularly when the composer and his long form works are already among your favourites at the very outset. So it is for me, with the great Russian triumvirate of Sergei Sergeyevich Prokofiev, Igor Fyodorovich Stravinsky, and Dmitri Dmitriyevich Shostakovich.

Three out of seven Prokofiev symphonies (ignoring revisions) were included in the original D3 list. Two out of four Stravinskys. And six out of fifteen Shostakoviches in the top 120, although an additional two are found bubbling under as the BRMB used to say. That's just not enough! Yet where are we to insert the others, and who or what has to be thrown out to make room for them? Worse still, what about the other composers already on the list, whose excluded works we are still to hear? And worst of all, what of composers not yet listed? Can we be sure that our favourites merit precedence above all of these?

Shostakovich (Russian stamp, 2000)
The Need to Compromise

Obviously not without having already heard every work, both on and off the list, and compared each to every other. But the whole purpose of the list was just to start that very process!

Xорошо, let all additions occur at the bottom of the list until we get a feel for it. And anyway, I still don't feel confident rating even the best known symphonies against each other. I still think Beethoven's sixth (Pastorale) is the best thing since baked wheat, yet it's only at number seven. Ho-hum.

Devil take HTML table editing; I'll maintain the symphony list in a Google Docs spreadsheet here. Later I'll add the concerti, etc. Update: done.

Prokofiev

Wikipedia describes Prokofiev as an iconoclastic composer-pianist, a notoriously, ferociously dissonant virtuoso, who after the revolution left Russia for the USA and then Europe. There in 1936, increasing economic deprivation prompted a return to Russia, where in response to the 1941 Nazi invasion, he wrote the opera War and Peace. In 1948 his "anti-democratic formalism" saw his income severely curtailed, and he was forced to compose Stalinist works.

Stravinsky

With perhaps the least politically troubled life³ of our three Russian heroes, Stravinsky too lived in Europe (France) and the USA (West Hollywood) at different times. He had an inexhaustible desire to explore and learn about art, literature and life, and enjoyed many high profile collaborations, particularly while living in Paris. It was in the 1950s that he began to experiment with Schoenberg's tone rows in his compositions.

Shostakovich

Both Prokofiev and Stravinsky, as well as Gustav Mahler, were initially strong influences on Shostakovich, who subsequently developed his own "hybrid" style, easily fusing post-romantic elements with the neo-classical. In later life, chronic ill health, including polio and several heart attacks, permeated his works with a sense of mortality. After suffering a series of falls, he wrote jokingly:
Target achieved so far: 75% (right leg broken, left leg broken, right hand defective). All I need to do now is wreck the left hand and then 100% of my extremities will be out of order.
Well, perhaps he did, but then found himself unable to write about it.

¹ Shostakovich, 1906-1975, has been called The last great symphonist.
² Classical music, by de facto definition, is not popular music.
³ Although he was famously threatened with a $100 fine for adding an unconventional major seventh chord to The Star Spangled Banner!

Tuesday 25 September 2012

Windows 8 Bootkit

UEFI Technology: Say Hello to the Windows 8 Bootkit!

Writing a bootkit couldn't be an easier task for virus writers with the UEFI framework available, much easier than before when they needed to code in pure assembly.
ITSEC director Marco Giuliani sounds less than impressed by the security of the Windows 8 kernel, specifically its porting of the legacy BIOS firmware and Master Boot Record (MBR) into the new Unified Extensible Firmware Interface (UEFI), first fully supported by Microsoft in 64-bit Windows 7. Here he is referring to the fact that UEFI provides a C development environment option, whereas assembly language skills were mandatory for VXers in BIOS days.
http://www.itsec.it/2012/09/18/uefi-technology-say-hello-to-the-windows-8-bootkit/
This isn't the first Windows 8 bootkit to emerge. Last year, Vienna-based Peter Kleissner's Stoned and Stoned Lite proved the concept of loading boot malware from a USB or CD drive on older machines. However, these kits didn't circumvent the UEFI. Now that this has been shown to be trivial, the only remaining line of defence is to enable SecureBoot by default - an option which many critics complain could limit or even prevent the installation of such alternatives as Linux and FreeBSD.

Thursday 20 September 2012

Simple Regex #6½: More Lazy Quantifiers

Hush. Hush.

The security-related components of my work continue to comprise only product-specific threats and mitigation, which means I can't exactly blog about them in a public forum like this one. Instead, here's a little more on the subject of that previous application of Regular Expressions to music catalogues.

Oh and about that previous article, I have to be honest and say that I've been getting complaints! Apparently the level of explanation offered wasn't even up to my usual low standards of lucidity? Let's try to rectify that here. The goal you'll remember was to parse a list of classical music pieces like this,
49. Violin Concerto in E major, RV271 "L'amoroso" - Antonio Vivaldi
subject to the proviso that while an item's rank (here equal to 49), title (Violin Concerto) and composer name (Antonio Vivaldi) are all mandatory, the key (E major), opus/catalogue number (RV271) and nickname (L'amoroso) are all optional. Here's an analysis of the Regex pattern I'm using to split these records into fields:
private const string rank = @"(\d+)\.";
private const string title = @" (.+?)";
private const string key = @"(?: in ([A-G](?: flat| sharp)?(?: major| minor)?))?";
private const string number = @"(?:, (.+?))?";
private const string nickname = @"(?: ""(.+)"")?";
private const string composer = @" - (.+)";
private const string pattern = rank + title + key + number + nickname + composer;
The @ symbol is an artifact of the C# language. Most of its appearances above are redundant, but regardless, I do tend to use it habitually when working with Regex. It saves having to double all backslashes. So the first line
private const string rank = @"(\d+)\.";
matches one or more decimal digits (followed by a literal period, which is outside the capturing parentheses, and so doesn't itself get included in the captured group). The second line
private const string title = @" (.+?)";
matches a space (again excluded, being outside the group) followed by one or more characters of the title, but matching as few characters as possible consistent with an overall successful match.
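The remark about the @ prefix can be checked with a short C# sketch. The class and constant names below are purely illustrative (they are not part of the parser above); the sketch just compares verbatim literals with their regular-literal equivalents, and confirms that the period after the rank stays outside the captured group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // Rank pattern as a verbatim literal: backslashes are taken as written.
    public const string RankVerbatim = @"(\d+)\.";
    // The same pattern as a regular literal: every backslash must be doubled.
    public const string RankRegular = "(\\d+)\\.";

    // Nickname pattern: in a verbatim literal it is the quote marks that get doubled.
    public const string NicknameVerbatim = @"(?: ""(.+)"")?";
    public const string NicknameRegular = "(?: \"(.+)\")?";

    // Returns the text captured by group 1 of the rank pattern.
    public static string RankOf(string record)
    {
        return Regex.Match(record, RankVerbatim).Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(RankVerbatim == RankRegular);         // True
        Console.WriteLine(NicknameVerbatim == NicknameRegular); // True
        Console.WriteLine(RankOf("49. Violin Concerto"));       // 49
    }
}
```

So the @ only earns its keep on the lines that contain a backslash or an embedded quote mark; elsewhere it is harmless habit.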

It helps to apply these two visual filters when inspecting the various groups in patterns like the third line above:
(?: starts a non-capturing group;
)? ends an optional group.
So for example, the overall key group pattern above is both non-capturing, since it starts with (?:, and optional, since it ends with )?. Nested within it is the main capturing group (labelled c1 in the expanded analysis below) for the key text, and nested in turn within that are two further, non-capturing, optional groups, n1 and n2:
private const string n1 = @"(?: flat| sharp)?";
private const string n2 = @"(?: major| minor)?";
private const string c1 = @"([A-G]" + n1 + n2 + ")";
private const string key = @"(?: in " + c1 + ")?";
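As a rough check of the analysis above, here is a minimal, self-contained sketch that assembles the same six pieces and runs them against the Vivaldi record. The class name, the Parse helper and the choice to throw on a non-match are my own assumptions for illustration; only the pattern strings come from the listing above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CatalogueParser
{
    private const string Rank = @"(\d+)\.";
    private const string Title = @" (.+?)";
    private const string Key = @"(?: in ([A-G](?: flat| sharp)?(?: major| minor)?))?";
    private const string Number = @"(?:, (.+?))?";
    private const string Nickname = @"(?: ""(.+)"")?";
    private const string Composer = @" - (.+)";
    private const string Pattern = Rank + Title + Key + Number + Nickname + Composer;

    // Returns { rank, title, key, number, nickname, composer }.
    // An optional field that is absent comes back as an empty string.
    public static string[] Parse(string record)
    {
        Match m = Regex.Match(record, Pattern);
        if (!m.Success)
            throw new FormatException("Unrecognised record: " + record);

        var fields = new string[6];
        for (int i = 0; i < fields.Length; i++)
            fields[i] = m.Groups[i + 1].Value; // groups are numbered from 1
        return fields;
    }

    public static void Main()
    {
        string[] f = Parse(@"49. Violin Concerto in E major, RV271 ""L'amoroso"" - Antonio Vivaldi");
        // Prints: 49 | Violin Concerto | E major | RV271 | L'amoroso | Antonio Vivaldi
        Console.WriteLine(string.Join(" | ", f));
    }
}
```

A record with no optional fields, such as "11. Symphonie Fantastique - Hector Berlioz", should come back with empty strings in the three middle slots.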
More Music Maestro!

Hopefully that's as much analysis as we need for this pattern. It's a little more complex than previously, because of the addition of this opus/catalogue number field, appearing when I generalised the listening project, originally featuring just symphonies, to also include the concerti, symphonic poems and ballets drawn from further lists.
That should be enough to keep HMV in business for a few more weeks! I wanted the project to include all our classical favourites, so piano concerti became a necessity (Grieg for me, Tchaikovsky #1 for the wife), as did tone poems (The Planets, for both of us) and ballets (Bolero). Anyway, the addition of an opus/catalogue number field seemed like a good idea. And obviously being optional, it had to be added with yet a third lazy quantifier. Equally lazy is my detection of the number field's existence, which is triggered by the presence of a comma in the title (but after the key, if any). It would be possible to do more, since this field does have a certain structure, although not as much as the key field. Noticing that only one record contained commas in the actual title,
106. Symphony No. 14 for soprano, bass, strings, and percussion – Dmitri Shostakovich
I just deleted them. Well sometimes, and particularly with Regex, the best solution is to get a life.
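To see why those commas were a problem, here is a small sketch that feeds the Shostakovich record to the same pattern, before and after the commas are removed. The class and method names are hypothetical, and the record is shown with a plain hyphen, assuming it has already been through the dash clean-up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaDemo
{
    // Same pattern as in the analysis above, written out in one piece.
    private const string Pattern =
        @"(\d+)\. (.+?)" +
        @"(?: in ([A-G](?: flat| sharp)?(?: major| minor)?))?" +
        @"(?:, (.+?))?" +
        @"(?: ""(.+)"")?" +
        @" - (.+)";

    // Group 2 is the title.
    public static string TitleOf(string record)
    {
        return Regex.Match(record, Pattern).Groups[2].Value;
    }

    // Group 4 is the opus/catalogue number.
    public static string NumberOf(string record)
    {
        return Regex.Match(record, Pattern).Groups[4].Value;
    }

    public static void Main()
    {
        string withCommas =
            "106. Symphony No. 14 for soprano, bass, strings, and percussion - Dmitri Shostakovich";
        string withoutCommas = withCommas.Replace(",", "");

        // The first comma is mistaken for the start of the number field.
        Console.WriteLine(TitleOf(withCommas));    // Symphony No. 14 for soprano
        Console.WriteLine(NumberOf(withCommas));   // bass, strings, and percussion

        // With the commas deleted, the whole phrase stays in the title.
        Console.WriteLine(TitleOf(withoutCommas)); // Symphony No. 14 for soprano bass strings and percussion
        Console.WriteLine(NumberOf(withoutCommas) == ""); // True
    }
}
```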

Wednesday 5 September 2012

Simple Regex #6: Lazy Quantifiers

Dedicated Listener

I've decided to listen to 129 classical symphonies and rate them against each other, to try to understand this musical format a little better. The project has two immediate side effects of a technical nature. I've had to:
  1. Replace the 32GB SDHC card on my phone MP3 player, a venerable Sony Ericsson Xperia X10 Mini, with the bleeding edge Sandisk 64GB SDXC, to make room for a shedload of symphonies; and
  2. Brush up on greedy quantifiers, and their lazier, more reluctant counterparts.
The former was a pleasant surprise. The phone manufacturer's specifications have always claimed microSDHC compatibility "up to 16GB", but I'd been using a 32GB card ever since the X10 Mini got shunted out of my wife's handbag by the arrival of a bouncing new baby Galaxy. Still, the jump to purchase the XC card was a bit of a leap of faith, since nobody else online appeared to have tried this particular combo. Sadly, it worked! Yes sadly, because otherwise, I'd have had a great case for a new phone...

The latter side effect occurs because I've arrived at my list of 129 symphonies under test by combining the main table of 120 listed at Digital Dream Door,
http://www.digitaldreamdoor.com/pages/best-classic-symp.html
with a further 9 also-rans suggested by that forum's moderator, and compiler of the main list, whom we shall call Brian (for that is his name). Having screen scraped his raw data, converting all its en–dashes and “directional quotes” to hy-phens and "neutrals", I was left with the task of parsing these results. Here's a small sample:
 1. Symphony No. 9 in D minor "Choral" - Ludwig Van Beethoven
 2. Symphony No. 5 in C minor - Ludwig Van Beethoven
11. Symphonie Fantastique - Hector Berlioz
63. Symphony No. 3 "The Camp Meeting" - Charles Ives
For analysis, I need to extract certain data from these entries. The first, dot-terminated field is Brian's initial rank. This is followed by the full title of the symphony, which usually includes its performance key, and sometimes a popular alternate title or nickname. Finally, a hyphen sets off the composer name.
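The clean-up step mentioned above (en dashes to hyphens, directional quotes to neutral ones) might look something like the following. This is only a sketch of one way to do it, not the code actually used; the class and method names are made up, and handling the single curly quotes as well is my own assumption.

```csharp
using System;

public static class ScrapeCleaner
{
    // Replaces typographic dashes and quotes with their plain ASCII equivalents.
    public static string Normalise(string raw)
    {
        return raw
            .Replace('\u2013', '-')    // en dash -> hyphen
            .Replace('\u201C', '"')    // left double quote -> neutral
            .Replace('\u201D', '"')    // right double quote -> neutral
            .Replace('\u2018', '\'')   // left single quote (assumed extra)
            .Replace('\u2019', '\'');  // right single quote (assumed extra)
    }

    public static void Main()
    {
        string raw = "63. Symphony No. 3 \u201CThe Camp Meeting\u201D \u2013 Charles Ives";
        // Prints: 63. Symphony No. 3 "The Camp Meeting" - Charles Ives
        Console.WriteLine(Normalise(raw));
    }
}
```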

Now, it would have been easy to take multiple passes at this task, first removing any optional (key and nickname) fields present, then splitting the remaining text into the mandatory rank, root title and composer name fields. Too easy, in fact. Why do that, when we could just use a single Regex pattern to simultaneously extract the full complement of fields, mandatory and optional?

Too Greedy

The difficulty is that the pattern capturing the symphony title could also gobble up any key and/or nickname information that might be present. Let's say for simplicity that the key field can be identified by the word in, while the nickname is delimited by double quotation marks. That gives us two optional groups:
(?: in (.+?))?
(".+")?
where we've nested the capturing key group (.+?) in a non-capturing group (?: )? in order to exclude the in keyword itself from the captured data. Let's try dropping these into our first attempt at a mandatory field template:
(\d+)\. (.+)(?: in (.+?))?(".+")? - (.+)
This succeeds in extracting the leftmost rank and rightmost composer name fields correctly. However, there's no key or nickname information parsed out; all of that remains resolutely embedded in the title field. We say that the + quantifier in the title group (.+) is greedy. It gobbles up as many input characters as possible, backtracking only when necessary to find an overall pattern match. If it can devour the key and nickname information without breaking the match, as here it can because those groups are optional, then it will.
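The greedy behaviour described above is easy to reproduce. The sketch below runs that first-attempt pattern against the Beethoven Ninth record; the class and helper names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // First attempt: the title group (.+) is greedy.
    private const string Pattern = @"(\d+)\. (.+)(?: in (.+?))?("".+"")? - (.+)";

    // Returns the text captured by the given group ("" if it did not take part).
    public static string Field(string record, int group)
    {
        return Regex.Match(record, Pattern).Groups[group].Value;
    }

    public static void Main()
    {
        string record = @"1. Symphony No. 9 in D minor ""Choral"" - Ludwig Van Beethoven";

        Console.WriteLine(Field(record, 1));       // 1
        Console.WriteLine(Field(record, 2));       // Symphony No. 9 in D minor "Choral"
        Console.WriteLine(Field(record, 3) == ""); // True: no key extracted
        Console.WriteLine(Field(record, 4) == ""); // True: no nickname extracted
        Console.WriteLine(Field(record, 5));       // Ludwig Van Beethoven
    }
}
```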

Dieting

Quantifiers like ?, *, + and {m,n} can be made less greedy by following them with a question mark. For example, the ? quantifier on its own matches zero or one occurrence(s) of the preceding expression, but with a preference for one. That's to say, it will gobble up input characters whenever it can. ?? also matches zero or one occurrence(s) of the expression immediately before it, but with a preference for zero. In general, these so-called lazy quantifiers consume as few input characters as possible, while still letting the overall pattern match succeed. Which turns out to be exactly what's needed in our case:
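Stripped of the music, the difference between each quantifier and its lazy twin can be seen on a toy input. This is just an illustrative sketch; the helper name First is my own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of the first match of pattern in input.
    public static string First(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Greedy forms take as much as they can...
        Console.WriteLine(First("aaa", "a?"));      // a
        Console.WriteLine(First("aaa", "a*"));      // aaa
        Console.WriteLine(First("aaa", "a+"));      // aaa
        Console.WriteLine(First("aaa", "a{1,3}"));  // aaa

        // ...while lazy forms take as little as the match allows.
        Console.WriteLine(First("aaa", "a??") == "");  // True: prefers zero
        Console.WriteLine(First("aaa", "a*?") == "");  // True: prefers zero
        Console.WriteLine(First("aaa", "a+?"));        // a
        Console.WriteLine(First("aaa", "a{1,3}?"));    // a
    }
}
```

With nothing after them in the pattern, the lazy forms stop at their minimum; it is only when something further along must still match that they are forced to take more.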
(\d+)\. (.+?)(?: in (.+?))?(".+")? - (.+)
Now the title group (.+?) consumes the minimum possible number of characters, meaning that if there are key or nickname fields present, the associated groups will now have a higher priority in extracting those from the input stream. Similarly, the identical key group (.+?) is itself on a diet, to stop it gobbling up any nickname that may be present.
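Running the lazy version of the pattern against the same Beethoven record shows the fields separating out. One detail worth noticing in this sketch (names again illustrative): because the nickname group has no leading space of its own, the key comes back with a trailing space and the nickname keeps its quote marks, so some trimming of field contents is still needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // Second attempt: the title group (.+?) is now lazy.
    private const string Pattern = @"(\d+)\. (.+?)(?: in (.+?))?("".+"")? - (.+)";

    // Returns the raw text captured by the given group.
    public static string Field(string record, int group)
    {
        return Regex.Match(record, Pattern).Groups[group].Value;
    }

    public static void Main()
    {
        string record = @"1. Symphony No. 9 in D minor ""Choral"" - Ludwig Van Beethoven";

        Console.WriteLine(Field(record, 2));                   // Symphony No. 9
        Console.WriteLine("[" + Field(record, 3) + "]");       // [D minor ] (note the trailing space)
        Console.WriteLine(Field(record, 3).Trim());            // D minor
        Console.WriteLine(Field(record, 4));                   // "Choral"
        Console.WriteLine(Field(record, 4).Trim('"'));         // Choral
        Console.WriteLine(Field(record, 5));                   // Ludwig Van Beethoven
    }
}
```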

The final pattern that I used contained many further refinements, including named groups for readability, generalisation of white space, trimming of field contents, and stricter key field matching with a more specific group pattern,
[A-G](?: sharp| flat)?(?: major| minor)?
which actually removes the need for the second lazy quantifier above. Incidentally this also allows the word in to appear in the symphony's title in a non-key, non-nickname context - as for example in Igor Stravinsky's Symphony in Three Movements, which happens to be number 76 on Brian's list.
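The Stravinsky case can be checked with a short comparison. The Loose pattern below is the lazy one from earlier in this post; the Strict pattern is my own assembly for illustration, substituting the stricter key group and giving the nickname group its own leading space, so it should be read as an assumption rather than as the final pattern actually used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictKeyDemo
{
    // Lazy pattern from above: anything after " in " counts as a key.
    private const string Loose = @"(\d+)\. (.+?)(?: in (.+?))?("".+"")? - (.+)";

    // Hypothetical stricter variant: the key must look like a real key signature.
    private const string Strict =
        @"(\d+)\. (.+?)" +
        @"(?: in ([A-G](?: sharp| flat)?(?: major| minor)?))?" +
        @"(?: ""(.+)"")?" +
        @" - (.+)";

    public static string LooseField(string record, int group)
    {
        return Regex.Match(record, Loose).Groups[group].Value;
    }

    public static string StrictField(string record, int group)
    {
        return Regex.Match(record, Strict).Groups[group].Value;
    }

    public static void Main()
    {
        string stravinsky = "76. Symphony in Three Movements - Igor Stravinsky";

        // Loose: "Three Movements" is mistaken for a key.
        Console.WriteLine(LooseField(stravinsky, 2));  // Symphony
        Console.WriteLine(LooseField(stravinsky, 3));  // Three Movements

        // Strict: "T" is not a note name, so the title stays whole.
        Console.WriteLine(StrictField(stravinsky, 2));       // Symphony in Three Movements
        Console.WriteLine(StrictField(stravinsky, 3) == ""); // True

        // A genuine key is still picked up, with no trailing space this time.
        string beethoven = @"1. Symphony No. 9 in D minor ""Choral"" - Ludwig Van Beethoven";
        Console.WriteLine(StrictField(beethoven, 3));  // D minor
        Console.WriteLine(StrictField(beethoven, 4));  // Choral
    }
}
```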

Further Reading, Listening

Greedy and lazy Regex quantifiers are explained in more detail at MSDN:
http://msdn.microsoft.com/en-us/library/3206d374.aspx
A good online tool for testing regular expressions containing multiple groups is Derek Slager's AJAX offering, A Better .NET Regular Expression Tester, which can be found here:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
As for the listening exercise, a quick survey of my dusty old Classic FM CDs reveals that I already have good recordings of 56 symphonies on Brian's list, including all of his top 25 (and quite a few duplicates; who authorised that?). So cost wise, I'm almost half complete. Although to be honest, I've never really liked Solti's limp wristed walkthrough of Mozart's Jupiter...