The security-related components of my work continue to comprise only product-specific threats and mitigation, which means I can't exactly blog about them in a public forum like this one. Instead, here's a little more on the subject of that previous application of Regular Expressions to music catalogues.
Oh, and about that previous article: I have to be honest and say that I've been getting complaints! Apparently the level of explanation offered wasn't even up to my usual low standards of lucidity. Let's try to rectify that here. The goal, you'll remember, was to parse a list of classical music pieces like this,
    49. Violin Concerto in E major, RV271 "L'amoroso" - Antonio Vivaldi
subject to the proviso that while an item's rank (here equal to 49), title (Violin Concerto) and composer name (Antonio Vivaldi) are all mandatory, the key (E major), opus/catalogue number (RV271) and nickname (L'amoroso) are all optional. Here's an analysis of the Regex pattern I'm using to split these records into fields:
    private const string rank = @"(\d+)\.";
    private const string title = @" (.+?)";
    private const string key = @"(?: in ([A-G](?: flat| sharp)?(?: major| minor)?))?";
    private const string number = @"(?:, (.+?))?";
    private const string nickname = @"(?: ""(.+)"")?";
    private const string composer = @" - (.+)";
    private const string pattern = rank + title + key + number + nickname + composer;
The @ symbol is an artifact of the C# language. Most of its appearances above are redundant, but regardless, I do tend to use it habitually when working with Regex. It saves having to double all backslashes. So the first line
    private const string rank = @"(\d+)\.";
matches one or more decimal digits (followed by a literal period, which is outside the capturing parentheses, and so doesn't itself get included in the captured group). The second line
    private const string title = @" (.+?)";
matches a space (again excluded, being outside the group) followed by one or more characters of the title, but matching as few characters as possible consistent with an overall successful match.
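To see the whole pattern at work, here is a minimal sketch that runs it over the sample record quoted earlier. Only the pattern constants come from the listing above; the CatalogueDemo class, its Parse helper and the second, deliberately bare record are hypothetical, made up purely for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CatalogueDemo
{
    // Pattern pieces, copied from the listing in the post.
    private const string rank = @"(\d+)\.";
    private const string title = @" (.+?)";
    private const string key = @"(?: in ([A-G](?: flat| sharp)?(?: major| minor)?))?";
    private const string number = @"(?:, (.+?))?";
    private const string nickname = @"(?: ""(.+)"")?";
    private const string composer = @" - (.+)";
    private const string pattern = rank + title + key + number + nickname + composer;

    // Hypothetical helper: returns the six captured fields in order
    // (rank, title, key, number, nickname, composer), or an empty array
    // when the line does not match. Optional fields that are absent
    // come back as empty strings.
    public static string[] Parse(string line)
    {
        Match m = Regex.Match(line, pattern);
        if (!m.Success) return new string[0];
        var fields = new string[6];
        for (int i = 0; i < fields.Length; i++)
            fields[i] = m.Groups[i + 1].Value; // Groups[0] is the whole match
        return fields;
    }

    public static void Main()
    {
        // The sample record from the post.
        string full = "49. Violin Concerto in E major, RV271 \"L'amoroso\" - Antonio Vivaldi";
        Console.WriteLine(string.Join("|", Parse(full)));
        // prints: 49|Violin Concerto|E major|RV271|L'amoroso|Antonio Vivaldi

        // A made-up record with mandatory fields only.
        string bare = "1. Symphony - Some Composer";
        Console.WriteLine(string.Join("|", Parse(bare)));
        // prints: 1|Symphony||||Some Composer
    }
}
```

The lazy quantifier on the title is what lets the optional groups claim their share of the line: the title stops growing as soon as the rest of the pattern can complete the match.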
It helps to apply these two visual filters when inspecting the various groups in patterns like the third line above:
- (?: starts a non-capturing group;
- )? ends an optional group.
So for example, the overall key group pattern above is both non-capturing, since it starts with (?:, and optional, since it ends with )?. Nested within it is the main capturing group (labelled c1 in the expanded analysis below) for the key text, and nested in turn within that are two further, non-capturing, optional groups, n1 and n2:
    private const string n1 = @"(?: flat| sharp)?";
    private const string n2 = @"(?: major| minor)?";
    private const string c1 = @"([A-G]" + n1 + n2 + ")";
    private const string key = @"(?: in " + c1 + ")?";
More Music Maestro!
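The nesting of n1 and n2 inside c1 can be checked in isolation with a small sketch like the one below. The KeyGroupDemo class, the Describe helper, the ^ and $ anchors and all of the test strings are assumptions added for this illustration; only the four constants come from the expanded analysis.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyGroupDemo
{
    // The expanded key pattern from the post.
    private const string n1 = @"(?: flat| sharp)?";
    private const string n2 = @"(?: major| minor)?";
    private const string c1 = @"([A-G]" + n1 + n2 + ")";
    private const string key = @"(?: in " + c1 + ")?";

    // Hypothetical helper: anchors the key pattern so that it has to
    // account for the whole input, then reports what c1 captured.
    public static string Describe(string text)
    {
        Match m = Regex.Match(text, "^" + key + "$");
        if (!m.Success) return "no match";
        // Only c1 captures; n1 and n2 are non-capturing, so it is group 1.
        return m.Groups[1].Success ? "key = " + m.Groups[1].Value : "matched, no key";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(" in E major"));      // prints: key = E major
        Console.WriteLine(Describe(" in B flat minor")); // prints: key = B flat minor
        Console.WriteLine(Describe(" in C"));            // prints: key = C
        Console.WriteLine(Describe(""));                 // prints: matched, no key
        Console.WriteLine(Describe(" in H"));            // prints: no match
    }
}
```

Because the outer group is optional, an empty string still counts as a match with no key captured, whereas " in H" fails only because the anchors forbid leaving text unconsumed.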
Hopefully that's as much analysis as we need for this pattern. It's a little more complex than previously, because of the addition of this opus/catalogue number field, which appeared when I generalised the listening project, originally featuring just symphonies, to also include the concerti, symphonic poems and ballets in the following lists:
- Symphonies (129)
- Concerti (60 piano, 50 violin, 40 other)
- Tone Poems (50)
- Ballets (16)
    106. Symphony No. 14 for soprano, bass, strings, and percussion – Dmitri Shostakovich
I just deleted them. Well sometimes, and particularly with Regex, the best solution is to get a life.