Sunday 25 April 2021

Lyric Captions

Praise the Lord and Pass the Microphone

The Digital Audio Workstation "Ableton Live" works almost as well with full video files as it does with audio samples, but has no native facility to add lyric captions to a musical video, nor to generate the necessary caption format files to upload to video services like YouTube. Having produced a full square sixteen of Paul O'Brien's song recordings, and in the process almost accidentally created an equal number of "live" performance videos, I wanted the ability to add closed captions containing the song lyrics.

Oh, and it had to be free and quick and easy. There are commercial solutions available, but I didn't want to spend one cent. If you search the intertubes for "Ableton Lyrics" today, most of your results will be from American "worship sites", and a little thought will reveal the reason for that. Obviously this is a bit removed from my particular use case.

There are also automatic options. The AIs may be coming for all of our jobs, and speech recognition is certainly improving exponentially as we, erm, speak. But it's not quite there yet as far as the singing voice is concerned. I mean, just look at this effort.

So it has to be accurate too, but within limits. We're not building a karaoke machine complete with bouncing ball. One-second accuracy should be adequate for the display of each line of the lyrics, so the listener can follow along with the performance.

SubRip File Format

Most subtitles distributed on the internet, for example those ripped from movie DVDs, use a file format called - for obvious reasons - SubRip. Since this format is one of the two most popular currently supported for videos uploaded to YouTube or Google Drive (the other being SubViewer, support for which was added later) I settled on it initially for this project.

SubRip is a very simple text format: each text entity (line of dialog or song lyric) is preceded by a header containing its index (line) number and the start and stop times for its display on screen, and followed by a blank line. Obviously any text editor could be used to produce such a file, but look how fiddly it is, even after you've determined the full list of correct values, to incorporate these time marks into the format (the milliseconds separator is a comma because SubRip was originally written in France):
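To make the layout concrete, here is a minimal Python sketch that formats one SubRip entry. It's an illustration only, not the code of the app described in this post; the function names `srt_timestamp` and `srt_entry` are made up for the example, and the caption text is a placeholder rather than a real lyric.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as SubRip's HH:MM:SS,mmm (comma before the ms)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(index: int, start: float, stop: float, text: str) -> str:
    """One caption: index line, time range line, text, then a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(stop)}\n{text}\n\n"


print(srt_entry(1, 10.0, 12.5, "(first line of the lyrics)"), end="")
# 1
# 00:00:10,000 --> 00:00:12,500
# (first line of the lyrics)
```

Multiply that by every line of a song and the appeal of generating the file automatically is obvious.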

"Union Card" song lyrics - Copyright © 2021 Paul O'Brien

That's the little app I ended up with, after two hours in Visual Studio - one hour for the calculation engine, and one for the user interface. Here's the code, and here's how it works: 

  1. Specify the total duration of the video file, by either entering the minutes & seconds at the foot of the form, or selecting the video or associated audio file (via menu or drag & drop) and letting the code read the relevant duration from it. This feature uses the magic of TagLibSharp.
  2. Either drag the lyrics file into the window, or paste the lyrics from the clipboard into the left panel, or right-click and select the lyrics text file to load it.
  3. The captions file appears immediately in the right panel. This text may be copied to the clipboard, or saved with a menu command.
  4. Any alterations to the duration controls, or to the contents of the left lyrics panel, are immediately reflected in the right captions panel, so it's always kept up to date, ready to be copied or saved.

Tweaking the Timing

Given the above description of the tool's operation, you probably guessed that it's simply counting the number of lines in the lyrics, and allocating an equal time slice to each out of the total video duration. Sure, this isn't exactly how songs work, and without some degree of tweaking, the lyrics displayed will drift into and out of synchrony with the performance - that's if you're lucky, and they ever enter synchrony at all!
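The equal-slice idea fits in a few lines. This is a Python sketch of the principle rather than the app's own code, and `equal_slices` is a hypothetical name:

```python
def equal_slices(lyrics: str, duration: float) -> list[tuple[float, float, str]]:
    """Give every line of the lyrics the same share of the total duration.

    Returns (start_seconds, stop_seconds, text) for each line, in order.
    """
    lines = lyrics.splitlines()
    if not lines:
        return []
    slice_len = duration / len(lines)
    return [
        (i * slice_len, (i + 1) * slice_len, line)
        for i, line in enumerate(lines)
    ]


for start, stop, text in equal_slices("line one\nline two\nline three\nline four", 10.0):
    print(f"{start:5.1f} -> {stop:5.1f}  {text}")
```

Four lines over ten seconds get 2.5 seconds each, whatever the singer happens to be doing at the time.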

The one blunt weapon at our disposal is the blank line. It's usually enough to restore an adequate level of synchrony, without introducing complicated user operations, judiciously to insert one or more blank lines into the lyrics. For example, the above song Union Card has a classic 12-bar blues structure. If you don't know what that is, think of Led Zeppelin's Rock And Roll. And if you don't know what that is, get off of my lawn.

Union Card has a 4-bar instrumental introduction, during which we don't want any lyrics appearing, although we could use this to add the artist's name, song title, copyright notice etc. Assuming we don't want any of that, we just observe that each "line" of the song lyrics occupies two bars, and add two blank lines to the start of the lyrics to account for those four wordless bars.

Next, observe that here - as often in the 12-bar blues format - the first six bars of a verse are occupied by the first three lines of lyrics; the next two bars are instrumental; the next two hold the fourth "punch" line of the verse; and the last two bars are instrumental once again. Following our guide of one line of text equalling two bars, we see that inserting one blank line after each of the third and fourth lines of every verse should align things very nicely. When my app sees a blank line, it just retains the previously displayed line of text, because why not? There's no advantage in blanking it. Incidentally, if you do want a genuinely blank caption somewhere, just use a line containing only a backslash ('\') instead, and the program will oblige.
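Here is how that blank-line behaviour might look as a Python sketch (again an illustration, not the app's code; `captions_with_holds` is a made-up name). Every line, blank or not, gets an equal time slot; a blank line simply extends the caption already on screen. Two things are my assumptions for the sketch: blank lines before the first lyric show nothing, and a backslash line clears the screen for its slot and for any blank lines that follow it.

```python
def captions_with_holds(lyrics: str, duration: float) -> list[tuple[float, float, str]]:
    """Equal slot per line; blank lines hold the previous caption on screen."""
    lines = lyrics.splitlines()
    if not lines:
        return []
    slot = duration / len(lines)
    captions: list[list] = []   # each item is [start, stop, text]
    holding = False             # True while a caption is showing and may be extended
    for i, raw in enumerate(lines):
        text = raw.strip()
        start, stop = i * slot, (i + 1) * slot
        if text == "\\":
            holding = False                 # explicit blank: nothing shown (assumption)
        elif text == "":
            if holding:
                captions[-1][1] = stop      # keep the previous line on screen
        else:
            captions.append([start, stop, text])
            holding = True
    return [tuple(c) for c in captions]


# Placeholder lyrics: two blank lines for an intro, a held line, an explicit blank.
example = "\n".join(["", "", "line one", "line two", "", "line three", "\\", "line four"])
for caption in captions_with_holds(example, 16.0):
    print(caption)
```

With eight slots of two seconds each, "line two" stays up for four seconds, and nothing at all is shown during the backslash slot.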

But wait - a glance at the Ableton project reveals there's actually a 5-bar outro after the 9th and final verse. If one blank line represents two bars, how can we add half a blank line to compensate for the final, odd-numbered bar? Well, we can't, at least not without complicating our beautifully simple timing scheme. Easier maybe just to add two blank lines, and truncate the final bar. Looking again at the score we see the tempo is 96 bpm and the time signature is 4/4, so one bar is 4/96 minutes, or 2½ seconds. So, just clip 2½ seconds from the file duration using the up/down controls at the foot of the form.
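The bar arithmetic, as a tiny Python check. `bar_seconds` is a hypothetical helper, and it assumes the tempo counts the same beat as the time signature (quarter notes in 4/4):

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds: beats per bar, times seconds per beat."""
    return beats_per_bar * 60.0 / bpm


print(bar_seconds(96))      # 2.5  (one bar at 96 bpm in 4/4)
print(2 * bar_seconds(96))  # 5.0  (one two-bar lyric line)
```

So each two-bar line of Union Card is on screen for five seconds, and the odd outro bar costs 2½ seconds.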

Och That's Too Complicated

Okay, how about this then. You can add a half-length line by including an initial period ('.') in the lyrics. If this appears on a line on its own, it's equivalent to a blank line, but of just half the usual duration. If it appears at the start of a lyric line, then that line will occupy just one-half of the usual time for a line; so for example, if each line of lyrics so far has occupied two bars of music, this one will occupy just one bar.

Inspired by musical notation, I'll expand this a little further. So, a line starting with two consecutive periods ('..') will occupy the duration of the single-period line plus a further 50% of it, i.e. three quarters or 75% of the usual line length; while a line starting with a colon (':') will occupy just one quarter.

These markups can alternatively be appended to the end of a line, extending its duration by the given amount, so for example a period at the end of a line causes it to be displayed for 1½ times the usual interval, a colon 1¼, and so on. A little thought shows that this is almost identical in effect to putting the punctuation on its own (otherwise blank) line, after the lyric. The "almost" covers the fact that these trailing marks will be ignored if leading marks are also present.
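A rough Python sketch of these rules, for the curious. It isn't the app's code, and the names `MARKS`, `parse_line` and `timed_lines` are invented for the example. Two assumptions of mine are baked in: the "usual" line length is the total duration divided by the sum of all the line weights, and any run of periods and colons other than the three marks described above is stripped without changing the timing.

```python
import re

# Weights from the post: '.' = half, '..' = three quarters, ':' = one quarter.
MARKS = {".": 0.5, "..": 0.75, ":": 0.25}
LEAD = re.compile(r"[.:]+")     # run of marks at the very start of the line
TRAIL = re.compile(r"[.:]+$")   # run of marks at the very end of the line


def parse_line(raw: str) -> tuple[float, str]:
    """Return (relative duration, text without markup) for one lyrics line."""
    lead = LEAD.match(raw)
    start = lead.end() if lead else 0
    trail = TRAIL.search(raw, start)
    end = trail.start() if trail else len(raw)
    text = raw[start:end].strip()
    if lead:
        # Leading mark shortens the line; trailing marks are then ignored.
        return MARKS.get(lead.group(), 1.0), text
    if trail:
        # Trailing mark extends the line by the given fraction.
        return 1.0 + MARKS.get(trail.group(), 0.0), text
    return 1.0, text


def timed_lines(lyrics: str, duration: float) -> list[tuple[float, float, str]]:
    """Share the duration among the lines in proportion to their weights."""
    parsed = [parse_line(line) for line in lyrics.splitlines()]
    if not parsed:
        return []
    unit = duration / sum(weight for weight, _ in parsed)
    result, t = [], 0.0
    for weight, text in parsed:
        result.append((t, t + weight * unit, text))
        t += weight * unit
    return result


for row in timed_lines("full line\n.half line\nlong line.", 6.0):
    print(row)
```

Here the three weights are 1, ½ and 1½, so six seconds are shared out as 2, 1 and 3 seconds respectively.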

Handy reminder from the Help menu or F1 key

But what if my lyrics... end with a haunting ellipsis? If you want to incorporate leading or trailing punctuation in the displayed text of a particular lyric line, no problem, just pad the text with a leading or trailing space, so that your punctuation symbols don't actually appear right at the very start or end of the line. The program strips all leading and trailing markup and whitespace, before adding a single space for legibility to the start and end of each line, so your trailing ellipsis will be preserved without altering the line's display duration.
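The padding trick works because markup only counts when it sits at the very first or very last character of the line. A Python sketch of just the text clean-up (timing left out; `display_text` is a made-up name, and returning an empty string for an empty line is my assumption):

```python
import re


def display_text(raw: str) -> str:
    """Strip edge markup, then whitespace, then pad with one space each side."""
    # Remove runs of '.' or ':' only if they touch the very start or end.
    text = re.sub(r"^[.:]+|[.:]+$", "", raw)
    text = text.strip()
    return f" {text} " if text else ""


print(repr(display_text("my lyrics...")))    # ' my lyrics '      (dots taken as markup)
print(repr(display_text("my lyrics... ")))   # ' my lyrics... '   (trailing space protects them)
print(repr(display_text(" ...and then")))    # ' ...and then '    (leading space protects them)
```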

More general extensions are possible, but I'll reserve those until the need arises. Paul's written some songs in 3/4 time, so that shouldn't be too far in the future.


Here's the final result. Note how it changes text precisely on the first beat of the bar throughout. A little distracting of course when Paul's singing anticipates this point, but that's by design, and it's doing just what I asked. Precisely positioned lyric captions with the absolute minimum of time, cost, effort and fuss.
