I was processing a number of Visual Studio files. A large-ish number of C# solution, project, and source files. Looking for any obsolete (or otherwise redundant) .cs file which wasn't included in a .csproj - or worse, any .csproj that wasn't explicitly built by a .sln. Now, some of these projects referred to files not in the directory subtree of the project file itself. Also, some such references employed relative paths.
For example, if the project file at
d:\project\client\startup.csproj
refers to the source file at
d:\project\engine\utils.cs
then it's likely to do so using two dots; i.e., under the guise of
..\engine\utils.cs
Obviously, appending this relative path to the client directory path
d:\project\client\
yields a perfectly serviceable, though not optimal, source file path:
d:\project\client\..\engine\utils.cs
Trouble is, my earlier directory traversal, building up the dictionary of source files, created the entry for this file using the "key":
d:\project\engine\utils.cs
So in order to normalize keys, I needed to remove the \client\.. portion of the calculated path, making these two identical. Sounds like an ideal job for a wee bit of simple Regex, no? Just remove all instances of a backslash, followed by a bunch of non-backslash characters (the redundant directory name), followed by another backslash and two periods:
return Regex.Replace(input, @"\\[^\\]+\\\.\.", string.Empty); // Bug!
No! I puzzled for a little while over why the output of this function still contained embedded \..\, before finally adding some debug code and discovering the answer. Given an input with two consecutive parent references, like d:\project\client\forms\..\..\version.cs, the function returned:
d:\project\client\..\version.cs
Why hasn't the above code removed \client\.. from this path? Because \client\.. wasn't in the original input string. It only comes into being once \forms\.. has been removed, and by then the Regex.Replace operation has already moved beyond that position in processing the input string. And it does not go back to rescan text it has already passed. Can't afford to, that would be mayhem. Imagine it trying to replace "x" by "xx" with rescanning.
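To see the single pass in action, here is a minimal sketch that wraps the same pattern in a method. The class name NaiveNormalizer and the method name are made up for illustration, and the sample input in the note below is only an assumption: one path that is consistent with the output shown above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the one-line regex from the text.
public static class NaiveNormalizer
{
    // Removes every "\name\.." that is present in the ORIGINAL input,
    // scanning once from left to right. Text that only becomes a match
    // after an earlier removal is never seen.
    public static string RemoveParentSegments(string input)
    {
        return Regex.Replace(input, @"\\[^\\]+\\\.\.", string.Empty);
    }
}
```

Calling NaiveNormalizer.RemoveParentSegments(@"d:\project\client\forms\..\..\version.cs") removes \forms\.. and returns d:\project\client\..\version.cs, with the second parent reference still in place.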
~(∀x)(x is a nail)
I could work around the bug, say by detecting whether any change has been made to the input string, and if so (or rather, while so) re-running the operation. That's clearly inefficient. No, the best thing to do here is accept I've selected the wrong tool for the job - a conclusion you'll reach fairly often with Regex!
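For the record, the rejected workaround might look something like the sketch below. It is only an illustration of the "while so" idea, not code from the original project; the class and member names are invented. Each pass rescans the whole string, so a path with n nested parent references costs n + 1 passes.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration of the rejected workaround.
public static class RepeatedNormalizer
{
    private static readonly Regex ParentSegment = new Regex(@"\\[^\\]+\\\.\.");

    // Re-runs the replacement until a pass leaves the string unchanged.
    // Every successful pass shortens the string, so the loop terminates.
    public static string RemoveParentSegments(string input)
    {
        string previous;
        string current = input;
        do
        {
            previous = current;
            current = ParentSegment.Replace(previous, string.Empty);
        } while (current != previous);
        return current;
    }
}
```

With this version, d:\project\client\forms\..\..\version.cs does come out as d:\project\version.cs, but only after three full scans: two that change the string and a final one to confirm nothing is left.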
Here finally is the smart approach to this problem:
return System.IO.Path.GetFullPath(input); // Fixed.
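As a rough sketch of how this fits the dictionary-key problem, the helper below resolves a project-relative reference and normalizes the result in one go. SourceIndex, Key and Resolve are hypothetical names, not part of the original code. It assumes the project file path is absolute; Path.GetFullPath resolves a relative input against the current working directory, which is probably not what you want here.

```csharp
using System;
using System.IO;

// Hypothetical helper for producing normalized dictionary keys.
public static class SourceIndex
{
    // Canonical key: Path.GetFullPath collapses "." and ".." segments.
    // The file does not need to exist on disk.
    public static string Key(string path) => Path.GetFullPath(path);

    // Resolves a project-relative reference (such as "..\engine\utils.cs")
    // against the directory that contains the project file.
    public static string Resolve(string projectFile, string relativeReference)
    {
        string projectDir = Path.GetDirectoryName(Path.GetFullPath(projectFile))
            ?? throw new ArgumentException("Not a file path.", nameof(projectFile));
        return Key(Path.Combine(projectDir, relativeReference));
    }
}
```

On Windows, SourceIndex.Resolve(@"d:\project\client\startup.csproj", @"..\engine\utils.cs") yields d:\project\engine\utils.cs, the same key the directory traversal produced. On other platforms the backslash is not a directory separator, so the literal paths from this post would not be split the same way.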