Now you'd thing what the hell am I rambling about now, but here's the thing. Reading through my daily dose of programming news in Google Reader I stumbled upon this clever algorithm - the Damerau-Levenshtein distance algorithm. It is used to find the number of steps it would take to transform one string into another using the following operations: insertion, deletion, substitution and trasposition. Or put simply - Damerau-Levenshtein distance is a metric to defining the difference between two strings. The main areas of use are spell-checking (typos etc), DNA difference calculation etc. So I read up on the algorith at the Wiki page and decided to implement it just for the implementation's sake. Having fiddled around with the code for some time it dawned on me - I have this OCD when I perform the exact same operation in my head for two words that have a revelance to whichever situation I happen to be in at a given moment. I also provide a corresponding String extension method. Oh and if anyone hasn't picked it up yet, I mainly write code in C# 3.0 as is the case with this post. So without further ado, I present to you the implementation:
public static class DamerauLevenshtein { public static int DamerauLevenshteinDistanceTo(this string @string, string targetString) { return DamerauLevenshteinDistance(@string, targetString); } public static int DamerauLevenshteinDistance(string string1, string string2) { if (String.IsNullOrEmpty(string1)) { if (!String.IsNullOrEmpty(string2)) return string2.Length; return 0; } if (String.IsNullOrEmpty(string2)) { if (!String.IsNullOrEmpty(string1)) return string1.Length; return 0; } int length1 = string1.Length; int length2 = string2.Length; int[,] d = new int[length1 + 1, length2 + 1]; int cost, del, ins, sub; for (int i = 0; i <= d.GetUpperBound(0); i++) d[i, 0] = i; for (int i = 0; i <= d.GetUpperBound(1); i++) d[0, i] = i; for (int i = 1; i <= d.GetUpperBound(0); i++) { for (int j = 1; j <= d.GetUpperBound(1); j++) { if (string1[i - 1] == string2[j - 1]) cost = 0; else cost = 1; del = d[i - 1, j] + 1; ins = d[i, j - 1] + 1; sub = d[i - 1, j - 1] + cost; d[i, j] = Math.Min(del, Math.Min(ins, sub)); if (i > 1 && j > 1 && string1[i - 1] == string2[j - 2] && string1[i - 2] == string2[j - 1]) d[i, j] = Math.Min(d[i, j], d[i - 2, j - 2] + cost); } } return d[d.GetUpperBound(0), d.GetUpperBound(1)]; } }
What do you guys think, should the presentation of interesting algorithm finds in my blog become a recurring subject? I think it wouldn't hurt.
Mihkel