The classical Levenshtein distance allows any two strings to be compared. This "edit distance" counts how many insertions, substitutions, or deletions are needed to convert one string into the other. While this is a powerful way to compare strings, it has limitations: for example, swapping two adjacent characters, a common typing error, counts as two edits.
The Damerau-Levenshtein distance is a little more robust, in that it also treats the transposition of two adjacent characters as a single edit. For example, "Hello" and "Helol" have a D-L distance of 1 rather than a Levenshtein distance of 2, because the final "l" and "o" can be transposed in one step.
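To make the transposition rule concrete, here is a minimal Python sketch of the restricted form of the D-L distance (often called "optimal string alignment"). It is an illustration only: the function name `osa_distance` is hypothetical, and it is an assumption that this variant, rather than the unrestricted one, matches what the C# calculator in this library computes.

```python
def osa_distance(source: str, target: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical helper for illustration; not this library's C# code.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = number of edits needed to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition: the last two characters are swapped.
            if (
                i > 1
                and j > 1
                and source[i - 1] == target[j - 2]
                and source[i - 2] == target[j - 1]
            ):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(osa_distance("Hello", "Helol"))  # 1 (plain Levenshtein would give 2)
```

The only difference from the usual Levenshtein table is the extra transposition check, which lets a swap of adjacent characters cost one edit instead of two.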
This library includes a lightweight D-L distance calculator, written in C#, which takes two strings and returns the distance between them.
Because many use cases involve selecting a single string from a collection of strings, this library also includes an activity that takes a collection of strings and a chosen string, and outputs the string from the collection that most closely matches the chosen string.
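The selection step can be sketched as a minimum search over the collection. The Python below is a hedged illustration, not the activity's actual code: `closest_match` and `mismatch_count` are hypothetical names, the distance function is passed in as a parameter, and the crude `mismatch_count` metric is only a stand-in for a real D-L distance. Returning the first candidate when several tie, and raising an error on an empty collection, are also assumptions about behavior the source does not specify.

```python
from typing import Callable, Iterable


def closest_match(
    candidates: Iterable[str],
    chosen: str,
    distance: Callable[[str, str], int],
) -> str:
    """Return the candidate with the smallest distance to `chosen`.

    Assumption: ties go to the first such candidate in iteration order.
    """
    candidates = list(candidates)
    if not candidates:
        raise ValueError("candidates must not be empty")
    # min() keeps the first minimal element, which gives the tie rule above.
    return min(candidates, key=lambda candidate: distance(candidate, chosen))


def mismatch_count(a: str, b: str) -> int:
    """Placeholder metric: differing positions plus the length difference.

    Stands in for a D-L distance so that this example is self-contained.
    """
    return abs(len(a) - len(b)) + sum(x != y for x, y in zip(a, b))


tokens = ["invoice", "receipt", "order"]
print(closest_match(tokens, "reciept", mismatch_count))  # receipt
```

Passing the distance in as an argument keeps the selection logic independent of the metric, so any two-string distance function can be substituted for the placeholder.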
This activity is meant for workflows that tokenize text and then need to extract one particular token from the larger body.