Calculates Levenshtein distance and puts it relative to the longest string, so you get a value between 0 and 1 that tells you how similar two strings are. You can use this whenever a 100% match is not feasible and a machine learning model is to expensive for the use case.
I also added a normalization and sorting activity, that can be put upfront the string similarity calculation if you want to ignore any non-word characters and / or if you don't care in what order the strings are filled.
I use this to compare addresses from diffferent systems, as users from different systems often type in addresses kind of similar, but not exactly the same. A value above 0.7 is good enough for me, to validate whether addresses match. Every set of data needs to be analyzed separately. A lot of testing is helpful to find a good value.
1. Compare two strings: The core feature is the comparison of two strings and the return of a similarity value.
2. Remove Non-Word Characters: This feature involves eliminating any character from the text that doesn't contribute to its meaning. These may include punctuations, special symbols, numbers (unless they are relevant), and any other non-alphabetic characters. This process helps to simplify the text, making it easier to analyze and process.
3. Convert to Uppercase: This feature translates all the letters in the text to uppercase. It is a commonly used method for normalizing text data, ensuring consistency across different inputs. This function can be particularly useful when case sensitivity is not relevant to the application, ensuring the text can be evaluated uniformly.
4. Sort the String Alphanumeric: This feature involves organizing the content of a string in alphanumeric order. This is a type of sorting where both the numbers (0-9) and the letters (A-Z/a-z) in a string are arranged systematically.
First, all numbers are sorted in ascending order, followed by letters sorted in ascending order (typically, uppercase letters first, then lowercase). For example, the string 'b2a1C3' would become '123Cab'. Sorting the string alphanumeric can be beneficial in a variety of scenarios, such as enhancing data consistency, simplifying search and retrieval processes, and providing a standardized view of the data, especially when dealing with large amounts of text data.