MarketplaceStudioActivityString Similarity Calculation

Create your first automation in just a few minutes.Try Studio Web

String Similarity Calculation

String Similarity Calculation

by Patrick Flöß

StarStarStarStarStarStarStarStarStarStar

0

Activity

Downloads

<100

back button
back button
carouselImage0
next button
next button

Summary

Summary

Calculate the similarity of strings, e.g. Aa / Ab = 50%.

Overview

Overview

Calculates Levenshtein distance and puts it relative to the longest string, so you get a value between 0 and 1 that tells you how similar two strings are. You can use this whenever a 100% match is not feasible and a machine learning model is to expensive for the use case. 

I also added a normalization and sorting activity, that can be put upfront the string similarity calculation if you want to ignore any non-word characters and / or if you don't care in what order the strings are filled.

I use this to compare addresses from diffferent systems, as users from different systems often type in addresses kind of similar, but not exactly the same. A value above 0.7 is good enough for me, to validate whether addresses match. Every set of data needs to be analyzed separately. A lot of testing is helpful to find a good value. 

Activities Details

1. Compare two strings: The core feature is the comparison of two strings and the return of a similarity value. 

2. Remove Non-Word Characters: This feature involves eliminating any character from the text that doesn't contribute to its meaning. These may include punctuations, special symbols, numbers (unless they are relevant), and any other non-alphabetic characters. This process helps to simplify the text, making it easier to analyze and process.

3. Convert to Uppercase: This feature translates all the letters in the text to uppercase. It is a commonly used method for normalizing text data, ensuring consistency across different inputs. This function can be particularly useful when case sensitivity is not relevant to the application, ensuring the text can be evaluated uniformly.

4. Sort the String Alphanumeric: This feature involves organizing the content of a string in alphanumeric order. This is a type of sorting where both the numbers (0-9) and the letters (A-Z/a-z) in a string are arranged systematically.

First, all numbers are sorted in ascending order, followed by letters sorted in ascending order (typically, uppercase letters first, then lowercase). For example, the string 'b2a1C3' would become '123Cab'. Sorting the string alphanumeric can be beneficial in a variety of scenarios, such as enhancing data consistency, simplifying search and retrieval processes, and providing a standardized view of the data, especially when dealing with large amounts of text data.

Features

Features

  • Compare two strings
  • Normalize strings

Additional Information

Additional Information

Dependencies

UiPath.System.Activities >= 23.4.2

Code Language

Visual Basic

Runtime

Windows (.Net 5.0 or higher)

Publisher

Patrick Flöß

Visit publisher's page

License & Privacy

MIT

Privacy Terms

Technical

Version

1.0.3

Updated

June 2, 2023

Works with

Studio: 23.4.1+

Certification

Silver Certified

Tags

Support

UiPath Community Support