Products

Studio

Studio Web

StudioX

Automation Ops

Document Understanding

Explore

Professional Services

Products

Document Understanding

Explore

Professional Services

String Similarity Calculation

Create your first automation in just a few minutes.Try Studio Web →

String Similarity Calculation

by Patrick FlÃ¶Ã

Activity

<100

Summary

Calculate the similarity of strings, e.g. Aa / Ab = 50%.

Overview

Calculates Levenshtein distance and puts it relative to the longest string, so you get a value between 0 and 1 that tells you how similar two strings are. You can use this whenever a 100% match is not feasible and a machine learning model is to expensive for the use case.

I also added a normalization and sorting activity, that can be put upfront the string similarity calculation if you want to ignore any non-word characters and / or if you don't care in what order the strings are filled.

I use this to compare addresses from diffferent systems, as users from different systems often type in addresses kind of similar, but not exactly the same. A value above 0.7 is good enough for me, to validate whether addresses match. Every set of data needs to be analyzed separately. A lot of testing is helpful to find a good value.

Activities Details

1. Compare two strings: The core feature is the comparison of two strings and the return of a similarity value.

2. Remove Non-Word Characters: This feature involves eliminating any character from the text that doesn't contribute to its meaning. These may include punctuations, special symbols, numbers (unless they are relevant), and any other non-alphabetic characters. This process helps to simplify the text, making it easier to analyze and process.

3. Convert to Uppercase: This feature translates all the letters in the text to uppercase. It is a commonly used method for normalizing text data, ensuring consistency across different inputs. This function can be particularly useful when case sensitivity is not relevant to the application, ensuring the text can be evaluated uniformly.

4. Sort the String Alphanumeric: This feature involves organizing the content of a string in alphanumeric order. This is a type of sorting where both the numbers (0-9) and the letters (A-Z/a-z) in a string are arranged systematically.

First, all numbers are sorted in ascending order, followed by letters sorted in ascending order (typically, uppercase letters first, then lowercase). For example, the string 'b2a1C3' would become '123Cab'. Sorting the string alphanumeric can be beneficial in a variety of scenarios, such as enhancing data consistency, simplifying search and retrieval processes, and providing a standardized view of the data, especially when dealing with large amounts of text data.

Features

Compare two strings
Normalize strings

Additional Information

Dependencies

UiPath.System.Activities >= 23.4.2

Code Language

Visual Basic

Runtime

Windows (.Net 5.0 or higher)

Publisher

Patrick FlÃ¶Ã

Visit publisher's page

License & Privacy

MIT

Privacy Terms

Technical

Version

1.0.3

Updated

June 2, 2023

Works with

Studio: 23.4.1+

Certification

Silver Certified

Similar Listings

Activity

by Internal Labs

406

This group of custom activities (String Matching Algorithms) serve the purpose of reducing development time for the user intending to implement one or more string matching logic in their project.

Free

Activity

Text Similarity

by Elif Karaoglu

335

This activity finds similarity between two texts and returns similarity percentage.

Free

Activity

Novigo Solutions - String Similarity

by Nithin Prabhu

807

This is an UiPath Custom Activitiy built using String Similarity .NET, a library implementing different string similarity and distance measures

Free

Activity

Cosine Similarity

by Allianz Chandran J S

419

Cosine similarity is a metric used to determine how similar the documents are irrespective of their size

Free