ACP Digital - Advanced String Manipulation

Bronze Certified

Custom Activity

1 reviews

7


Summary

A few activities to help with string manipulation: get middle Substring with Strings, Fuzzy Search in Array, check if Strings are similar.

Overview

A detailed description of activities:

1. Get middle Substring with Strings
This activity is very useful when you receive string data in a partly structured form and want to extract specific information.
The workflow it was created for used a Read PDF activity that returned a string, from which we had to extract IDs, names, and other substrings whose length and exact location we did not know. Knowing only what comes before and after the desired value, you can extract it exactly using this activity.

Example:
Let's say you extracted the following String from a PDF using OCR or just Read PDF:
"City: New York Country: USA Company Name: A very Good Company Inc. Department: IT Internal ID: 5566655"
If the layout is always similar, the data is easy to locate, but since we don't know the length of any of the values, extracting them can be difficult.
The simplest solution is to get the index of the text preceding the desired value and the index of the text following it, then use Substring to extract what we are looking for. This is exactly what this activity does.
Given the Full String, the Start String, and the End String, it calculates where the value begins and ends and extracts it.


(See included Screenshot)
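
The index-and-substring approach described above can be sketched in Python. This is a minimal illustration, not the activity's actual implementation: the function name get_middle_substring, the trimming of surrounding whitespace, and returning None when a marker is missing are all assumptions made for the example.

```python
def get_middle_substring(full_string, start_string, end_string):
    """Return the text between start_string and the next end_string.

    Hypothetical helper: returns None if either marker is not found
    (how the real activity handles that case is not documented here).
    """
    start_index = full_string.find(start_string)
    if start_index == -1:
        return None
    # The value begins right after the start marker.
    value_start = start_index + len(start_string)
    # Look for the end marker only after the start marker.
    value_end = full_string.find(end_string, value_start)
    if value_end == -1:
        return None
    # Trimming whitespace is an assumption made for readability.
    return full_string[value_start:value_end].strip()


text = ("City: New York Country: USA Company Name: A very Good Company Inc. "
        "Department: IT Internal ID: 5566655")
company = get_middle_substring(text, "Company Name:", "Department:")
country = get_middle_substring(text, "Country:", "Company Name:")
```

With the sample string from the example, company holds "A very Good Company Inc." and country holds "USA".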

2. Fuzzy Search in Array

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.
This means that the higher the Levenshtein distance, the more changes are needed, and thus the less similar the strings are.
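
To make the definition concrete, here is a small Python sketch of the standard dynamic-programming computation of the Levenshtein distance. It is an illustration only and is not the code of the referenced Marketplace activity; the function name levenshtein_distance is chosen for this example.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in b so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


distance = levenshtein_distance("Gogle", "Google")
```

Here distance is 1, because inserting a single "o" turns "Gogle" into "Google".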

This activity calculates the Levenshtein distance between the given string and every string in the given array, using the Levenshtein Distance Algorithm-String Comparison activity developed by Kumari Ekankika, to find the string in the array that is most similar to the given string.

Link to Activity developed by Kumari Ekankika: https://connect.uipath.com/marketplace/components/levenshtein-distance-algorithm-string-comparison-f2ec21

Originally, this activity was developed because some data the bot received had to be compared to a database. Unfortunately, the received data often had typos and errors but still needed to be matched to the database. The Levenshtein distance was the easiest solution to this, without resorting to AI or any advanced recognition techniques.

Example:
Value retrieved via OCR from a PDF: "Gogle"
Data array retrieved from SQL: {"Yahoo", "Apple", "UiPath", "ACP", "Google", "Microsoft"}

Normally, trying to match these two would return no value, since "Gogle" does not exist in the array. But we can clearly see that it was just a typo and that "Google" does exist in the array.
Using the Fuzzy Search in Array activity fixes this problem.

The desired match score is an integer value (1-100) that is an input parameter of the activity. It is used to make sure that the best-matching string in the array is still close enough to the given string. If every string in the array differs from the given string by more than the given DesiredMatchScore allows, the string "No Match!" is returned.


(See attached screenshot)
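
The fuzzy search described above can be sketched as follows. This is an illustration under stated assumptions, not the activity's implementation: the listing does not say how the 1-100 score is derived from the Levenshtein distance, so the formula below (share of characters of the longer string that need no edit) is an assumption, as are the names match_percentage and fuzzy_search_in_array and the case-sensitive comparison.

```python
def levenshtein_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,
                               previous[j] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def match_percentage(a, b):
    """Assumed scoring: 100 for identical strings, 0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return 100 * (longest - levenshtein_distance(a, b)) // longest


def fuzzy_search_in_array(search_string, candidates, desired_match_score):
    """Return the most similar candidate, or "No Match!" if even the
    best candidate scores below desired_match_score."""
    best_candidate, best_score = None, -1
    for candidate in candidates:
        score = match_percentage(search_string, candidate)
        if score > best_score:
            best_candidate, best_score = candidate, score
    if best_candidate is None or best_score < desired_match_score:
        return "No Match!"
    return best_candidate


companies = ["Yahoo", "Apple", "UiPath", "ACP", "Google", "Microsoft"]
result = fuzzy_search_in_array("Gogle", companies, 80)
```

Under the assumed scoring, "Gogle" and "Google" score 83, so result is "Google" with a desired match score of 80, while a stricter score of 90 would yield "No Match!".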

3. Check if Strings are similar

Like the Fuzzy Search activity, this one relies on the Levenshtein distance to check whether two given strings are similar.

This activity was created because every time we used the Levenshtein Algorithm activity, we then had to convert the returned value to a boolean based on our specific desired value. This activity automates that step and returns the boolean we were looking for.

It is very useful for mitigating typos, minor mistakes, or bad OCR recognition when comparing text.

The input parameters are:
DesiredMachPercentage (int): A value from 1 to 100. If the strings' overlap percentage is higher than the desired value, the activity returns StringsAreSimilar=True
String1 (string): The first string to compare
String2 (string): The second string to compare
The output parameters are:
FoundMatchPercentage (int): The percentage match of the strings
StringsAreSimilar (boolean): A true/false value telling us if the strings are similar or not
(See attached screenshot)
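
The inputs and outputs listed above map naturally onto a small function. The sketch below is hypothetical and only mirrors the parameter names from the listing in Python style; the percentage formula and the choice to treat a score equal to the desired value as similar are assumptions, since the listing does not specify either.

```python
def levenshtein_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,
                               previous[j] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def check_strings_similar(string1, string2, desired_match_percentage):
    """Return (found_match_percentage, strings_are_similar).

    Assumed scoring: share of the longer string's characters that
    need no edit, as an integer from 0 to 100.
    """
    longest = max(len(string1), len(string2))
    if longest == 0:
        found_match_percentage = 100  # two empty strings are identical
    else:
        distance = levenshtein_distance(string1, string2)
        found_match_percentage = 100 * (longest - distance) // longest
    # Assumption: reaching the desired value exactly counts as similar.
    strings_are_similar = found_match_percentage >= desired_match_percentage
    return found_match_percentage, strings_are_similar


found, similar = check_strings_similar("Gogle", "Google", 80)
```

With these assumptions, found is 83 and similar is True; raising the desired value to 90 would make the same pair return False.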

The activities have been developed using the following Custom Activities from the UiPath Marketplace: Levenshtein Distance Algorithm-String Comparison, by Kumari Ekankika (https://connect.uipath.com/marketplace/components/levenshtein-distance-algorithm-string-comparison-f2ec21)

Benefits


Published: 13 Jan 2021 | Updated: 13 Jan 2021


License

MIT


Code Language

Visual Basic


Runtime

.NET Framework


Tags

string
compare
manipulation
digital
array
ACP
levenshtein
Algorith

Compatibility

UiPath Studio 2020.10


Dependencies

Mostly just the regular UiPath dependencies are used. There are also a few uses of the following Custom Activity from the Marketplace: Levenshtein Distance Algorithm-String Comparison, by Kumari Ekankika (https://connect.uipath.com/marketplace/components/levenshtein-distance-algorithm-string-comparison-f2ec21)

