Cosine Similarity

by Adharsh Chandran J S

2

Activity

Downloads

420

Summary

Cosine similarity is a metric used to determine how similar two documents are, irrespective of their size.

Overview

A common approach to matching similar documents is to count the words they have in common.

Still, this approach has an inherent flaw: as the size of the document increases, the number of common words tends to grow even if the documents cover different topics.

Cosine similarity helps overcome this flaw in the ‘count-the-common-words’ and Euclidean distance approaches by comparing the direction of the documents' word-count vectors rather than their magnitude.
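For reference, the standard textbook definition of cosine similarity is given below. The listing does not state the exact formula the activity uses, so treat this as an assumption: A and B are taken to be term-frequency vectors of the two documents over a shared vocabulary of n words, and θ is the angle between them.

```latex
\[
\text{similarity}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}
\]
```

Because word counts are never negative, this value falls between 0 and 1, which is consistent with the output range listed for the activity.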

Input:

  • TestingDocumentText - string containing the text content to be tested
  • TrainingDocumentText - string containing the training text content to compare against

Output:

  • CosineSimilarityValue - decimal value in the range [0, 1]
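The inputs and output above can be sketched roughly as follows. This is a minimal Python illustration, not the activity's actual code (the listing says it is written in Visual Basic and depends on a stemmer library). The function names `tokenize` and `cosine_similarity`, the simple letters-only tokenizer, and the absence of stemming are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep runs of letters only; no stemming is applied.
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(testing_document_text, training_document_text):
    # Build term-frequency vectors for both documents.
    a = Counter(tokenize(testing_document_text))
    b = Counter(tokenize(training_document_text))

    # Dot product over the words the two documents share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())

    # Euclidean norms of the two vectors.
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))

    # An empty document has no direction; treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


short = "cricket match today"
repeated = " ".join([short] * 5)  # same content, five times the size

print(cosine_similarity(short, repeated))               # close to 1
print(cosine_similarity(short, "stock market report"))  # no shared words
```

Repeating a document changes its size but not its word proportions, so the first comparison stays close to 1, while two documents with no words in common score 0.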

Features

Cosine similarity is advantageous because two similar documents can be far apart by Euclidean distance purely because of their size (for example, the word ‘cricket’ appears 50 times in one document and 10 times in another) and still have a small angle between them. The smaller the angle, the higher the similarity.
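The ‘cricket’ example can be worked through numerically. In this small Python sketch, the counts for ‘cricket’ (50 and 10) come from the text above; the second word ‘bat’ and its counts (20 and 4) are hypothetical values added so that the vectors have two dimensions and the same proportions.

```python
import math

# Word counts in the order (cricket, bat). The 'bat' counts are hypothetical.
doc_a = (50, 20)  # longer document
doc_b = (10, 4)   # shorter document with the same word proportions

# Euclidean distance: large, because the documents differ in size.
euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(doc_a, doc_b)))

# Cosine similarity: depends only on the angle between the vectors.
dot = sum(x * y for x, y in zip(doc_a, doc_b))
norm_a = math.sqrt(sum(x * x for x in doc_a))
norm_b = math.sqrt(sum(y * y for y in doc_b))
cosine = dot / (norm_a * norm_b)

# Clamp before acos to guard against floating-point values slightly above 1.
angle_degrees = math.degrees(math.acos(min(1.0, cosine)))

print(f"Euclidean distance: {euclidean:.2f}")
print(f"Cosine similarity:  {cosine:.4f}")
print(f"Angle (degrees):    {angle_degrees:.4f}")
```

The Euclidean distance comes out at roughly 43, yet the two vectors point in the same direction, so the cosine similarity is approximately 1 and the angle is approximately 0.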

Additional Information

Dependencies

Centivus.EnglishStemmer.dll

Code Language

Visual Basic

Runtime

Windows Legacy (.NET Framework 4.6.1)

Publisher

Adharsh Chandran J S

License & Privacy

MIT

Technical

Version

1.0.1

Updated

April 21, 2020

Works with

Studio: 19.4.4 - 22.10

Certification

Silver Certified

Support

UiPath Community Support
