Cosine Similarity

Bronze Certified

Custom Activity

2 reviews

385


Summary

Cosine similarity is a metric that measures how similar two documents are, irrespective of their size.

Overview

A common way to match similar documents is to count the words they have in common.
This approach has an inherent flaw: as documents grow longer, the number of common words tends to grow even when the documents cover different topics.
Cosine similarity overcomes this flaw, which affects both the ‘count-the-common-words’ and Euclidean distance approaches, because it compares the direction of the documents' term vectors rather than their magnitude.
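For reference, the standard definition of cosine similarity is given here. The listing does not say how the activity turns text into vectors, so the symbols are assumptions: $A$ and $B$ are taken to be term-count vectors of the two documents, with $A_i$ and $B_i$ the counts of the $i$-th vocabulary term.

```latex
\[
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
\]
```

Because term counts are non-negative, the value falls between 0 and 1. Scaling either vector by a positive constant (for example, repeating a document several times) leaves the result unchanged, which is why document length does not dominate the score.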
Input:
  • TestingDocumentText - string containing the text to be tested
  • TrainingDocumentText - string containing the training text to compare against
Output:
  • CosineSimilarityValue - decimal value in the range [0, 1], where values closer to 1 indicate more similar documents
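The input/output contract above can be illustrated with a minimal Python sketch. This is not the activity's implementation, which is not published in this listing. The function names `term_counts` and `cosine_similarity` are hypothetical, and the sketch assumes plain word counts with no stemming or stop-word removal (the activity's dependency on an English stemmer suggests it may preprocess text differently).

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word tokens (assumption: no stemming)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(testing_document_text, training_document_text):
    """Return the cosine of the angle between the two term-count vectors."""
    a = term_counts(testing_document_text)
    b = term_counts(training_document_text)
    # Dot product over the words of the first document; Counter returns 0
    # for words that are missing from the second document.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document scores 0
    return dot / (norm_a * norm_b)


short = "invoice total due"
repeated = " ".join([short] * 50)  # same content, 50 times longer

# Same wording at very different lengths: the score stays close to 1.
print(cosine_similarity(short, repeated))
# No words in common: the score is 0.
print(cosine_similarity(short, "weather forecast rain"))
```

The first call shows the property the overview describes: a document and a much longer copy of it still score close to 1, whereas a raw common-word count or a Euclidean distance would be dominated by the difference in length.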

Benefits


Published: 14 Jan 2020 | Updated: 18 Dec 2020

Adharsh Chandran J S
Senior Software Engineer

Trivandrum, Kerala, India


License

MIT


Code Language

Visual Basic


Runtime

.NET Framework


Tags

data
file
processing
recognition
cosine similarity
calculate cosine similarity
cosine

Compatibility

Developed in 2019.4.4


Dependencies

Centivus.EnglishStemmer.dll

