Cosine ( int>.Dictionary firstDocument, int>.Dictionary secondDocument ) : double |
To calculate the cosine similarity of two word sequences, we follow these steps. Step 1: Compute the term frequency (TF), the number of times each term (word) occurs in a document, by counting all occurrences of each word within that document. Step 2: Normalize the TF values to the range [0,1]. Step 3: Find the Euclidean norm (length) of the Doc1 and Doc2 vectors. Step 4: Calculate the dot product of the Doc1 and Doc2 elements, then compute the cosine similarity by dividing the dot product by the product of the two norms: (x dot y) / (euclidean(doc1) * euclidean(doc2)). |
|
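The four steps described for Cosine can be sketched in Python as follows. This is a minimal illustration and not the library's implementation: the function names `term_frequencies` and `cosine` are hypothetical, and normalizing by the total word count is an assumption, since the description only says the TF values end up in [0,1].

```python
import math
from collections import Counter


def term_frequencies(words):
    """Steps 1 and 2: count each word, then normalize counts into [0, 1].

    Assumption: normalization divides by the total number of words.
    """
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def cosine(first_document, second_document):
    """Steps 3 and 4: Euclidean norms, dot product, and their ratio."""
    # Words missing from the second document contribute 0 to the dot product.
    dot = sum(
        value * second_document.get(word, 0.0)
        for word, value in first_document.items()
    )
    norm_first = math.sqrt(sum(v * v for v in first_document.values()))
    norm_second = math.sqrt(sum(v * v for v in second_document.values()))
    if norm_first == 0.0 or norm_second == 0.0:
        # Assumption: an empty document is treated as having similarity 0.
        return 0.0
    return dot / (norm_first * norm_second)


doc1 = term_frequencies("the cat sat on the mat".split())
doc2 = term_frequencies("the cat lay on the rug".split())
similarity = cosine(doc1, doc2)
```

Because cosine similarity depends only on the angle between the two vectors, the choice of normalization in step 2 does not change the result as long as each document's values are scaled by a single positive constant.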
EuclidianDistance ( double firstObservation, double secondObservation ) : double |
The formula to calculate the Euclidean distance is: d(i,j) = ( (xi1 - xj1)^2 + (xi2 - xj2)^2 + ... + (xip - xjp)^2 )^(1/2), where p is the number of dimensions. Formatted formula here: http://en.wikipedia.org/wiki/Euclidean_distance |
|
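The Euclidean distance formula can be illustrated with a short Python sketch. The name `euclidean_distance` and the choice to take two equal-length sequences of numbers are assumptions made for the example; they are not taken from the documented signature.

```python
import math


def euclidean_distance(first_observation, second_observation):
    """Square root of the summed squared differences, per dimension."""
    if len(first_observation) != len(second_observation):
        raise ValueError("observations must have the same number of dimensions")
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(first_observation, second_observation))
    )


distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Note that the differences are squared, so the order of the two observations does not affect the result.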