Method | Description
---|---
`Cosine ( Dictionary<…, int> firstDocument, Dictionary<…, int> secondDocument ) : double` | Calculates the cosine similarity of two word sequences in five steps. Step 1, term frequency: term frequency (TF) is the number of times a term (word) occurs in a document, so this step finds all occurrences of each word within the same document. Step 2: normalize the TF values to the range [0, 1]. Step 3: find the Euclidean norm (length) of the Doc1 and Doc2 vectors. Step 4: calculate the dot product of the Doc1 and Doc2 vectors (a term missing from one document contributes 0). Step 5: divide the dot product by the product of the two norms, $\cos(x, y) = \dfrac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$.
`EuclidianDistance ( double[] firstObservation, double[] secondObservation ) : double` | The Euclidean distance between observations $i$ and $j$, each with $p$ coordinates, is $d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{ip} - x_{jp})^2}$. Formatted formula: http://en.wikipedia.org/wiki/Euclidean_distance
`JaccardDistance ( object firstObservation, object secondObservation ) : double` |
`SMC ( object firstObservation, object secondObservation ) : double` |
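
A small worked example of the cosine steps, using two made-up word sequences chosen only for illustration: "to be or not to be" (Doc1) and "to be is to do" (Doc2). Step 2 is left out of the arithmetic because dividing one document's counts by any positive constant (for example, its largest count) cancels between numerator and denominator, so normalized and raw counts give the same cosine:

$$
\begin{aligned}
\text{TF}_1 &= \{\text{to}: 2,\ \text{be}: 2,\ \text{or}: 1,\ \text{not}: 1\}, \qquad
\text{TF}_2 = \{\text{to}: 2,\ \text{be}: 1,\ \text{is}: 1,\ \text{do}: 1\} \\
\lVert x \rVert &= \sqrt{2^2 + 2^2 + 1^2 + 1^2} = \sqrt{10}, \qquad
\lVert y \rVert = \sqrt{2^2 + 1^2 + 1^2 + 1^2} = \sqrt{7} \\
x \cdot y &= \underbrace{2 \cdot 2}_{\text{to}} + \underbrace{2 \cdot 1}_{\text{be}} = 6 \\
\cos(x, y) &= \frac{6}{\sqrt{10}\,\sqrt{7}} = \frac{6}{\sqrt{70}} \approx 0.717
\end{aligned}
$$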
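
`public Cosine ( Dictionary<…, int> firstDocument, Dictionary<…, int> secondDocument ) : double`

Parameter | Type | Description
---|---|---
firstDocument | `Dictionary<…, int>` | Term-frequency dictionary of the first document
secondDocument | `Dictionary<…, int>` | Term-frequency dictionary of the second document
return | double | Cosine similarity of the two documents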
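
A minimal C# sketch of the steps listed in the `Cosine` description, under stated assumptions: the dictionary keys are words (`string`), which the reference does not confirm; step 2 divides each count by the document's largest count, one of several ways to map TF into [0, 1]; and the class `CosineSketch` and helper `TermFrequency` are hypothetical names, not part of the documented API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class; assumes word (string) keys and int occurrence counts.
public static class CosineSketch
{
    // Step 1: term frequency = number of times each word occurs in one document.
    public static Dictionary<string, int> TermFrequency(IEnumerable<string> words)
    {
        var tf = new Dictionary<string, int>();
        foreach (string word in words)
        {
            tf.TryGetValue(word, out int count); // count is 0 when the word is new
            tf[word] = count + 1;
        }
        return tf;
    }

    // Step 2 (assumed scheme): divide by the document's largest count so values lie in [0, 1].
    private static Dictionary<string, double> Normalize(Dictionary<string, int> tf)
    {
        int max = tf.Count == 0 ? 0 : tf.Values.Max();
        return tf.ToDictionary(kv => kv.Key, kv => max > 0 ? (double)kv.Value / max : 0.0);
    }

    public static double Cosine(Dictionary<string, int> firstDocument, Dictionary<string, int> secondDocument)
    {
        var x = Normalize(firstDocument);
        var y = Normalize(secondDocument);

        // Step 3: Euclidean norm (length) of each vector.
        double normX = Math.Sqrt(x.Values.Sum(v => v * v));
        double normY = Math.Sqrt(y.Values.Sum(v => v * v));
        if (normX == 0.0 || normY == 0.0)
            return 0.0; // assumption: an empty document is treated as dissimilar to everything

        // Step 4: dot product; a word missing from one document contributes 0.
        double dot = 0.0;
        foreach (var kv in x)
        {
            if (y.TryGetValue(kv.Key, out double yValue))
                dot += kv.Value * yValue;
        }

        // Step 5: (x . y) / (||x|| * ||y||), with the denominator parenthesized.
        return dot / (normX * normY);
    }
}
```

For example, `CosineSketch.Cosine(CosineSketch.TermFrequency("to be or not to be".Split(' ')), CosineSketch.TermFrequency("to be is to do".Split(' ')))` evaluates to 6/√70 ≈ 0.717. Dividing by the total word count instead of the maximum would give the same cosine, since either choice rescales each vector by a positive constant.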
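
`public EuclidianDistance ( double[] firstObservation, double[] secondObservation ) : double`

Parameter | Type | Description
---|---|---
firstObservation | double[] | Array of double values (first observation)
secondObservation | double[] | Array of double values (second observation)
return | double | Euclidean distance between the two observations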
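
A minimal sketch of the Euclidean distance formula, with the difference (not the sum) of corresponding coordinates inside each square. It assumes both observations are `double[]` arrays of the same length p, as the parameter description suggests; the class name `EuclideanSketch` is hypothetical, and the method keeps the reference's spelling `EuclidianDistance`.

```csharp
using System;

// Hypothetical class; assumes equal-length double[] observations.
public static class EuclideanSketch
{
    // d(i, j) = sqrt( sum over k of (x_ik - x_jk)^2 ), k = 1..p
    public static double EuclidianDistance(double[] firstObservation, double[] secondObservation)
    {
        if (firstObservation == null) throw new ArgumentNullException(nameof(firstObservation));
        if (secondObservation == null) throw new ArgumentNullException(nameof(secondObservation));
        if (firstObservation.Length != secondObservation.Length)
            throw new ArgumentException("Observations must have the same number of coordinates.");

        double sum = 0.0;
        for (int k = 0; k < firstObservation.Length; k++)
        {
            double diff = firstObservation[k] - secondObservation[k]; // difference, not sum
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }
}
```

For instance, the distance between (0, 0) and (3, 4) is √(9 + 16) = 5. Rejecting arrays of different lengths is a design choice of this sketch; silently truncating to the shorter array would hide input errors.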

`public JaccardDistance ( object firstObservation, object secondObservation ) : double`

Parameter | Type | Description
---|---|---
firstObservation | object |
secondObservation | object |
return | double |
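
The reference gives no description for `JaccardDistance`, and its `object` parameters do not say what an observation is. A common definition, assumed here, treats each observation as a set and computes $d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$. This sketch takes `IEnumerable<T>` inputs instead of `object` and uses the hypothetical class name `JaccardSketch`; it is not the documented implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class; assumes each observation is a collection whose distinct items form a set.
public static class JaccardSketch
{
    public static double JaccardDistance<T>(IEnumerable<T> firstObservation, IEnumerable<T> secondObservation)
    {
        if (firstObservation == null) throw new ArgumentNullException(nameof(firstObservation));
        if (secondObservation == null) throw new ArgumentNullException(nameof(secondObservation));

        // Duplicates are ignored: only set membership matters.
        var a = new HashSet<T>(firstObservation);
        var b = new HashSet<T>(secondObservation);

        int union = a.Union(b).Count();
        if (union == 0)
            return 0.0; // assumption: two empty sets are treated as identical

        int intersection = a.Intersect(b).Count();
        return 1.0 - (double)intersection / union; // 1 - |A ∩ B| / |A ∪ B|
    }
}
```

For example, {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct items, so the distance is 1 − 2/4 = 0.5.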

`public SMC ( object firstObservation, object secondObservation ) : double`

Parameter | Type | Description
---|---|---
firstObservation | object |
secondObservation | object |
return | double |
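
`SMC` is also undocumented. It is assumed here to mean the simple matching coefficient, which for two binary vectors of length $n$ is $\text{SMC} = \frac{M_{11} + M_{00}}{M_{00} + M_{01} + M_{10} + M_{11}}$, the fraction of positions where the two vectors agree. Unlike `JaccardDistance`, this coefficient is a similarity; whether the documented method returns SMC or 1 − SMC is not stated. The sketch assumes `bool[]` inputs instead of `object`, and the class name `SmcSketch` is hypothetical.

```csharp
using System;

// Hypothetical class; assumes "SMC" is the simple matching coefficient over equal-length bool[] vectors.
public static class SmcSketch
{
    public static double SMC(bool[] firstObservation, bool[] secondObservation)
    {
        if (firstObservation == null) throw new ArgumentNullException(nameof(firstObservation));
        if (secondObservation == null) throw new ArgumentNullException(nameof(secondObservation));
        if (firstObservation.Length != secondObservation.Length)
            throw new ArgumentException("Observations must have the same length.");
        if (firstObservation.Length == 0)
            throw new ArgumentException("Observations must not be empty.");

        // Count positions where both are true (M11) or both are false (M00).
        int matches = 0;
        for (int k = 0; k < firstObservation.Length; k++)
        {
            if (firstObservation[k] == secondObservation[k])
                matches++;
        }

        // SMC = (M11 + M00) / n, where n = M00 + M01 + M10 + M11 is the vector length.
        return (double)matches / firstObservation.Length;
    }
}
```

For (1, 0, 1, 0) and (1, 1, 0, 0) the vectors agree in the first and last positions, so SMC = 2/4 = 0.5. Counting M00 matches is what separates SMC from the Jaccard index, which ignores positions where both vectors are 0.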