C# Class SEEUMiner.Library.Similarity

Project: vshehu/SEEUMiner

Public Methods

Method Description
Cosine ( Dictionary<string, int> firstDocument, Dictionary<string, int> secondDocument ) : double

Calculates the cosine similarity of two word sequences in four steps. Step 1: compute the term frequency (TF), the number of times each term (word) occurs in a document. Step 2: normalize the TF values to the range [0,1]. Step 3: find the Euclidean norm (length) of the Doc1 and Doc2 vectors. Step 4: calculate the dot product of Doc1 and Doc2, then divide it by the product of the two norms: cosine(Doc1, Doc2) = (Doc1 dot Doc2) / (norm(Doc1) * norm(Doc2)).

EuclidianDistance ( double firstObservation, double secondObservation ) : double

The Euclidean distance between observations i and j over p dimensions is: d(i,j) = ( (xi1 - xj1)^2 + (xi2 - xj2)^2 + ... + (xip - xjp)^2 )^(1/2). Formatted formula: http://en.wikipedia.org/wiki/Euclidean_distance

JaccardDistance ( object firstObservation, object secondObservation ) : double
SMC ( object firstObservation, object secondObservation ) : double

Method Details

Cosine() public method

Calculates the cosine similarity of two word sequences in four steps:
Step 1: Compute the term frequency (TF), the number of times each term (word) occurs in a document.
Step 2: Normalize the TF values to the range [0,1].
Step 3: Find the Euclidean norm (length) of the Doc1 and Doc2 vectors.
Step 4: Calculate the dot product of Doc1 and Doc2, then divide it by the product of the two norms: cosine(Doc1, Doc2) = (Doc1 dot Doc2) / (norm(Doc1) * norm(Doc2)).
public Cosine ( Dictionary<string, int> firstDocument, Dictionary<string, int> secondDocument ) : double
firstDocument Dictionary<string, int> Word/frequency pairs from Document 1
secondDocument Dictionary<string, int> Word/frequency pairs from Document 2
return double
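
The steps above can be sketched in C# as follows. This is a minimal illustration, not the SEEUMiner source: the class name CosineSketch and the helper Norm are hypothetical, the Dictionary<string, int> parameter type is assumed from the word/frequency description, and returning 0 for an empty document is an assumption. Step 2 is skipped here on the assumption that normalization is a plain rescaling (for example, dividing each count by the document total), which leaves the cosine unchanged.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the SEEUMiner implementation.
public static class CosineSketch
{
    // Cosine similarity of two word -> frequency maps.
    public static double Cosine(Dictionary<string, int> firstDocument,
                                Dictionary<string, int> secondDocument)
    {
        // Dot product: only words present in both documents contribute.
        double dot = 0.0;
        foreach (var pair in firstDocument)
        {
            if (secondDocument.TryGetValue(pair.Key, out int otherCount))
            {
                dot += (double)pair.Value * otherCount;
            }
        }

        double normFirst = Norm(firstDocument);
        double normSecond = Norm(secondDocument);

        // Assumption: an empty (all-zero) document has similarity 0.
        if (normFirst == 0.0 || normSecond == 0.0)
        {
            return 0.0;
        }

        return dot / (normFirst * normSecond);
    }

    // Euclidean norm (length) of the frequency vector.
    private static double Norm(Dictionary<string, int> document)
    {
        double sumOfSquares = 0.0;
        foreach (int count in document.Values)
        {
            sumOfSquares += (double)count * count;
        }
        return Math.Sqrt(sumOfSquares);
    }
}
```

With this sketch, two documents whose frequencies are proportional (for example {data: 1, mining: 2} and {data: 2, mining: 4}) score approximately 1, and two documents with no words in common score 0.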

EuclidianDistance() public method

The Euclidean distance between observations i and j over p dimensions is: d(i,j) = ( (xi1 - xj1)^2 + (xi2 - xj2)^2 + ... + (xip - xjp)^2 )^(1/2). Formatted formula: http://en.wikipedia.org/wiki/Euclidean_distance
public EuclidianDistance ( double firstObservation, double secondObservation ) : double
firstObservation double Array of double values
secondObservation double Array of double values
return double
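
The distance formula can be sketched in C# as follows. This is an illustration under stated assumptions rather than the SEEUMiner source: the class name EuclideanSketch is hypothetical, the parameters are taken to be double[] arrays (as the parameter descriptions suggest), and throwing on mismatched lengths is an assumption about how invalid input might be handled.

```csharp
using System;

// Hypothetical sketch, not the SEEUMiner implementation.
public static class EuclideanSketch
{
    // Euclidean distance between two observations of equal dimension.
    public static double Distance(double[] firstObservation, double[] secondObservation)
    {
        // Assumption: observations of different dimension are rejected.
        if (firstObservation.Length != secondObservation.Length)
        {
            throw new ArgumentException("Observations must have the same number of values.");
        }

        // Sum the squared differences of the coordinates, then take the square root.
        double sumOfSquares = 0.0;
        for (int k = 0; k < firstObservation.Length; k++)
        {
            double difference = firstObservation[k] - secondObservation[k];
            sumOfSquares += difference * difference;
        }
        return Math.Sqrt(sumOfSquares);
    }
}
```

For the observations (0, 0) and (3, 4), the squared differences are 9 and 16, so the distance is the square root of 25, which is 5.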

JaccardDistance() public method

public JaccardDistance ( object firstObservation, object secondObservation ) : double
firstObservation object
secondObservation object
return double

SMC() public method

public SMC ( object firstObservation, object secondObservation ) : double
firstObservation object
secondObservation object
return double