C# Class SEEUMiner.Library.Similarity

Public Methods

Method Description
Cosine ( Dictionary<string, int> firstDocument, Dictionary<string, int> secondDocument ) : double

Calculates the cosine similarity of two word sequences in four steps. Step 1: compute the term frequency (TF), the number of times each term (word) occurs in a document. Step 2: normalize the TF values to the range [0, 1]. Step 3: find the Euclidean norm (length) of the Doc1 and Doc2 values. Step 4: calculate the dot product of the Doc1 and Doc2 elements, then divide: (x dot y) / (euclidean(doc1) * euclidean(doc2)).

EuclidianDistance ( double[] firstObservation, double[] secondObservation ) : double

The formula for the Euclidean distance is: d(i,j) = ( (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2 )^(1/2). Formatted formula here: http://en.wikipedia.org/wiki/Euclidean_distance

JaccardDistance ( object firstObservation, object secondObservation ) : double
SMC ( object firstObservation, object secondObservation ) : double

Method Details

Cosine() public Method

To calculate the cosine similarity of two word sequences, the method follows these steps:
Step 1: Term frequency (TF). Count the number of times each term (word) occurs in a document, so that all occurrences of a given word in the same document are found.
Step 2: Normalize the TF values to the range [0, 1].
Step 3: Find the Euclidean norm (length) of the Doc1 and Doc2 values.
Step 4: Calculate the dot product of the Doc1 and Doc2 elements, then compute the cosine similarity by dividing: (x dot y) / (euclidean(doc1) * euclidean(doc2)).
public Cosine ( Dictionary<string, int> firstDocument, Dictionary<string, int> secondDocument ) : double
firstDocument Dictionary<string, int> Dictionary of (word, frequency) pairs from Document 1
secondDocument Dictionary<string, int> Dictionary of (word, frequency) pairs from Document 2
Returns double
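
The four steps described for Cosine() can be sketched as follows. This is a minimal illustration, not the SEEUMiner source: the class name CosineSketch and the helpers TermFrequency and Normalize are hypothetical, the normalization divides each count by the document's total word count (one possible reading of "normalize to [0, 1]"), and returning 0 for an empty document is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the documented steps; not the library's implementation.
public static class CosineSketch
{
    // Step 1: count how often each word occurs in a document.
    public static Dictionary<string, int> TermFrequency(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            counts.TryGetValue(word, out var current); // current is 0 if the word is new
            counts[word] = current + 1;
        }
        return counts;
    }

    // Step 2: scale raw counts into [0, 1].
    // Assumption: divide by the total number of words in the document.
    private static Dictionary<string, double> Normalize(Dictionary<string, int> tf)
    {
        double total = tf.Values.Sum();
        return tf.ToDictionary(p => p.Key, p => total == 0 ? 0.0 : p.Value / total);
    }

    public static double Cosine(Dictionary<string, int> firstDocument,
                                Dictionary<string, int> secondDocument)
    {
        var a = Normalize(firstDocument);
        var b = Normalize(secondDocument);

        // Step 3: Euclidean norm (length) of each document vector.
        double normA = Math.Sqrt(a.Values.Sum(v => v * v));
        double normB = Math.Sqrt(b.Values.Sum(v => v * v));
        if (normA == 0 || normB == 0)
        {
            return 0.0; // assumption: an empty document is similar to nothing
        }

        // Step 4: dot product over the words both documents share.
        double dot = 0.0;
        foreach (var pair in a)
        {
            if (b.TryGetValue(pair.Key, out var other))
            {
                dot += pair.Value * other;
            }
        }

        return dot / (normA * normB);
    }
}
```

Because each document is scaled by a single positive constant in step 2, the normalization changes the vector lengths but not the angle between them, so the result matches the cosine computed on the raw counts.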

EuclidianDistance() public Method

The formula for the Euclidean distance is: d(i,j) = ( (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 + ... + (x_ip - x_jp)^2 )^(1/2). Formatted formula here: http://en.wikipedia.org/wiki/Euclidean_distance
public EuclidianDistance ( double[] firstObservation, double[] secondObservation ) : double
firstObservation double[] Array of double values
secondObservation double[] Array of double values
Returns double
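
The Euclidean distance formula (component-wise differences, squared, summed, then square-rooted) could be implemented along these lines. This is a sketch under assumptions, not the library's code: the class name EuclideanSketch is hypothetical, the parameters are taken to be arrays of double values, and throwing on mismatched lengths is an illustrative choice.

```csharp
using System;

// Hypothetical sketch; the real SEEUMiner implementation may differ.
public static class EuclideanSketch
{
    public static double EuclidianDistance(double[] firstObservation, double[] secondObservation)
    {
        // Assumption: both observations must have the same number of components p.
        if (firstObservation.Length != secondObservation.Length)
        {
            throw new ArgumentException("Observations must have the same number of components.");
        }

        double sumOfSquares = 0.0;
        for (int k = 0; k < firstObservation.Length; k++)
        {
            // (x_ik - x_jk)^2 for component k
            double difference = firstObservation[k] - secondObservation[k];
            sumOfSquares += difference * difference;
        }

        return Math.Sqrt(sumOfSquares);
    }
}
```

For example, the observations (0, 0) and (3, 4) are a distance of 5 apart, since (3^2 + 4^2)^(1/2) = 5.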

JaccardDistance() public Method

public JaccardDistance ( object firstObservation, object secondObservation ) : double
firstObservation object
secondObservation object
Returns double

SMC() public Method

public SMC ( object firstObservation, object secondObservation ) : double
firstObservation object
secondObservation object
Returns double