C# Class StopGuessing.DataStructures.Sketch

A sketch is a probabilistic data structure that provides an approximate count of the number of times an item has been observed or, more generally, the sum of a set of numbers associated with each item (in the common case, each observation of an item contributes the number 1). One can add observations of items (or numbers associated with items) and get an estimate of the resulting total. A sketch always returns a number that is _at least_ as large as the number observed (up to the maximum number that the sketch can store). In other words, the estimate is an upper bound on the values observed. It may falsely return a number that is too large, but it will never return a number that is too small (unless, again, the value has exceeded the maximum integer the sketch is designed to store).
Project: Microsoft/StopGuessing
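
The never-too-small guarantee described above can be illustrated with a minimal sketch. `TinySketch`, its toy per-column hash, and its lack of a maximum value are assumptions made for this example only; this is not the StopGuessing implementation.

```csharp
using System;

// Hypothetical, simplified stand-in used only for illustration.
public class TinySketch
{
    private readonly ulong[,] _cells;   // indexed as [column, row]
    private readonly int _columns;
    private readonly int _rows;

    public TinySketch(int columns, int rows)
    {
        _columns = columns;
        _rows = rows;
        _cells = new ulong[columns, rows];
    }

    // Assumed toy hash (FNV-1a style, seeded per column); not cryptographically secure.
    private int RowFor(string s, int column)
    {
        unchecked
        {
            uint hash = 2166136261u + (uint)column * 0x9E3779B1u;
            foreach (char c in s)
            {
                hash ^= c;
                hash *= 16777619u;
            }
            return (int)(hash % (uint)_rows);
        }
    }

    // Add one to the indexed element in every column.
    public void Increment(string s)
    {
        for (int column = 0; column < _columns; column++)
            _cells[column, RowFor(s, column)]++;
    }

    // The smallest indexed element: collisions can only inflate it,
    // so it is never below the true count.
    public ulong GetMin(string s)
    {
        ulong min = ulong.MaxValue;
        for (int column = 0; column < _columns; column++)
            min = Math.Min(min, _cells[column, RowFor(s, column)]);
        return min;
    }
}
```

After three calls to `Increment("alice")`, `GetMin("alice")` returns at least 3. It returns more than 3 only if another item collides with "alice" in every column.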

Public Methods

Method Description
Add ( string s, ulong amountToAdd = 1 ) : ResultOfUpdate

An Add function that is equivalent to calling increment multiple (amountToAdd) times. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.

ConservativeAdd ( string s, ulong amountToAdd = 1 ) : ResultOfUpdate

An Add function that is equivalent to calling conservativeIncrement multiple (amountToAdd) times. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.

ConservativeIncrement ( string s ) : ResultOfUpdate

Count the observation of the string s by adding one to those values indexed by calling getIndexesForString(s) for which the value is currently equal to getMin(s). This algorithm is more conservative than increment in that it is less likely to result in overcounting. If the count is already MaxValue, the operation will have no effect.

Get ( long[] elementIndexes ) : ResultOfGet

Get the current state of the sketch for a value's row indexes into each column.

Get ( string s ) : ResultOfGet

Get the current state of the sketch for a value's row indexes into each column.

GetColumnTotal ( int column ) : ulong

Get the total value stored in a column.

GetMin ( string s ) : ulong

Get the minimum of all the values at the indexes identified for the string s within the sketch. This is an upper bound on the number of times the string s has been witnessed by the sketch via calls to increment(s) and/or conservativeIncrement(s): colliding items can inflate it, but it is never below the true count. However, counting stops when the maximum value storable in the sketch is reached.

Increment ( string s ) : ResultOfUpdate

Count the observation of the string s by adding one to the value in each table indexed by calling getIndexesForString(s). If the count is already MaxValue, the operation will have no effect.

IsNonZero ( string s ) : bool

Test whether a string s has been witnessed before by checking whether getMin(s) > 0. This is guaranteed to return true if increment(s) or conservativeIncrement(s) has been called on the sketch. It is probabilistically likely, but not guaranteed, to return false if neither increment(s) nor conservativeIncrement(s) has been called.

Sketch ( long numberOfColumns, long numberOfRows, int bitsPerElement ) : System

A sketch can be viewed as either having k tables (one for each hash) of n elements or as a two-dimensional array of k columns * n rows. This constructor creates a Sketch of size specified by k, n, and the number of bits per element.

this ( long[] elementIndexes ) : ulong

A shortcut to set all the elements of the sketch identified by an array of indexes, or to Get the minimum of all the values at the indexes identified by the array of indexes.

this ( string s ) : ulong

A shortcut to set all the elements of the sketch identified by indexes for the string s, or to Get the minimum of all the values at the indexes identified for the string s (equivalent to calling getMin(s)).

Protected Methods

Method Description
Add ( long[] elementIndexForEachColumn, ulong amountToAdd = 1 ) : ResultOfUpdate

Add to the value at each [column,index] pair, where the index into elementIndexForEachColumn is the column number and the value is the row (element) index. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.

ConservativeAdd ( long[] elementIndexForEachColumn, ulong amountToAdd = 1 ) : ResultOfUpdate

Use the conservative algorithm to Add to the data structure such that for each [column][row], the value stored is no less than getMin(elementIndexForEachColumn) + amountToAdd (but never exceeding MaxValue).

GetIndexesForString ( string s ) : long[]

Given a string s, provides an index into each column of the sketch. Each index identifies a row (element) to which one can write (if adding members/counts) or read (if testing membership or min counts). SECURITY NOTE: These indexes can be predicted by an adversary. If using the sketch in a scenario in which an adversary may cause harm by targeting certain indexes, include as a prefix of the parameter s a key that is not publicly known (and hopefully can be kept secret from any adversaries). In other words, long[] indexesIntoSketch = getIndexesForString( key + s ).

Read ( long column, long row ) : ulong

Read from the underlying sketch data structure (two-dimensional array) at column (Table) column and row (element) row.

Write ( long column, long row, ulong value ) : void

Write a value to the underlying sketch data structure (two-dimensional array) at column (table) column and row (element) row.

Method Details

Add() protected method

Add to the value at each [column,index] pair, where the index into elementIndexForEachColumn is the column number and the value is the row (element) index. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.
protected Add ( long[] elementIndexForEachColumn, ulong amountToAdd = 1 ) : ResultOfUpdate
elementIndexForEachColumn long[] An array for which the index is a column number and the value is the row (element) index.
amountToAdd ulong The amount to Add to each of the indexed elements.
return ResultOfUpdate

Add() public method

An Add function that is equivalent to calling increment multiple (amountToAdd) times. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.
public Add ( string s, ulong amountToAdd = 1 ) : ResultOfUpdate
s string The string to witness.
amountToAdd ulong The number of times to witness it.
return ResultOfUpdate
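
The rule that MaxValue is stored when an addition would overflow can be sketched as a clamped addition. `SaturatingMath.AddClamped` is a hypothetical helper written for this illustration; it is not a member of the Sketch class.

```csharp
using System;

// Hypothetical helper for illustration; not part of StopGuessing.
public static class SaturatingMath
{
    // Adds amountToAdd to current, storing maxValue instead of exceeding it.
    // The headroom check avoids wrapping past ulong.MaxValue.
    public static ulong AddClamped(ulong current, ulong amountToAdd, ulong maxValue)
    {
        if (current >= maxValue)
            return maxValue;
        ulong headroom = maxValue - current;
        return amountToAdd >= headroom ? maxValue : current + amountToAdd;
    }
}
```

With 8-bit elements (a maximum of 255), adding 10 to a stored value of 250 yields 255 rather than 260.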

ConservativeAdd() protected method

Use the conservative algorithm to Add to the data structure such that for each [column][row], the value stored is no less than getMin(elementIndexForEachColumn) + amountToAdd (but never exceeding MaxValue).
protected ConservativeAdd ( long[] elementIndexForEachColumn, ulong amountToAdd = 1 ) : ResultOfUpdate
elementIndexForEachColumn long[] An array for which the index is a column number and the value is the row (element) index.
amountToAdd ulong The desired increase, such that getMin() after the operation returns amountToAdd plus the getMin() from before the operation (unless doing so would cause the result to exceed MaxValue).
return ResultOfUpdate
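
A minimal sketch of the conservative update rule follows, assuming the indexed element of each column has already been read into an array. `ConservativeUpdate.Apply` and its parameters are hypothetical names chosen for this example, and every element is assumed not to exceed `maxValue`.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of StopGuessing.
public static class ConservativeUpdate
{
    // cells[i] is the current value of the indexed element in column i.
    // Only elements below (min + amountToAdd) are raised, so elements that
    // were already inflated by collisions are left untouched.
    public static void Apply(ulong[] cells, ulong amountToAdd, ulong maxValue)
    {
        ulong min = cells.Min();
        // Clamp the target at maxValue instead of overflowing.
        ulong target = (maxValue - min < amountToAdd) ? maxValue : min + amountToAdd;
        for (int i = 0; i < cells.Length; i++)
        {
            if (cells[i] < target)
                cells[i] = target;
        }
    }
}
```

For elements holding 2, 5 and 3, adding 1 raises only the first element (to 3). A plain Add would have produced 3, 6 and 4, which overcounts in two columns.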

ConservativeAdd() public method

An Add function that is equivalent to calling conservativeIncrement multiple (amountToAdd) times. If adding would cause the value to exceed MaxValue, MaxValue will be stored instead.
public ConservativeAdd ( string s, ulong amountToAdd = 1 ) : ResultOfUpdate
s string The string to witness.
amountToAdd ulong The number of times to witness it.
return ResultOfUpdate

ConservativeIncrement() public method

Count the observation of the string s by adding one to those values indexed by calling getIndexesForString(s) for which the value is currently equal to getMin(s). This algorithm is more conservative than increment in that it is less likely to result in overcounting. If the count is already MaxValue, the operation will have no effect.
public ConservativeIncrement ( string s ) : ResultOfUpdate
s string The string to witness.
return ResultOfUpdate

Get() public method

Get the current state of the sketch for a value's row indexes into each column.
public Get ( long[] elementIndexes ) : ResultOfGet
elementIndexes long[] The row indexes into each column of the sketch.
return ResultOfGet

Get() public method

Get the current state of the sketch for a value's row indexes into each column.
public Get ( string s ) : ResultOfGet
s string The string to query the sketch for occurrence/frequency information.
return ResultOfGet

GetColumnTotal() public method

Get the total value stored in a column.
public GetColumnTotal ( int column ) : ulong
column int The column
return ulong

GetIndexesForString() protected method

Given a string s, provides an index into each column of the sketch. Each index identifies a row (element) to which one can write (if adding members/counts) or read (if testing membership or min counts). SECURITY NOTE: These indexes can be predicted by an adversary. If using the sketch in a scenario in which an adversary may cause harm by targeting certain indexes, include as a prefix of the parameter s a key that is not publicly known (and hopefully can be kept secret from any adversaries). In other words, long[] indexesIntoSketch = getIndexesForString( key + s ).
protected GetIndexesForString ( string s ) : long[]
s string A string to map to indexes pointing into each column (table) of the sketch. If used in adversarial contexts, create a key string and prefix the argument of every call to this method with that key.
return long[]
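
One way to picture the key-prefix advice is the sketch below, which derives one row index per column from a SHA-256 digest of the secret key, the string, and the column number. The hashing scheme, the `KeyedIndexes` class, and all parameter names are assumptions for illustration; the source does not describe how StopGuessing actually computes its indexes.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for illustration; not the StopGuessing hashing scheme.
public static class KeyedIndexes
{
    // Returns one row index per column. Without knowing secretKey, an adversary
    // cannot predict which rows a given string maps to.
    public static long[] ForString(string secretKey, string s, int numberOfColumns, long numberOfRows)
    {
        var indexes = new long[numberOfColumns];
        using (SHA256 sha = SHA256.Create())
        {
            for (int column = 0; column < numberOfColumns; column++)
            {
                // The key is the prefix; the column number is appended so that
                // each column gets an independent-looking index.
                byte[] input = Encoding.UTF8.GetBytes(secretKey + s + "#" + column);
                byte[] digest = sha.ComputeHash(input);
                ulong value = BitConverter.ToUInt64(digest, 0);
                indexes[column] = (long)(value % (ulong)numberOfRows);
            }
        }
        return indexes;
    }
}
```

The mapping is deterministic for a fixed key, so the same string always lands on the same rows, while a different key yields an unrelated set of rows.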

GetMin() public method

Get the minimum of all the values at the indexes identified for the string s within the sketch. This is an upper bound on the number of times the string s has been witnessed by the sketch via calls to increment(s) and/or conservativeIncrement(s): colliding items can inflate it, but it is never below the true count. However, counting stops when the maximum value storable in the sketch is reached.
public GetMin ( string s ) : ulong
s string The string to query for the minimum occurrence count of.
return ulong

Increment() public method

Count the observation of the string s by adding one to the value in each table indexed by calling getIndexesForString(s). If the count is already MaxValue, the operation will have no effect.
public Increment ( string s ) : ResultOfUpdate
s string The string to witness.
return ResultOfUpdate

IsNonZero() public method

Test whether a string s has been witnessed before by checking whether getMin(s) > 0. This is guaranteed to return true if increment(s) or conservativeIncrement(s) has been called on the sketch. It is probabilistically likely, but not guaranteed, to return false if neither increment(s) nor conservativeIncrement(s) has been called.
public IsNonZero ( string s ) : bool
s string The string to test.
return bool

Read() protected method

Read from the underlying sketch data structure (two-dimensional array) at column (Table) column and row (element) row.
protected Read ( long column, long row ) : ulong
column long The column (table) to read from.
row long The row (element) within the table (column).
return ulong

Sketch() public method

A sketch can be viewed as either having k tables (one for each hash) of n elements or as a two-dimensional array of k columns * n rows. This constructor creates a Sketch of size specified by k, n, and the number of bits per element.
public Sketch ( long numberOfColumns, long numberOfRows, int bitsPerElement ) : System
numberOfColumns long The number of columns in the sketch, which is equivalent to the number of tables (one table per hash index).
numberOfRows long The number of rows, which is equivalent to the number of elements per table.
bitsPerElement int The size of each element, in bits, such that the maximum value that can be stored (MaxValue) in the sketch elements is 2^(bitsPerElement)-1.
return System
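
The relationship between bitsPerElement and MaxValue can be worked through with a small calculation. `SketchSizing.MaxValueFor` is a hypothetical helper, and the accepted range of 1 to 64 bits is an assumption based on elements being read and written as ulong.

```csharp
using System;

// Hypothetical helper for illustration; not part of StopGuessing.
public static class SketchSizing
{
    // Largest value representable in bitsPerElement bits: 2^bitsPerElement - 1.
    public static ulong MaxValueFor(int bitsPerElement)
    {
        if (bitsPerElement < 1 || bitsPerElement > 64)
            throw new ArgumentOutOfRangeException(nameof(bitsPerElement));
        // Shifting a ulong by 64 is not meaningful in C#, so handle it separately.
        return bitsPerElement == 64 ? ulong.MaxValue : (1UL << bitsPerElement) - 1;
    }
}
```

For example, 8 bits per element gives a MaxValue of 255, and a single bit per element gives a MaxValue of 1, in which case the sketch can only record whether an item has been seen.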

Write() protected method

Write a value to the underlying sketch data structure (two-dimensional array) at column (table) column and row (element) row.
protected Write ( long column, long row, ulong value ) : void
column long The column (table) to write to.
row long The row (element) within the column (table).
value ulong The value to write.
return void

this() public method

A shortcut to set all the elements of the sketch identified by an array of indexes, or to Get the minimum of all the values at the indexes identified by the array of indexes.
public this ( long[] elementIndexes ) : ulong
elementIndexes long[] An array of indexes into the sketch, one for each column (table).
return ulong

this() public method

A shortcut to set all the elements of the sketch identified by indexes for the string s, or to Get the minimum of all the values at the indexes identified for the string s (equivalent to calling getMin(s)).
public this ( string s ) : ulong
s string A string to map to indexes pointing into each column (table) of the sketch.
return ulong