C# Class ABB.Swum.SamuraiIdSplitter

Used to split the identifiers in a program into their constituent words.
Inheritance: IdSplitter
Project: abb-iss/Swum.NET

Public Methods

Method Description
CountProgramWords ( ISrcMLArchive archive ) : Dictionary<string, int>

Counts the number of occurrences of words within the identifiers in the given srcML files.

SamuraiIdSplitter ( Dictionary<string, int> programWordCount )

Creates a new SamuraiIdSplitter using the specified word count dictionary.

SamuraiIdSplitter ( string programWordCountPath )

Creates a new SamuraiIdSplitter using the specified program word count file.

Split ( string identifier ) : string[]

Splits a program identifier into its constituent words.

Split ( string identifier, bool printSplitTrace ) : string[]

Splits a program identifier into its constituent words, optionally printing a trace of the splitting process.

Private Methods

Method Description
IncludeIdentifier ( string word, int count ) : bool
Initialize ( Dictionary<string, int> programWordCount ) : void

Reads the necessary data files and initializes the member variables.

IsPrefix ( string word ) : bool

Checks whether the supplied word is a known prefix.

IsSuffix ( string word ) : bool

Checks whether the supplied word is a known suffix.

Score ( string word ) : double
SplitOnUppercaseToLowercase ( string word ) : string[]

Splits a word where an uppercase letter is followed by a lowercase letter. The word is split only once, at the first matching location. This method assumes the input consists of zero-or-more uppercase letters followed by zero-or-more lowercase letters.

SplitSameCase ( string word ) : string[]

Splits a word into subwords. The word should be either (1) all lowercase, (2) all uppercase, or (3) a single uppercase letter followed by lowercase letters.

SplitSameCase ( string word, double noSplitScore ) : string[]

Splits a word into subwords. The word should be either (1) all lowercase, (2) all uppercase, or (3) a single uppercase letter followed by lowercase letters.
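
The description of SplitOnUppercaseToLowercase above can be illustrated with a small standalone sketch. This is not the library's implementation: the class and method names below are hypothetical, and the sketch assumes the split falls just before the uppercase letter that precedes the first lowercase letter (the camel-case convention). The page does not say on which side of that letter the real method splits.

```csharp
using System;

static class UppercaseToLowercaseSketch
{
    // Hypothetical helper, not part of Swum.NET.
    // Splits at most once, at the first position where an uppercase
    // letter is immediately followed by a lowercase letter.
    // Assumption: the uppercase letter goes with the lowercase run.
    public static string[] SplitOnce(string word)
    {
        for (int i = 0; i + 1 < word.Length; i++)
        {
            if (char.IsUpper(word[i]) && char.IsLower(word[i + 1]))
            {
                // Nothing to the left of the match: leave the word whole.
                if (i == 0) return new[] { word };
                return new[] { word.Substring(0, i), word.Substring(i) };
            }
        }
        // No uppercase-to-lowercase transition: leave the word whole.
        return new[] { word };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" | ", SplitOnce("ASTVisitor"))); // AST | Visitor
        Console.WriteLine(string.Join(" | ", SplitOnce("GPSstate")));   // GP | Sstate
        Console.WriteLine(string.Join(" | ", SplitOnce("HTML")));       // HTML
    }
}
```

The "GPSstate" case shows why a purely positional rule can pick the wrong boundary, which is presumably where a scoring step such as Score comes in.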

Method Details

CountProgramWords() public static method

Counts the number of occurrences of words within the identifiers in the given srcML files.
public static CountProgramWords ( ISrcMLArchive archive ) : Dictionary<string, int>
archive ISrcMLArchive An archive containing the srcML files to analyze.
return Dictionary<string, int>
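
To make the shape of the returned word counts concrete, here is a minimal standalone sketch of counting words across a handful of identifiers. It does not use ISrcMLArchive and is not how the library performs the count. The class and method names are hypothetical, and the splitting rule (break on non-letters and on lowercase-to-uppercase transitions, then lowercase each word) is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class WordCountSketch
{
    // Hypothetical helper, not part of Swum.NET.
    public static Dictionary<string, int> CountWords(IEnumerable<string> identifiers)
    {
        var counts = new Dictionary<string, int>();
        foreach (string id in identifiers)
        {
            // Assumed conservative split: mark lowercase-to-uppercase
            // transitions, then break on anything that is not a letter.
            string spaced = Regex.Replace(id, "([a-z])([A-Z])", "$1 $2");
            foreach (string part in Regex.Split(spaced, "[^A-Za-z]+"))
            {
                if (part.Length == 0) continue;
                string word = part.ToLowerInvariant();
                counts.TryGetValue(word, out int n); // n is 0 when the word is new
                counts[word] = n + 1;
            }
        }
        return counts;
    }

    static void Main()
    {
        var counts = CountWords(new[] { "fileName", "file_reader", "readFile" });
        // "file" occurs in all three identifiers, so its count is 3.
        foreach (var pair in counts)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```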

SamuraiIdSplitter() public method

Creates a new SamuraiIdSplitter using the specified word count dictionary.
public SamuraiIdSplitter ( Dictionary<string, int> programWordCount )
programWordCount Dictionary<string, int> A dictionary containing the local program word counts.

SamuraiIdSplitter() public method

Creates a new SamuraiIdSplitter using the specified program word count file.
public SamuraiIdSplitter ( string programWordCountPath )
programWordCountPath string The path to the file containing the local program word counts.

Split() public method

Splits a program identifier into its constituent words.
public Split ( string identifier ) : string[]
identifier string The identifier to split.
return string[]

Split() public method

Splits a program identifier into its constituent words, optionally printing a trace of the splitting process.
public Split ( string identifier, bool printSplitTrace ) : string[]
identifier string The identifier to split.
printSplitTrace bool Whether or not to print a trace of the splitting process.
return string[]
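
A short usage sketch of the public members documented above may help tie them together. It assumes the project references the Swum.NET assembly, that the word count dictionary is a `Dictionary<string, int>` keyed by word, and that any data files the splitter reads during initialization are available where the library expects them. The word counts and identifiers are invented for illustration, and the resulting splits are not shown because they depend on the library's data.

```csharp
using System;
using System.Collections.Generic;
using ABB.Swum;

class SplitDemo
{
    static void Main()
    {
        // Hypothetical local word counts, made up for this example.
        var programWordCount = new Dictionary<string, int>
        {
            { "file", 40 },
            { "name", 35 },
            { "reader", 12 }
        };

        // Constructor overload taking the word count dictionary.
        var splitter = new SamuraiIdSplitter(programWordCount);

        // Split an identifier into its constituent words.
        string[] words = splitter.Split("fileNameReader");
        Console.WriteLine(string.Join(" | ", words));

        // Second overload: also print a trace of the splitting process.
        string[] traced = splitter.Split("GPSstate", true);
        Console.WriteLine(string.Join(" | ", traced));
    }
}
```

When the counts come from a real code base, the dictionary would presumably be obtained from CountProgramWords, or loaded through the constructor overload that takes a word count file path.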