C# Class Thrinax.NLP.WordSplitter

Open project: ziyunhx/thrinax

Public Methods

Method Description
ArrayToString ( string Words ) : string

Joins an array of words into a single string.

FilterSpliteTag ( string SpliteTag, HashSet RemainPos, HashSet StopWords = null, int MinLen = 2, int MaxLen = 10 ) : string[]

Filters the tagged words, keeping only those with the specified parts of speech, and strips the part-of-speech suffix.

Splite ( string Input, bool PosTagged = true ) : string

Segments the input text into words.

SpliteFilter ( string Input, bool PosTagged = false, HashSet RemainPos = null, HashSet StopWords = null, int MinLength, int MaxLength = 10 ) : string

Word segmentation function (returns a space-separated string).

SpliteIntoArray ( string Input, bool PosTagged = false, HashSet RemainPos = null, HashSet StopWords = null, int MinLength, int MaxLength = 10 ) : string[]

Word segmentation function (returns a string array).

Method Details

ArrayToString() public static method

Joins an array of words into a single string.
public static ArrayToString ( string Words ) : string
Words string
return string

FilterSpliteTag() public static method

Filters the tagged words, keeping only those with the specified parts of speech, and strips the part-of-speech suffix.
public static FilterSpliteTag ( string SpliteTag, HashSet RemainPos, HashSet StopWords = null, int MinLen = 2, int MaxLen = 10 ) : string[]
SpliteTag string Tagged words, e.g. "中国/n", "人民/n"
RemainPos HashSet Parts of speech to keep
StopWords HashSet Stop words
MinLen int
MaxLen int
return string[]
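
The filtering described above can be illustrated with a small, self-contained sketch. This is not the Thrinax implementation: the class `PosFilterSketch`, the method `FilterTagged`, the use of the last `/` as the tag separator, and the lowercase stop-word comparison are all assumptions made for illustration, based only on the parameter descriptions on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-implementation for illustration only; not Thrinax code.
public static class PosFilterSketch
{
    // Keeps words whose part-of-speech tag is in remainPos, strips the
    // "/tag" suffix, and drops stop words and words outside [minLen, maxLen].
    public static string[] FilterTagged(
        IEnumerable<string> tagged,
        HashSet<string> remainPos,
        HashSet<string> stopWords = null,
        int minLen = 2,
        int maxLen = 10)
    {
        var result = new List<string>();
        foreach (string token in tagged)
        {
            // Assumption: the tag follows the last '/' in the token.
            int slash = token.LastIndexOf('/');
            if (slash <= 0 || slash == token.Length - 1)
                continue; // skip tokens with no word part or no tag part

            string word = token.Substring(0, slash);
            string pos = token.Substring(slash + 1);

            if (!remainPos.Contains(pos))
                continue;
            if (word.Length < minLen || word.Length > maxLen)
                continue;
            // Assumption: stop words are stored in lowercase.
            if (stopWords != null && stopWords.Contains(word.ToLowerInvariant()))
                continue;

            result.Add(word);
        }
        return result.ToArray();
    }

    public static void Main()
    {
        var tagged = new[] { "中国/n", "人民/n", "的/u", "伟大/a" };
        var keep = new HashSet<string> { "n" };

        string[] words = FilterTagged(tagged, keep);
        Console.WriteLine(string.Join(" ", words)); // prints: 中国 人民
    }
}
```

In this sketch only the two tokens tagged `n` survive; `的/u` and `伟大/a` are dropped because their tags are not in the set of parts of speech to keep. The real method may treat separators, casing, and length limits differently.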

Splite() public static method

Segments the input text into words.
public static Splite ( string Input, bool PosTagged = true ) : string
Input string Text to segment
PosTagged bool Whether to tag parts of speech
return string

SpliteFilter() public static method

Word segmentation function (returns a space-separated string).
public static SpliteFilter ( string Input, bool PosTagged = false, HashSet RemainPos = null, HashSet StopWords = null, int MinLength, int MaxLength = 10 ) : string
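Input string Input string
PosTagged bool Whether to tag parts of speech (if true, a part-of-speech suffix such as "/n" is appended)
RemainPos HashSet Keep only words with these parts of speech
StopWords HashSet Stop-word list (lowercase)
MinLength int Minimum word length
MaxLength int Maximum word length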
return string

SpliteIntoArray() public static method

Word segmentation function (returns a string array).
public static SpliteIntoArray ( string Input, bool PosTagged = false, HashSet RemainPos = null, HashSet StopWords = null, int MinLength, int MaxLength = 10 ) : string[]
Input string Input string
PosTagged bool Whether to tag parts of speech (if true, a part-of-speech suffix such as "/n" is appended)
RemainPos HashSet
StopWords HashSet Stop-word list (lowercase)
MinLength int Minimum word length
MaxLength int Maximum word length
return string[]