C# Class Thrinax.NLP.KeywordExtracter

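Extracts keywords; a parameter selects between two calling modes, one for text that is already word-segmented and one for unsegmented text.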
Project: ziyunhx/thrinax

Public Methods

Method Description
ExtractKeyword ( string paper, int MinSupport = 5, int MaxSupport = 50, bool isSplitter = false ) : KeywordSupport[]

Extracts keywords from a single article. TextRank is used when the article has fewer than 300 words; otherwise WordCount is used.

ExtractSingleKeyword ( string Texts, int MinSupport = 5, int MaxSupport = 50, int MinTxtLength = 33, bool isSplitter = false ) : KeywordSupport[]

Extracts keywords (single words).

Private Methods

Method Description
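TextRankExtract ( string spliteWords ) : Dictionary<…, int>

Scores keywords with the TextRank algorithm. The score produced by the original algorithm is multiplied by 13 so that it is close to the frequency-based score.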

WordCountExtract ( string spliteWords ) : Dictionary<…, int>

Ranks keywords by counting word frequencies.
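
The two scoring approaches named above can be illustrated with a minimal, self-contained sketch. This is not the Thrinax implementation: the class name `KeywordScoringSketch`, the space-delimited input format, the co-occurrence window, the damping factor, and the iteration count are all assumptions made for illustration. Only the idea of multiplying the TextRank score by 13 comes from the description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Thrinax: illustrates the two scoring ideas only.
public static class KeywordScoringSketch
{
    // Frequency scoring: a word's score is the number of times it occurs.
    public static Dictionary<string, int> WordCount(string splitWords)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in Tokenize(splitWords))
        {
            counts.TryGetValue(word, out var n); // n is 0 when the word is new
            counts[word] = n + 1;
        }
        return counts;
    }

    // TextRank scoring on an undirected co-occurrence graph.
    // window, damping and iterations are assumed defaults; scale = 13 follows the description.
    public static Dictionary<string, int> TextRank(
        string splitWords, int window = 5, double damping = 0.85,
        int iterations = 30, double scale = 13.0)
    {
        var words = Tokenize(splitWords);
        var neighbours = new Dictionary<string, HashSet<string>>();

        // Link every pair of distinct words that occur within the same window.
        for (int i = 0; i < words.Length; i++)
        {
            if (!neighbours.ContainsKey(words[i])) neighbours[words[i]] = new HashSet<string>();
            for (int j = i + 1; j < words.Length && j < i + window; j++)
            {
                if (words[i] == words[j]) continue;
                if (!neighbours.ContainsKey(words[j])) neighbours[words[j]] = new HashSet<string>();
                neighbours[words[i]].Add(words[j]);
                neighbours[words[j]].Add(words[i]);
            }
        }

        // Iterate S(v) = (1 - d) + d * sum over neighbours u of S(u) / degree(u).
        var score = neighbours.Keys.ToDictionary(w => w, w => 1.0);
        for (int it = 0; it < iterations; it++)
        {
            var next = new Dictionary<string, double>();
            foreach (var entry in neighbours)
            {
                double sum = 0;
                foreach (var u in entry.Value) sum += score[u] / neighbours[u].Count;
                next[entry.Key] = (1 - damping) + damping * sum;
            }
            score = next;
        }

        // Scale and round so the result is an integer score comparable to a word count.
        return score.ToDictionary(kv => kv.Key, kv => (int)Math.Round(kv.Value * scale));
    }

    // Assumption: segmented input separates words with spaces.
    private static string[] Tokenize(string splitWords) =>
        splitWords.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```

In this formulation the unscaled TextRank scores average 1 across the graph, which suggests why a constant multiplier is needed before they can be compared with raw occurrence counts.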

Method Details

ExtractKeyword() public static method

Extracts keywords from a single article. TextRank is used when the article has fewer than 300 words; otherwise WordCount is used.
public static ExtractKeyword ( string paper, int MinSupport = 5, int MaxSupport = 50, bool isSplitter = false ) : KeywordSupport[]
paper string The text, or the result of word segmentation
MinSupport int Minimum score
MaxSupport int Maximum score
isSplitter bool Whether the input is already segmented
return KeywordSupport[]
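
The selection rule in the description (TextRank below 300 words, WordCount otherwise) can be sketched as follows. The class `ExtractKeywordDispatchSketch`, its method names, and the space-delimited word counting are hypothetical; the two scorers are passed in as delegates so the sketch stays independent of how Thrinax actually implements them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the dispatch rule; not the Thrinax source.
public static class ExtractKeywordDispatchSketch
{
    // Threshold taken from the method description.
    public const int TextRankWordLimit = 300;

    // Picks a scorer based on the number of words in the segmented text.
    public static Dictionary<string, int> Score(
        string segmentedText,
        Func<string, Dictionary<string, int>> textRank,
        Func<string, Dictionary<string, int>> wordCount)
    {
        return CountWords(segmentedText) < TextRankWordLimit
            ? textRank(segmentedText)   // short article: graph-based scoring
            : wordCount(segmentedText); // long article: frequency scoring
    }

    // Assumption: words in segmented text are separated by spaces.
    public static int CountWords(string segmentedText) =>
        segmentedText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

A plausible reading of this design is that raw frequencies carry little signal in short texts, where most words occur only once, while in longer texts counting is cheaper and already discriminative.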

ExtractSingleKeyword() public static method

Extracts keywords (single words).
public static ExtractSingleKeyword ( string Texts, int MinSupport = 5, int MaxSupport = 50, int MinTxtLength = 33, bool isSplitter = false ) : KeywordSupport[]
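Texts string The text collection, not yet segmented
MinSupport int Minimum support of a selected word
MaxSupport int Maximum support of a selected word
MinTxtLength int Minimum word-segmentation length
isSplitter bool Whether the text is already segmented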
return KeywordSupport[]
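
One way to picture the `MinSupport` and `MaxSupport` parameters is as a band filter over scored words. The sketch below is an assumption about their effect, not the library's code: the class `SupportFilterSketch`, the inclusive bounds, and the descending sort order are all hypothetical, and the real method returns `KeywordSupport[]`, whose members are not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: keeps words whose support lies within [minSupport, maxSupport].
public static class SupportFilterSketch
{
    public static List<KeyValuePair<string, int>> Filter(
        IEnumerable<KeyValuePair<string, int>> scores,
        int minSupport = 5,   // default taken from the method signature
        int maxSupport = 50)  // default taken from the method signature
    {
        return scores
            .Where(kv => kv.Value >= minSupport && kv.Value <= maxSupport) // inclusive bounds are an assumption
            .OrderByDescending(kv => kv.Value)                             // strongest candidates first
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)                  // deterministic order for ties
            .ToList();
    }
}
```

Under this reading, the lower bound drops words that are too rare to be meaningful and the upper bound drops very frequent words that behave like stop words.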