C# Class Lucene.Net.Analysis.In.IndicNormalizer

Normalizes the Unicode representation of text in Indian languages.

Follows the guidelines in Unicode 5.2, Chapter 6, "South Asian Scripts I", and the graphical decompositions from http://ldc.upenn.edu/myl/IndianScriptsUnicode.html

Project: apache/lucenenet

Public Methods

Method	Description
Normalize ( char[] text, int len ) : int	Normalizes input text, and returns the new length. The length will always be less than or equal to the existing length.

Private Methods

Method	Description
Compose ( int ch0, Regex block0, ScriptData sd, char[] text, int pos, int len ) : int	Compose into standard form any compositions in the decompositions table.
GetBlockForChar ( char c ) : Regex	LUCENENET: Returns the Unicode block for the specified character.
IndicNormalizer ( )	Static constructor.

Method Details

Normalize() public method

Normalizes input text, and returns the new length. The length will always be less than or equal to the existing length.

public Normalize ( char[] text, int len ) : int
text	char[]	input text
len	int	valid length of the input text
return	int	the new length
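
A minimal usage sketch of Normalize(). It assumes the Lucene.Net.Analysis.Common package is referenced and that IndicNormalizer can be instantiated with its default constructor. The sample input (Devanagari letter A followed by vowel sign AA) and the class name IndicNormalizerExample are chosen only for illustration. Because normalization happens in place and the returned length may be shorter than the buffer, only the first newLength characters should be read afterwards.

```csharp
using System;
using Lucene.Net.Analysis.In;

public static class IndicNormalizerExample
{
    public static void Main()
    {
        // Illustrative input: DEVANAGARI LETTER A (U+0905) + VOWEL SIGN AA (U+093E).
        char[] buffer = "\u0905\u093E".ToCharArray();

        // Assumption: the default constructor is publicly accessible.
        var normalizer = new IndicNormalizer();

        // Normalize rewrites the buffer in place and returns the new valid length,
        // which is always less than or equal to the length passed in.
        int newLength = normalizer.Normalize(buffer, buffer.Length);

        // Characters beyond newLength are stale and must be ignored.
        string normalized = new string(buffer, 0, newLength);

        Console.WriteLine("Input length: {0}, normalized length: {1}", buffer.Length, newLength);
        foreach (char c in normalized)
        {
            // Print code points rather than glyphs so the result is readable in any console.
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

In an analysis chain this call is normally made by a token filter on the term buffer rather than by application code, so the sketch above is mainly useful for inspecting how a given character sequence is rewritten.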