C# Class Lucene.Net.Analysis.In.IndicNormalizer

Normalizes the Unicode representation of text in Indian languages.

Follows the guidelines in Unicode 5.2, chapter 6, South Asian Scripts I, and the graphical decompositions from http://ldc.upenn.edu/myl/IndianScriptsUnicode.html

Project: apache/lucenenet

Public Methods

Method Description
Normalize ( char[] text, int len ) : int

Normalizes input text, and returns the new length. The length will always be less than or equal to the existing length.

Private Methods

Method Description
Compose ( int ch0, Regex block0, ScriptData sd, char[] text, int pos, int len ) : int

Compose into standard form any compositions in the decompositions table.

GetBlockForChar ( char c ) : Regex

LUCENENET: Returns the Unicode block for the specified character.

IndicNormalizer ( ) : Lucene.Net.Analysis.Util
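
The composition step described for Compose above can be pictured with a small, self-contained sketch. It is not the library's implementation: the helper name ComposePair and the single hard-coded pair (U+0905 followed by U+093E, replaced by U+0906) are assumptions chosen for illustration, and the real decompositions table and its per-script flags are not reproduced here. The sketch only shows the general in-place pattern: overwrite the first character with the composed form, shift the tail left, and return the shorter length.

```csharp
using System;

public static class ComposeSketch
{
    // Hypothetical helper (not part of Lucene.Net): replaces every occurrence of
    // the pair (first, second) with the single char 'composed', editing the
    // buffer in place and returning the new valid length.
    public static int ComposePair(char[] text, int len, char first, char second, char composed)
    {
        int pos = 0;
        while (pos < len - 1)
        {
            if (text[pos] == first && text[pos + 1] == second)
            {
                text[pos] = composed;

                // Delete text[pos + 1] by shifting the remaining tail one slot left.
                int tail = len - pos - 2;
                if (tail > 0)
                {
                    Array.Copy(text, pos + 2, text, pos + 1, tail);
                }
                len--;
            }
            pos++;
        }
        return len;
    }

    public static void Main()
    {
        // Illustrative input: U+0905, U+093E, U+0915.
        char[] buffer = "\u0905\u093E\u0915".ToCharArray();
        int newLen = ComposePair(buffer, buffer.Length, '\u0905', '\u093E', '\u0906');

        // Only the first newLen chars of the buffer are valid after the call.
        Console.WriteLine($"{buffer.Length} -> {newLen}"); // prints "3 -> 2"
    }
}
```

Because a composition only ever replaces several characters with fewer, the returned length can never exceed the input length, which matches the guarantee stated for Normalize.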

Method Details

Normalize() public method

Normalizes input text, and returns the new length. The length will always be less than or equal to the existing length.
public Normalize ( char[] text, int len ) : int
text char[] input text
len int valid length
return int the new length
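
A minimal usage sketch follows, assuming the project references the Lucene.Net.Analysis.Common NuGet package (where this class is expected to live). The wrapper name NormalizeToString and the sample input are illustrative assumptions, not part of the library. Since Normalize edits the buffer in place and returns the new length, the caller should read only the first newLen characters afterwards.

```csharp
// Assumption: the Lucene.Net.Analysis.Common package is referenced by the project.
using System;
using Lucene.Net.Analysis.In;

public static class IndicNormalizerDemo
{
    // Hypothetical convenience wrapper: copies the input into a char buffer,
    // normalizes it in place, and builds a string from the valid prefix.
    public static string NormalizeToString(string input)
    {
        var normalizer = new IndicNormalizer();
        char[] buffer = input.ToCharArray();

        // Returns the new valid length, which is never greater than buffer.Length.
        int newLen = normalizer.Normalize(buffer, buffer.Length);

        return new string(buffer, 0, newLen);
    }

    public static void Main()
    {
        // Illustrative Devanagari input: U+0905, U+093E, U+0945.
        string input = "\u0905\u093E\u0945";
        string output = NormalizeToString(input);

        Console.WriteLine($"input length: {input.Length}, normalized length: {output.Length}");
    }
}
```

Characters in the buffer beyond the returned length are leftovers from the original text and should be ignored.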