C# Class Algorithmix.Forensics.OCR

Inheritance: DisposableObject
Project: Algorithmix/Papyrus

Public Methods

Method Description
Charactors ( ) : Tesseract.Charactor[]

Returns an array of Tesseract.Charactor objects produced by running the Scan method

Cost ( Emgu.CV.OCR.Tesseract chars ) : long

Calculates the cost by summing the unique cost of each word

IsEmpty ( Shred shreds ) : void

Runs OCR to determine whether a shred is empty

OCR ( Accuracy accuracy = Accuracy.High, string language = "eng", bool enableTimer = false ) : System

Initializes a new OCR object. This object wraps the Emgu Tesseract wrapper to provide the level of abstraction needed for scanning shreds

OverallCost ( ) : long

Returns the overall cost

ParallelDetectOrientation ( Bitmap regs, Bitmap revs, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : Tuple[]

Parallelized OCR orientation confidence detector. Runs OCR on each image and its corresponding reversed image, and returns the confidence together with both OcrData objects

ParallelRecognize ( IEnumerable images, int length, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : OcrData[]

Parallelized Recognize function. Takes a list or array of images and a specified length, and returns an OcrData object for each image

Preprocess ( byte>.Image image ) : byte>.Image

OCR preprocessing. Currently this involves binary thresholding of a grayscale image using Otsu's method

Recognize ( Bitmap original, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : OcrData

Executes OCR on a given image. This static member preprocesses the image, safely opens, runs, and disposes a Tesseract object, and stores the result in a new OcrData object.

Scan ( byte>.Image image ) : string

Given a color image, converts it to grayscale, runs OCR on it, and returns the recognized text

Scan ( byte>.Image image ) : string

Invokes Tesseract OCR Recognize on the given image and stores the resulting data in the Text, Confidence, and ScanTime data members

ShredOcr ( Shred shreds, string lang = "eng" ) : void

Given an array of shreds, runs OCR on all of them and saves the results to each shred object

StripNewLine ( string text ) : string

Text ( ) : string

Getter for the text generated after running the Scan Method

Protected Methods

Method Description
DisposeObject ( ) : void

Disposes all the necessary objects

Private Methods

Method Description
Elapsed ( ) : long

Retrieve the time elapsed from the diagnostics Timer

Start ( ) : bool

Explicitly starts the diagnostics timer

Stop ( ) : void

Stops the diagnostics timer
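
The three private timer methods listed above could plausibly be built on System.Diagnostics.Stopwatch. The following is a minimal sketch under that assumption; the class name ScanTimerSketch is hypothetical, and the meaning of the bool returned by Start (here: whether the timer is enabled) is a guess, since the source does not document it.

```csharp
using System.Diagnostics;

// Hypothetical stand-in for the private timer helpers; not the library's code.
public sealed class ScanTimerSketch
{
    private readonly Stopwatch _stopwatch = new Stopwatch();
    private readonly bool _enabled;

    public ScanTimerSketch(bool enableTimer)
    {
        _enabled = enableTimer;
    }

    // Assumption: returns false when the timer is disabled, true once it is running.
    public bool Start()
    {
        if (!_enabled) return false;
        _stopwatch.Restart(); // reset to zero and begin timing
        return true;
    }

    public void Stop()
    {
        _stopwatch.Stop();
    }

    // Elapsed time in milliseconds; stays 0 if the timer was never started.
    public long Elapsed()
    {
        return _stopwatch.ElapsedMilliseconds;
    }
}
```

With enableTimer set to false, Start does nothing and Elapsed stays at 0, which keeps the diagnostic overhead out of normal scans.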

Method Details

Charactors() public method

Returns an array of Tesseract.Charactor objects produced by running the Scan method
public Charactors ( ) : Tesseract.Charactor[]
return Tesseract.Charactor[]

Cost() public static method

Calculates the cost by summing the unique cost of each word
public static Cost ( Emgu.CV.OCR.Tesseract chars ) : long
chars Emgu.CV.OCR.Tesseract Tesseract OCR Charactor results
return long

DisposeObject() protected method

Disposes all the necessary objects
protected DisposeObject ( ) : void
return void

IsEmpty() public static method

Runs OCR to determine whether a shred is empty
public static IsEmpty ( Shred shreds ) : void
shreds Shred A list of Shred Objects
return void

OCR() public method

Initializes a new OCR object. This object wraps the Emgu Tesseract wrapper to provide the level of abstraction needed for scanning shreds
public OCR ( Accuracy accuracy = Accuracy.High, string language = "eng", bool enableTimer = false ) : System
accuracy Accuracy Desired Accuracy setting
language string Language of text on image used for OCR model
enableTimer bool Set enableTimer to true to measure scan time for diagnostic purposes
return System

OverallCost() public method

Returns the overall cost
public OverallCost ( ) : long
return long

ParallelDetectOrientation() public static method

Parallelized OCR orientation confidence detector. Runs OCR on each image and its corresponding reversed image, and returns the confidence together with both OcrData objects
public static ParallelDetectOrientation ( Bitmap regs, Bitmap revs, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : Tuple[]
regs System.Drawing.Bitmap Images with default regular orientation
revs System.Drawing.Bitmap Images with reversed orientation to default
mode Accuracy OCR accuracy mode
lang string OCR languages
enableTimer bool Enable timer for diagnostic purposes
return Tuple[]
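
One way to picture the orientation check is the sketch below. It is an illustration under stated assumptions, not the library's implementation: the class OrientationSketch, the recognize and score delegates, and the definition of confidence as the score difference between the regular and reversed results are all hypothetical. The source only says that a confidence and both OcrData objects are returned.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; the real method works on Bitmaps and OcrData objects.
public static class OrientationSketch
{
    // For each (regular, reversed) pair, recognizes both images and returns
    // (confidence, regularResult, reversedResult).
    // Assumption: confidence > 0 means the regular orientation scored higher.
    public static Tuple<double, TData, TData>[] Detect<TImage, TData>(
        TImage[] regs,
        TImage[] revs,
        Func<TImage, TData> recognize,
        Func<TData, double> score)
    {
        if (regs == null) throw new ArgumentNullException(nameof(regs));
        if (revs == null) throw new ArgumentNullException(nameof(revs));
        if (regs.Length != revs.Length)
            throw new ArgumentException("regs and revs must have the same length.");

        var results = new Tuple<double, TData, TData>[regs.Length];

        // Each index is written by exactly one iteration, so no locking is needed.
        Parallel.For(0, regs.Length, i =>
        {
            TData regular = recognize(regs[i]);
            TData reversed = recognize(revs[i]);
            double confidence = score(regular) - score(reversed);
            results[i] = Tuple.Create(confidence, regular, reversed);
        });

        return results;
    }
}
```

A caller would then treat a positive confidence as a vote for the default orientation and a negative one as a vote for the reversed image.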

ParallelRecognize() public static method

Parallelized Recognize function. Takes a list or array of images and a specified length, and returns an OcrData object for each image
public static ParallelRecognize ( IEnumerable images, int length, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : OcrData[]
images IEnumerable Array or List of Bitmaps
length int Number of items to be Recognized from the array
mode Accuracy Accuracy Mode
lang string Desired OCR Language
enableTimer bool Enables OCR Scan Timer if true
return OcrData[]
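
The general shape of such a method can be sketched with Parallel.For from the .NET Task Parallel Library. The helper below is hypothetical (ParallelSketch and ProcessFirst are not part of the library) and replaces the actual OCR call with a caller-supplied delegate, so it shows only the fan-out pattern: take the first length items, process each independently, and keep results in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch of the fan-out pattern; the OCR step is a delegate.
public static class ParallelSketch
{
    public static TOut[] ProcessFirst<TIn, TOut>(
        IEnumerable<TIn> items, int length, Func<TIn, TOut> recognize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (recognize == null) throw new ArgumentNullException(nameof(recognize));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        // Materialize the first `length` items so they can be indexed.
        TIn[] inputs = items.Take(length).ToArray();
        var results = new TOut[inputs.Length];

        // Each iteration writes only its own slot, so results stay in input order.
        Parallel.For(0, inputs.Length, i =>
        {
            results[i] = recognize(inputs[i]);
        });

        return results;
    }
}
```

In a real OCR setting, each iteration would presumably create and dispose its own engine instance rather than share one across threads, which matches how the static Recognize method is described.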

Preprocess() public method

OCR preprocessing. Currently this involves binary thresholding of a grayscale image using Otsu's method
public Preprocess ( byte>.Image image ) : byte>.Image
image byte>.Image Image to be preprocessed
return byte>.Image
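
For readers unfamiliar with Otsu's method, here is a minimal, dependency-free sketch that operates on a flat array of 8-bit grayscale pixels. It is not the library's code (which works on Emgu image types); the class OtsuSketch and its method names are hypothetical. The threshold is chosen to maximize the between-class variance of the two pixel groups it separates.

```csharp
using System;

// Hypothetical, self-contained illustration of Otsu thresholding.
public static class OtsuSketch
{
    // Returns the threshold t in [0, 255] that maximizes between-class variance.
    // Pixels with value <= t form the background class, the rest the foreground.
    public static int ComputeThreshold(byte[] gray)
    {
        if (gray == null || gray.Length == 0)
            throw new ArgumentException("Image must contain at least one pixel.", nameof(gray));

        var histogram = new long[256];
        foreach (byte p in gray) histogram[p]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * histogram[i];

        double sumBackground = 0;
        long weightBackground = 0;
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            weightBackground += histogram[t];
            if (weightBackground == 0) continue;      // no background pixels yet

            long weightForeground = total - weightBackground;
            if (weightForeground == 0) break;         // every pixel is background

            sumBackground += (double)t * histogram[t];
            double meanBackground = sumBackground / weightBackground;
            double meanForeground = (sumAll - sumBackground) / weightForeground;
            double diff = meanBackground - meanForeground;

            // Between-class variance, up to a constant factor of 1 / total^2.
            double variance = (double)weightBackground * weightForeground * diff * diff;
            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }

        return bestThreshold;
    }

    // Maps pixels above the Otsu threshold to 255 and all others to 0.
    public static byte[] Binarize(byte[] gray)
    {
        int threshold = ComputeThreshold(gray);
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            result[i] = gray[i] > threshold ? (byte)255 : (byte)0;
        return result;
    }
}
```

Binarizing before OCR separates ink from paper without a hand-tuned threshold, which is useful when shreds vary in brightness.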

Recognize() public static method

Executes OCR on a given image. This static member preprocesses the image, safely opens, runs, and disposes a Tesseract object, and stores the result in a new OcrData object.
public static Recognize ( Bitmap original, Accuracy mode = Accuracy.High, string lang = "eng", bool enableTimer = false ) : OcrData
original System.Drawing.Bitmap Image to be OCR-ed
mode Accuracy Accuracy setting
lang string Language of text for OCR Language Model
enableTimer bool Measure the scan time for diagnostic purposes
return OcrData

Scan() public method

Given a color image, converts it to grayscale, runs OCR on it, and returns the recognized text
public Scan ( byte>.Image image ) : string
image byte>.Image Source Image to be OCR-ed
return string

Scan() public method

Invokes Tesseract OCR Recognize on the given image and stores the resulting data in the Text, Confidence, and ScanTime data members
public Scan ( byte>.Image image ) : string
image byte>.Image Source Image to be OCR-ed
return string

ShredOcr() public static method

Given an array of shreds, runs OCR on all of them and saves the results to each shred object
public static ShredOcr ( Shred shreds, string lang = "eng" ) : void
shreds Shred Array of initialized shred objects
lang string Desired OCR language
return void

StripNewLine() public static method

public static StripNewLine ( string text ) : string
text string
return string
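
The source gives no description for StripNewLine. Judging only by its name and signature, it may remove line-break characters from OCR output. The sketch below illustrates that guess; the class name StripNewLineSketch is hypothetical, and whether the real method deletes line breaks or replaces them with spaces is unknown.

```csharp
// Hypothetical sketch based solely on the method name; behavior is assumed.
public static class StripNewLineSketch
{
    // Removes carriage returns and line feeds; null or empty input is returned unchanged.
    public static string StripNewLine(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        return text.Replace("\r", "").Replace("\n", "");
    }
}
```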

Text() public method

Getter for the text generated after running the Scan Method
public Text ( ) : string
return string