C# Class AmazonScrape.Parser

Contains static methods that process Amazon HTML and return product information.
Project: ThomasRush/AmazonScrape

Public Methods

Method Description
GetFuzzyPrimeEligibility ( string itemHtml ) : bool

Using an item's html, determines Prime eligibility with passable accuracy.

GetImageThumbnail ( string itemHtml ) : BitmapImage

Parses out the URL to the product's image thumbnail (if one exists) and then calls DownloadWebImage to return a BitmapImage

GetPageResultCount ( string pageHtml ) : int

Given the html of an Amazon search page result, returns the number of product results.

GetPageResultItemHtml ( string pageHtml, int resultCount ) : List<string>

Returns a list of individual html product results from an html page

GetPriceRange ( string itemHtml ) : DoubleRange

Parses the item's html and returns a DoubleRange object representing the "low" and "high" prices.

GetProductName ( string itemHtml ) : string

Extracts the product's name from a single product's html

GetRating ( string reviewHistogramHtml ) : double

Returns a product's average review rating (double)

GetReviewCount ( string reviewHistogramHtml ) : int

Returns the number of reviews for the product, given the review histogram html (not the product html)

GetReviewHistogramHtml ( string itemHtml ) : string

Given a specific product result html, provides the review histogram html. Used for obtaining review count and review distribution.

GetScoreDistribution ( string reviewHistogramHtml ) : ScoreDistribution

Returns a product's review distribution (percentage of reviews in each category)

GetStrictPrimeEligibility ( Uri productURL ) : bool

Determines Prime eligibility accurately, at the cost of an additional page load.

GetURL ( string itemHtml ) : Uri

Extracts a product's Amazon URL.

ParseDoubleValues ( string text, int parseCount = -1 ) : List<double>

Finds and returns a list of signed/unsigned integers/doubles parsed from the supplied string. Comma-formatted numbers are recognized.

Private Methods

Method Description
GetMultipleRegExMatches ( string inputString, string regExPattern ) : List<string>

Attempts to match the supplied pattern against the input string. Returns a list of all matching strings, or an empty list if no match is found.

GetSingleRegExMatch ( string inputString, string regExPattern ) : string

Attempts to match the supplied pattern against the input string. Returns a single matching string, or an empty string if no match is found.
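
The two private helpers above are not shown on this page, so the following is only a minimal sketch of the behavior their descriptions state, built on System.Text.RegularExpressions. The class name RegExSketch is hypothetical, and returning the whole match (rather than a capture group) is an assumption; the project's actual code may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the private helpers; not the project's code.
public static class RegExSketch
{
    // Returns the first match of the pattern, or an empty string if there is none.
    public static string GetSingleRegExMatch(string inputString, string regExPattern)
    {
        Match match = Regex.Match(inputString, regExPattern);
        return match.Success ? match.Value : string.Empty;
    }

    // Returns every match of the pattern, or an empty list if there are none.
    public static List<string> GetMultipleRegExMatches(string inputString, string regExPattern)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(inputString, regExPattern))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

Returning an empty string or an empty list instead of null lets callers chain the result without a null check, which matches the "empty if not found" wording in the descriptions.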

Method Details

GetFuzzyPrimeEligibility() public static method

Using an item's html, determines Prime eligibility with passable accuracy.
public static GetFuzzyPrimeEligibility ( string itemHtml ) : bool
itemHtml string
return bool

GetImageThumbnail() public static method

Parses out the URL to the product's image thumbnail (if one exists) and then calls DownloadWebImage to return a BitmapImage
public static GetImageThumbnail ( string itemHtml ) : BitmapImage
itemHtml string
return System.Windows.Media.Imaging.BitmapImage

GetPageResultCount() public static method

Given the html of an Amazon search page result, returns the number of product results.
public static GetPageResultCount ( string pageHtml ) : int
pageHtml string html of entire search page
return int

GetPageResultItemHtml() public static method

Returns a list of individual html product results from an html page
public static GetPageResultItemHtml ( string pageHtml, int resultCount ) : List<string>
pageHtml string The string containing a single page of Amazon search results
resultCount int
return List<string>

GetPriceRange() public static method

Parses the item's html and returns a DoubleRange object representing the "low" and "high" prices.
public static GetPriceRange ( string itemHtml ) : DoubleRange
itemHtml string
return DoubleRange
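
As a rough illustration of what extracting a low/high price pair can look like, the sketch below pulls every dollar amount out of a fragment of html and keeps the minimum and maximum. Everything here is an assumption for illustration: DoubleRangeSketch is a hypothetical stand-in for the project's DoubleRange type, the "$" pattern is a guess at how prices appear in the markup, and returning null when no price is found is a choice made for this sketch only.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the project's DoubleRange type.
public sealed class DoubleRangeSketch
{
    public double Low { get; }
    public double High { get; }

    public DoubleRangeSketch(double low, double high)
    {
        Low = low;
        High = high;
    }
}

public static class PriceRangeExample
{
    // A "$" followed by either a comma-grouped number or a plain run of digits,
    // with an optional decimal part. Group 1 holds the number without the "$".
    private static readonly Regex PricePattern =
        new Regex(@"\$\s*((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)");

    // Returns the lowest and highest price found, or null if no price is present.
    public static DoubleRangeSketch GetPriceRange(string itemHtml)
    {
        var prices = PricePattern.Matches(itemHtml)
            .Cast<Match>()
            .Select(m => double.Parse(m.Groups[1].Value.Replace(",", ""),
                                      CultureInfo.InvariantCulture))
            .ToList();

        if (prices.Count == 0)
        {
            return null;
        }
        return new DoubleRangeSketch(prices.Min(), prices.Max());
    }
}
```

With a single price in the input, Low and High are equal, so callers can treat single-price and ranged items the same way.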

GetProductName() public static method

Extracts the product's name from a single product's html
public static GetProductName ( string itemHtml ) : string
itemHtml string Single product result html
return string

GetRating() public static method

Returns a product's average review rating (double)
public static GetRating ( string reviewHistogramHtml ) : double
reviewHistogramHtml string html of the review histogram
return double

GetReviewCount() public static method

Returns the number of reviews for the product, given the review histogram html (not the product html)
public static GetReviewCount ( string reviewHistogramHtml ) : int
reviewHistogramHtml string html for the review histogram
return int

GetReviewHistogramHtml() public static method

Given a specific product result html, provides the review histogram html. Used for obtaining review count and review distribution.
public static GetReviewHistogramHtml ( string itemHtml ) : string
itemHtml string
return string

GetScoreDistribution() public static method

Returns a product's review distribution (percentage of reviews in each category)
public static GetScoreDistribution ( string reviewHistogramHtml ) : ScoreDistribution
reviewHistogramHtml string Review histogram html
return ScoreDistribution

GetStrictPrimeEligibility() public static method

Determines Prime eligibility accurately, at the cost of an additional page load.
public static GetStrictPrimeEligibility ( Uri productURL ) : bool
productURL System.Uri
return bool

GetURL() public static method

Extracts a product's Amazon URL.
public static GetURL ( string itemHtml ) : Uri
itemHtml string
return System.Uri

ParseDoubleValues() public static method

Finds and returns a list of signed/unsigned integers/doubles parsed from the supplied string. Comma-formatted numbers are recognized.
public static ParseDoubleValues ( string text, int parseCount = -1 ) : List<double>
text string The string to parse
parseCount int The number of double values it will attempt to parse
return List<double>
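
The description of ParseDoubleValues (signed or unsigned integers and doubles, with comma-formatted numbers recognized) can be sketched as below. This is not the project's implementation: the class name ParseDoubleSketch and the regular expression are assumptions, and so is the reading of the default parseCount of -1 as "parse every value found".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical stand-in; the real method lives in AmazonScrape.Parser.
public static class ParseDoubleSketch
{
    // Optional sign, then a comma-grouped number or a plain run of digits,
    // then an optional decimal part.
    private static readonly Regex NumberPattern =
        new Regex(@"[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?");

    // Assumption: a negative parseCount means "no limit".
    public static List<double> ParseDoubleValues(string text, int parseCount = -1)
    {
        var values = new List<double>();
        foreach (Match match in NumberPattern.Matches(text))
        {
            if (parseCount >= 0 && values.Count >= parseCount)
            {
                break;
            }
            // Strip thousands separators before parsing with a fixed culture.
            string digits = match.Value.Replace(",", "");
            values.Add(double.Parse(digits, NumberStyles.Float, CultureInfo.InvariantCulture));
        }
        return values;
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps the result independent of the machine's regional settings, which matters when "," and "." swap roles in some locales. One caveat of this sketch: a hyphen written directly before a digit, as in "10-20", is read as a minus sign.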