C# Class Accord.MachineLearning.BoltzmannExploration

Boltzmann distribution exploration policy.

The class implements an exploration policy based on the Boltzmann distribution. According to the policy, action a at state s is selected with the following probability:

$$ p(s, a) = \frac{\exp(Q(s, a) / t)}{\sum_{b} \exp(Q(s, b) / t)} $$

where Q(s, a) is the estimate (usefulness) of action a at state s, the sum runs over all actions b available at state s, and t is the temperature.
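The formula can be evaluated directly. The sketch below is a hypothetical helper (the BoltzmannMath class and its Probabilities method are illustrative names, not part of Accord.NET). It assumes a positive temperature and subtracts the largest estimate before exponentiating, which leaves p(s, a) unchanged because the common factor cancels, but keeps Math.Exp from overflowing when Q / t is large. It is not necessarily how the library itself computes the distribution.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Accord.NET): computes the Boltzmann
// distribution p(s, a) over the action estimates Q(s, .) of a single state.
public static class BoltzmannMath
{
    public static double[] Probabilities(double[] actionEstimates, double temperature)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one action estimate is required.", nameof(actionEstimates));

        // Assumption: the temperature must be strictly positive.
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be positive.");

        // Shift by the largest estimate: exp((q - max) / t) differs from
        // exp(q / t) only by the common factor exp(-max / t), which cancels
        // in the normalisation below.
        double max = actionEstimates.Max();
        double[] weights = actionEstimates
            .Select(q => Math.Exp((q - max) / temperature))
            .ToArray();

        // Divide each weight by the sum over all actions b.
        double sum = weights.Sum();
        return weights.Select(w => w / sum).ToArray();
    }
}
```

For example, BoltzmannMath.Probabilities(new[] { 1.0, 2.0 }, 1.0) returns approximately { 0.269, 0.731 }, and equal estimates always yield a uniform distribution.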

Implements: IExplorationPolicy

Source project: accord-net/framework

Public Methods

Method	Description
BoltzmannExploration ( double temperature )	Initializes a new instance of the BoltzmannExploration class.
ChooseAction ( double[] actionEstimates ) : int	Chooses an action based on the provided estimates. The estimates can be any kind of value that measures the usefulness of an action (expected total reward, discounted reward, etc.).

Method Details

BoltzmannExploration() Public constructor

Initializes a new instance of the BoltzmannExploration class.

public BoltzmannExploration ( double temperature )
temperature	double	Temperature parameter of the Boltzmann distribution.
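To illustrate the role of the temperature, take two hypothetical actions with Q(s, a_1) = 1 and Q(s, a_2) = 2. For two actions the formula reduces to a logistic function of the gap between the estimates, and evaluating it at a few temperatures gives:

```latex
p(s, a_2) = \frac{e^{2/t}}{e^{1/t} + e^{2/t}} = \frac{1}{1 + e^{-1/t}}
\quad\Rightarrow\quad
p(s, a_2) \approx
\begin{cases}
0.881 & t = 0.5 \\
0.731 & t = 1 \\
0.525 & t = 10
\end{cases}
```

Lower temperatures concentrate probability on the action with the highest estimate (approaching greedy selection as t approaches 0), while higher temperatures push the probabilities toward a uniform choice (1/2 each in this example as t grows without bound).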

ChooseAction() Public method

Choose an action.

The method chooses an action based on the provided estimates. The estimates can be any kind of value that measures the usefulness of an action (expected total reward, discounted reward, etc.).

public ChooseAction ( double[] actionEstimates ) : int
actionEstimates	double[]	Action estimates, one value per action.
Returns	int	Index of the chosen action.
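The selection step can be sketched as drawing an action index from the distribution p(s, ·). The class below is a hypothetical, self-contained stand-in: BoltzmannExplorationSketch, its Temperature property, and the optional Random constructor argument are assumptions for illustration, not the Accord.NET API. It uses roulette-wheel sampling over the unnormalised weights, one common way to sample a discrete distribution, which may differ from what the library actually does.

```csharp
using System;

// Hypothetical sketch (NOT the Accord.NET source) of a Boltzmann exploration
// policy. It mirrors the documented members: a temperature passed to the
// constructor and ChooseAction(double[]) returning an action index.
// The injectable Random is an assumption added to make results reproducible.
public sealed class BoltzmannExplorationSketch
{
    private readonly Random random;

    public double Temperature { get; }

    public BoltzmannExplorationSketch(double temperature, Random random = null)
    {
        // Assumption: the temperature must be strictly positive.
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be positive.");
        Temperature = temperature;
        this.random = random ?? new Random();
    }

    public int ChooseAction(double[] actionEstimates)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one action estimate is required.", nameof(actionEstimates));

        int n = actionEstimates.Length;

        // Find the largest estimate so the exponentials can be shifted by it
        // and Math.Exp cannot overflow.
        double max = double.NegativeInfinity;
        foreach (double q in actionEstimates)
            max = Math.Max(max, q);

        // Unnormalised Boltzmann weights exp((Q(s, a) - max) / t) and their sum.
        double[] weights = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            weights[i] = Math.Exp((actionEstimates[i] - max) / Temperature);
            sum += weights[i];
        }

        // Roulette-wheel sampling: draw r uniformly from [0, sum) and return
        // the first action whose cumulative weight exceeds r.
        double r = random.NextDouble() * sum;
        double cumulative = 0.0;
        for (int i = 0; i < n; i++)
        {
            cumulative += weights[i];
            if (r < cumulative)
                return i;
        }

        // Guard against floating-point rounding leaving r at or above the total.
        return n - 1;
    }
}
```

For example, new BoltzmannExplorationSketch(0.5).ChooseAction(new[] { 1.0, 2.0, 0.5 }) returns 1 most of the time but still occasionally returns 0 or 2; passing a seeded Random makes the sequence of choices reproducible in tests.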