C# Class Accord.MachineLearning.BoltzmannExploration

Boltzmann distribution exploration policy.

The class implements an exploration policy based on the Boltzmann distribution. According to the policy, action a in state s is selected with the following probability:

$$p(s, a) = \frac{\exp(Q(s, a) / t)}{\sum_{b} \exp(Q(s, b) / t)}$$

where Q(s, a) is the estimate (usefulness) of action a in state s, and t is the Temperature.
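The formula above can be sketched in a few lines of C#. This is a minimal illustration, not the Accord source: the class name BoltzmannSketch and the method Probabilities are hypothetical, and subtracting the largest estimate before exponentiating is an added numerical-stability step that leaves the resulting probabilities unchanged.

```csharp
using System;
using System.Linq;

public static class BoltzmannSketch
{
    // Hypothetical helper (not part of Accord): returns p(s, a) for every action,
    // given the estimates Q(s, a) of one state and a temperature t > 0.
    public static double[] Probabilities(double[] actionEstimates, double temperature)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one estimate is required.", nameof(actionEstimates));
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature));

        // Subtracting the maximum cancels out in the ratio,
        // but keeps Math.Exp from overflowing when Q / t is large.
        double max = actionEstimates.Max();
        double[] weights = actionEstimates
            .Select(q => Math.Exp((q - max) / temperature))
            .ToArray();

        // Normalize so that the probabilities sum to 1.
        double sum = weights.Sum();
        return weights.Select(w => w / sum).ToArray();
    }
}
```

With a low temperature the probability mass concentrates on the action with the highest estimate; with a high temperature the distribution approaches uniform, so the policy explores more.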

Inheritance: IExplorationPolicy

Public Methods

Method Description
BoltzmannExploration ( double temperature )

Initializes a new instance of the BoltzmannExploration class.

ChooseAction ( double[] actionEstimates ) : int

Choose an action.

The method chooses an action based on the provided estimates. The estimates can be any measure of an action's usefulness (expected total reward, discounted reward, etc.).

Method Details

BoltzmannExploration() public method

Initializes a new instance of the BoltzmannExploration class.
public BoltzmannExploration ( double temperature )
temperature double Temperature parameter of the Boltzmann distribution.

ChooseAction() public method

Choose an action.
The method chooses an action based on the provided estimates. The estimates can be any measure of an action's usefulness (expected total reward, discounted reward, etc.).
public ChooseAction ( double[] actionEstimates ) : int
actionEstimates double[] Action estimates.
Returns int
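One way such a selection could work is roulette-wheel sampling over the Boltzmann weights. The sketch below is an assumption for illustration only and is not the Accord implementation: the class BoltzmannPolicySketch, its seed parameter, and the max-subtraction step are all hypothetical additions.

```csharp
using System;

// Hypothetical stand-in for illustration; not the Accord class.
public sealed class BoltzmannPolicySketch
{
    private readonly double temperature;
    private readonly Random random;

    public BoltzmannPolicySketch(double temperature, int seed)
    {
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature));
        this.temperature = temperature;
        this.random = new Random(seed); // seeded so runs are repeatable
    }

    // Returns the index of the sampled action.
    public int ChooseAction(double[] actionEstimates)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one estimate is required.", nameof(actionEstimates));

        int n = actionEstimates.Length;

        // Find the largest estimate; subtracting it avoids overflow in Math.Exp.
        double max = double.NegativeInfinity;
        foreach (double q in actionEstimates)
            if (q > max) max = q;

        // Unnormalized Boltzmann weights and their total.
        double[] weights = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++)
        {
            weights[i] = Math.Exp((actionEstimates[i] - max) / temperature);
            sum += weights[i];
        }

        // Draw a point in [0, sum) and return the action whose
        // cumulative weight interval contains it.
        double threshold = random.NextDouble() * sum;
        double cumulative = 0;
        for (int i = 0; i < n; i++)
        {
            cumulative += weights[i];
            if (threshold < cumulative)
                return i;
        }

        return n - 1; // guard against floating-point rounding
    }
}
```

In a learning loop, the estimates passed in would typically be the row of Q-values for the current state, and the returned index identifies the action to take.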