C# Class AForge.MachineLearning.QLearning

Q-Learning algorithm.

The class provides an implementation of the Q-Learning algorithm, also known as off-policy temporal-difference (TD) control.
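For orientation, the textbook Q-Learning update rule is shown below. It is not quoted from this page; it assumes the class follows the standard formulation, where \alpha is the learning rate and \gamma is the discount factor (the class is understood to expose these as LearningRate and DiscountFactor properties, which are not listed on this page). In terms of the UpdateState parameters, s_t is previousState, a_t is action, r_{t+1} is reward and s_{t+1} is nextState.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The maximum over the next state's actions, rather than the action actually taken next, is what makes the method off-policy.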


Public Methods

Method	Description
GetAction ( int state ) : int	Gets the next action from the specified state. The method returns an action according to the current exploration policy.
QLearning ( int states, int actions, IExplorationPolicy explorationPolicy )	Initializes a new instance of the QLearning class. Action estimates are randomized when this constructor is used.
QLearning ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )	Initializes a new instance of the QLearning class. The randomize parameter specifies whether initial action estimates should be randomized with small values. Randomizing action values may be useful when greedy exploration policies are used: in that case it ensures that the same action is not always chosen.
UpdateState ( int previousState, int action, double reward, int nextState ) : void	Updates the Q-function's value for the previous state-action pair.

Method Details

GetAction() public method

Gets the next action from the specified state.

The method returns an action according to the current exploration policy.

public GetAction ( int state ) : int
state	int	Current state to get an action for.
Returns	int

QLearning() public method

Initializes a new instance of the QLearning class.

Action estimates are randomized when this constructor is used.

public QLearning ( int states, int actions, IExplorationPolicy explorationPolicy )
states	int	Number of possible states.
actions	int	Number of possible actions.
explorationPolicy	IExplorationPolicy	Exploration policy.

QLearning() public method

Initializes a new instance of the QLearning class.

The randomize parameter specifies whether initial action estimates should be randomized with small values. Randomizing action values may be useful when greedy exploration policies are used: in that case it ensures that the same action is not always chosen.

public QLearning ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )
states	int	Number of possible states.
actions	int	Number of possible actions.
explorationPolicy	IExplorationPolicy	Exploration policy.
randomize	bool	Whether to randomize the initial action estimates.

UpdateState() public method

Updates the Q-function's value for the previous state-action pair.

public UpdateState ( int previousState, int action, double reward, int nextState ) : void
previousState	int	Previous state.
action	int	Action that leads from the previous state to the next state.
reward	double	Reward received for taking the specified action from the previous state.
nextState	int	Next state.
Returns	void
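
The methods above are typically combined in a loop: ask for an action, apply it to the environment, then report the outcome. The following is a minimal sketch of that loop, not an official sample. It assumes the AForge.MachineLearning assembly is referenced and uses its EpsilonGreedyExploration policy; the Corridor environment (five cells, reward at the right end), the episode counts and the exploration rate of 0.1 are hypothetical and exist only for illustration.

```csharp
using System;
using AForge.MachineLearning;

// Hypothetical environment, not part of AForge: a corridor of 5 cells,
// the agent starts in cell 0 and is rewarded on reaching cell 4.
static class Corridor
{
    public const int States = 5;
    public const int Actions = 2; // 0 = move left, 1 = move right

    // Returns the next cell, clamped to the corridor bounds.
    public static int Step(int state, int action)
    {
        int next = action == 1 ? state + 1 : state - 1;
        return Math.Max(0, Math.Min(States - 1, next));
    }

    // Reward of 1 for entering the goal cell, 0 otherwise.
    public static double Reward(int nextState)
    {
        return nextState == States - 1 ? 1.0 : 0.0;
    }
}

static class Program
{
    static void Main()
    {
        // Exploration rate is an arbitrary choice for this sketch.
        var policy = new EpsilonGreedyExploration(0.1);

        // Three-argument constructor: action estimates start randomized.
        var learner = new QLearning(Corridor.States, Corridor.Actions, policy);

        for (int episode = 0; episode < 200; episode++)
        {
            int state = 0;
            for (int step = 0; step < 100 && state != Corridor.States - 1; step++)
            {
                int action = learner.GetAction(state);        // pick via exploration policy
                int next = Corridor.Step(state, action);      // apply to the environment
                learner.UpdateState(state, action, Corridor.Reward(next), next);
                state = next;
            }
        }

        // Turn exploration off and print the action preferred in the start cell.
        policy.Epsilon = 0.0;
        Console.WriteLine(learner.GetAction(0));
    }
}
```

Each call to UpdateState uses the state and action from before the move together with the state reached after it, so the current state must be saved before the environment is stepped.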