C# Class Accord.MachineLearning.QLearning

Q-Learning algorithm.
The class provides an implementation of the Q-Learning algorithm, an off-policy temporal-difference (TD) control method.
Project: accord-net/framework

Public Methods

Method Description
GetAction ( int state ) : int

Get next action from the specified state.

The method returns an action according to the current exploration policy.

QLearning ( int states, int actions, IExplorationPolicy explorationPolicy )

Initializes a new instance of the QLearning class.

Action estimates are randomized when this constructor is used.

QLearning ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )

Initializes a new instance of the QLearning class.

The randomize parameter specifies whether the initial action estimates should be randomized with small values. Randomizing action values can be useful when greedy exploration policies are used: it ensures that the same action is not always chosen.

UpdateState ( int previousState, int action, double reward, int nextState ) : void

Update Q-function's value for the previous state-action pair.

Method Details

GetAction() public method

Get next action from the specified state.
The method returns an action according to the current exploration policy.
public GetAction ( int state ) : int
state int Current state to get an action for.
Returns int
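
To illustrate how an action can be chosen "according to the current exploration policy", here is a minimal, self-contained sketch. It does not use the library: EpsilonGreedySketch and QTableSketch are hypothetical stand-ins invented for this example, and the epsilon-greedy rule shown is only one possible policy, assumed here for illustration.

```csharp
using System;

// Hypothetical stand-in for an exploration policy: picks the best-known
// action most of the time and a uniformly random one with probability epsilon.
public sealed class EpsilonGreedySketch
{
    private readonly double epsilon;
    private readonly Random random;

    public EpsilonGreedySketch(double epsilon, int seed)
    {
        this.epsilon = epsilon;
        this.random = new Random(seed);
    }

    // Chooses an action index given the action estimates of one state.
    public int ChooseAction(double[] actionEstimates)
    {
        // Explore: uniform random action.
        if (random.NextDouble() < epsilon)
            return random.Next(actionEstimates.Length);

        // Exploit: index of the largest estimate (the first one wins ties).
        int best = 0;
        for (int i = 1; i < actionEstimates.Length; i++)
            if (actionEstimates[i] > actionEstimates[best])
                best = i;
        return best;
    }
}

// Hypothetical Q-table holder that mirrors the shape of GetAction(int state).
public sealed class QTableSketch
{
    private readonly double[][] qvalues;
    private readonly EpsilonGreedySketch policy;

    public QTableSketch(double[][] qvalues, EpsilonGreedySketch policy)
    {
        this.qvalues = qvalues;
        this.policy = policy;
    }

    // Delegates the choice to the policy, passing the estimates for this state.
    public int GetAction(int state) => policy.ChooseAction(qvalues[state]);
}
```

With epsilon set to 0 the sketch is purely greedy, so a state whose estimates are { 0.1, 0.9, 0.3 } always yields action 1.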

QLearning() public method

Initializes a new instance of the QLearning class.
Action estimates are randomized when this constructor is used.
public QLearning ( int states, int actions, IExplorationPolicy explorationPolicy )
states int Number of possible states.
actions int Number of possible actions.
explorationPolicy IExplorationPolicy Exploration policy.

QLearning() public method

Initializes a new instance of the QLearning class.
The randomize parameter specifies whether the initial action estimates should be randomized with small values. Randomizing action values can be useful when greedy exploration policies are used: it ensures that the same action is not always chosen.
public QLearning ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )
states int Number of possible states.
actions int Number of possible actions.
explorationPolicy IExplorationPolicy Exploration policy.
randomize bool Whether to randomize the initial action estimates.
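
The effect of the randomize flag on a greedy policy can be sketched as follows. This is an illustration only, not the library's code: InitSketch, CreateTable, and Greedy are hypothetical names, and the 0.1 scale used for the "small values" is an assumption of this sketch.

```csharp
using System;

public static class InitSketch
{
    // Builds a states x actions table. When randomize is true, every entry
    // gets a small random value in [0, 0.1); otherwise all entries stay 0.
    public static double[][] CreateTable(int states, int actions, bool randomize, int seed)
    {
        var random = new Random(seed);
        var table = new double[states][];
        for (int s = 0; s < states; s++)
        {
            table[s] = new double[actions];
            if (randomize)
                for (int a = 0; a < actions; a++)
                    table[s][a] = random.NextDouble() * 0.1;
        }
        return table;
    }

    // Purely greedy choice: the first index holding the largest estimate.
    public static int Greedy(double[] estimates)
    {
        int best = 0;
        for (int i = 1; i < estimates.Length; i++)
            if (estimates[i] > estimates[best])
                best = i;
        return best;
    }
}
```

On an all-zero table, Greedy returns action 0 for every state, so a greedy agent would never try anything else before learning starts. With randomized entries the ties are broken differently from state to state, which gives the other actions a chance to be tried.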

UpdateState() public method

Update Q-function's value for the previous state-action pair.
public UpdateState ( int previousState, int action, double reward, int nextState ) : void
previousState int Previous state.
action int Action that leads from the previous state to the next state.
reward double Reward received for taking the specified action from the previous state.
nextState int Next state.
Returns void
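
For readers who want to see what such an update looks like, the standard tabular Q-learning rule can be sketched as below. This is a self-contained illustration under stated assumptions, not the library's source: QUpdateSketch and GetValue are hypothetical, and the LearningRate and DiscountFactor values are illustrative rather than the class's defaults.

```csharp
using System;

public sealed class QUpdateSketch
{
    private readonly double[][] qvalues;

    // Hypothetical tunables; the values are illustrative, not library defaults.
    public double LearningRate { get; set; } = 0.5;
    public double DiscountFactor { get; set; } = 0.9;

    public QUpdateSketch(int states, int actions)
    {
        // All estimates start at zero in this sketch.
        qvalues = new double[states][];
        for (int s = 0; s < states; s++)
            qvalues[s] = new double[actions];
    }

    public double GetValue(int state, int action) => qvalues[state][action];

    // Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (reward + gamma * max over a' of Q(s',a'))
    public void UpdateState(int previousState, int action, double reward, int nextState)
    {
        // Best estimate reachable from the next state (the off-policy part).
        double[] next = qvalues[nextState];
        double maxNext = next[0];
        for (int a = 1; a < next.Length; a++)
            maxNext = Math.Max(maxNext, next[a]);

        // Blend the old estimate with the newly observed target.
        double old = qvalues[previousState][action];
        qvalues[previousState][action] =
            (1.0 - LearningRate) * old + LearningRate * (reward + DiscountFactor * maxNext);
    }
}
```

As a worked example with these settings and a 2 x 2 table: UpdateState(1, 0, 10.0, 1) sets Q(1,0) to 0.5 * (10 + 0.9 * 0) = 5. A following UpdateState(0, 1, 0.0, 1) then sets Q(0,1) to 0.5 * (0 + 0.9 * 5) = 2.25, showing how reward propagates backwards from state 1 to state 0.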