C# class Accord.MachineLearning.Sarsa

Sarsa learning algorithm.

The class implements the Sarsa algorithm, an on-policy temporal-difference (TD) control method.

Project: accord-net/framework

Public methods

Method	Description
GetAction ( int state ) : int	Gets the next action for the specified state. The action is chosen according to the current exploration policy.
Sarsa ( int states, int actions, IExplorationPolicy explorationPolicy )	Initializes a new instance of the Sarsa class. Action estimates are randomized when this constructor is used.
Sarsa ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )	Initializes a new instance of the Sarsa class. The randomize parameter specifies whether initial action estimates are randomized with small values. Randomizing action values can be useful with greedy exploration policies, because it ensures that the same action is not always chosen.
UpdateState ( int previousState, int previousAction, double reward ) : void	Updates the Q-function's value for the previous state-action pair when the next state is terminal.
UpdateState ( int previousState, int previousAction, double reward, int nextState, int nextAction ) : void	Updates the Q-function's value for the previous state-action pair when the next state is non-terminal.

Method details

GetAction() public method

Gets the next action for the specified state.

The method returns an action chosen according to the current exploration policy.

public GetAction ( int state ) : int
state	int	Current state to get an action for.
Returns	int
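To illustrate what "according to the current exploration policy" can mean, here is a minimal sketch of an epsilon-greedy choice over one state's action estimates. It is a textbook variant, not the Accord.NET implementation: the names EpsilonGreedySketch, Greedy and ChooseAction are hypothetical, and the actual policy is whatever IExplorationPolicy was passed to the constructor.

```csharp
using System;

// Hypothetical stand-in for an exploration policy; not part of Accord.NET.
public static class EpsilonGreedySketch
{
    // Returns the index of the highest estimate (the first one wins on ties).
    public static int Greedy(double[] actionEstimates)
    {
        int best = 0;
        for (int i = 1; i < actionEstimates.Length; i++)
        {
            if (actionEstimates[i] > actionEstimates[best])
                best = i;
        }
        return best;
    }

    // With probability epsilon picks a uniformly random action,
    // otherwise picks the greedy one.
    public static int ChooseAction(double[] actionEstimates, double epsilon, Random random)
    {
        if (random.NextDouble() < epsilon)
            return random.Next(actionEstimates.Length);
        return Greedy(actionEstimates);
    }
}
```

With epsilon set to 0 the sketch is purely greedy, so for the estimates { 0.1, 0.7, 0.3 } it always returns action 1.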

Sarsa() public method

Initializes a new instance of the Sarsa class.

Action estimates are randomized when this constructor is used.

public Sarsa ( int states, int actions, IExplorationPolicy explorationPolicy )
states	int	Number of possible states.
actions	int	Number of possible actions.
explorationPolicy	IExplorationPolicy	Exploration policy.

Sarsa() public method

Initializes a new instance of the Sarsa class.

The randomize parameter specifies whether initial action estimates are randomized with small values. Randomizing action values can be useful when greedy exploration policies are used. In that case, randomization ensures that the same action is not always chosen.

public Sarsa ( int states, int actions, IExplorationPolicy explorationPolicy, bool randomize )
states	int	Number of possible states.
actions	int	Number of possible actions.
explorationPolicy	IExplorationPolicy	Exploration policy.
randomize	bool	Whether to randomize the initial action estimates.
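The effect of the randomize flag can be pictured with a small sketch. It assumes the estimates are stored as a states-by-actions table; the names InitSketch and CreateEstimates and the scale of the random values (below 0.1) are assumptions made for illustration, not details confirmed by this page.

```csharp
using System;

// Hypothetical helper; not part of Accord.NET.
public static class InitSketch
{
    // Builds a states x actions table of action estimates.
    // When randomize is false, every estimate stays at 0.
    // When randomize is true, each estimate gets a small random value
    // (the divisor 10.0 is an assumed scale).
    public static double[][] CreateEstimates(int states, int actions, bool randomize, Random random)
    {
        var estimates = new double[states][];
        for (int s = 0; s < states; s++)
        {
            estimates[s] = new double[actions];
            if (randomize)
            {
                for (int a = 0; a < actions; a++)
                    estimates[s][a] = random.NextDouble() / 10.0;
            }
        }
        return estimates;
    }
}
```

With all estimates equal to zero, a greedy policy that resolves ties by taking the first maximum would pick action 0 in every state until some estimate changes. Small random initial values break those ties differently from state to state.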

UpdateState() public method

Updates the Q-function's value for the previous state-action pair.

Use this overload when the next state is terminal.

public UpdateState ( int previousState, int previousAction, double reward ) : void
previousState	int	Previous state.
previousAction	int	Action that led from the previous state to the next state.
reward	double	Reward received for taking the specified action from the previous state.
Returns	void
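For reference, the textbook Sarsa update for a terminal transition can be written as follows. This is the standard form, stated here as an assumption about what the method computes. The symbols are not taken from this page: s is previousState, a is previousAction, r is reward, and α is a learning rate. A terminal next state contributes no future value, so only the reward enters the target.

```latex
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha\, r
```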

UpdateState() public method

Updates the Q-function's value for the previous state-action pair.

Use this overload when the next state is non-terminal.

public UpdateState ( int previousState, int previousAction, double reward, int nextState, int nextAction ) : void
previousState	int	Previous state.
previousAction	int	Action that led from the previous state to the next state.
reward	double	Reward received for taking the specified action from the previous state.
nextState	int	Next state.
nextAction	int	Next action.
Returns	void
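The non-terminal update can be sketched with the textbook Sarsa rule, in which the target is the reward plus the discounted estimate of the next state-action pair. This is a standalone illustration under that assumption, not the library's code: the class SarsaUpdateSketch, the table q, and the learningRate and discountFactor parameters are hypothetical names that do not appear in the signature above.

```csharp
using System;

// Hypothetical illustration of the textbook Sarsa rule; not part of Accord.NET.
public static class SarsaUpdateSketch
{
    // Moves q[previousState][previousAction] toward the target
    // reward + discountFactor * q[nextState][nextAction]
    // by a fraction given by learningRate.
    public static void Update(
        double[][] q,
        int previousState, int previousAction,
        double reward,
        int nextState, int nextAction,
        double learningRate, double discountFactor)
    {
        double target = reward + discountFactor * q[nextState][nextAction];
        double current = q[previousState][previousAction];
        q[previousState][previousAction] = current + learningRate * (target - current);
    }
}
```

As a worked case, take q[0][1] = 0 and q[1][0] = 2, a reward of 1, a learning rate of 0.5 and a discount factor of 0.9. The target is 1 + 0.9 * 2 = 2.8, so q[0][1] becomes approximately 1.4. Because the rule uses the action actually chosen in the next state (nextAction) rather than the best available one, the method is on-policy.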