Model-free (reinforcement learning)

In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not estimate the transition probability distribution (and the reward function) associated with the Markov decision process (MDP),[1] which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". A model-free RL algorithm can be thought of as an "explicit" trial-and-error algorithm.[1] An example of a model-free algorithm is Q-learning.
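
The following is a minimal sketch of tabular Q-learning, illustrating the model-free idea: the agent improves its action-value estimates directly from sampled transitions, without ever estimating transition probabilities or a reward function. The Gym-style environment interface (env.reset(), env.step(action) returning (next_state, reward, done)) is an assumption for illustration, not something specified by the article.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn Q(s, a) from observed transitions only; no model of the MDP is built."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()          # assumed Gym-style interface
        done = False
        while not done:
            # Epsilon-greedy exploration: trial and error without a model.
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)   # assumed interface
            # Temporal-difference update using only the sampled transition.
            td_target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (td_target - Q[state, action])
            state = next_state
    return Q
```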

Key model-free reinforcement learning algorithms

| Algorithm | Description | Model | Policy | Action Space | State Space | Operator |
|---|---|---|---|---|---|---|
| DQN | Deep Q Network | Model-Free | Off-policy | Discrete | Continuous | Q-value |
| DDPG | Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| A3C | Asynchronous Advantage Actor-Critic | Model-Free | On-policy | Continuous | Continuous | Advantage |
| TRPO | Trust Region Policy Optimization | Model-Free | On-policy | Continuous or Discrete | Continuous | Advantage |
| PPO | Proximal Policy Optimization | Model-Free | On-policy | Continuous or Discrete | Continuous | Advantage |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient | Model-Free | Off-policy | Continuous | Continuous | Q-value |
| SAC | Soft Actor-Critic | Model-Free | Off-policy | Continuous | Continuous | Advantage |
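
The "Advantage" operator listed for the on-policy methods above (A3C, TRPO, PPO) can be illustrated with a small tabular sketch: the advantage measures how much better the sampled action was than the current state-value estimate, and the policy is nudged toward actions with positive advantage. The function name, the tabular V and policy_logits arrays, and the single-transition update are illustrative assumptions; the deep RL algorithms in the table replace these tables with neural networks and more elaborate estimators.

```python
import numpy as np

def advantage_actor_critic_step(V, policy_logits, transition,
                                alpha_v=0.1, alpha_pi=0.01, gamma=0.99):
    """One on-policy update from a single observed transition (s, a, r, s_next, done)."""
    s, a, r, s_next, done = transition
    # Advantage estimate: observed return proxy minus the current value estimate.
    td_target = r + gamma * V[s_next] * (not done)
    advantage = td_target - V[s]
    # Critic update: move V(s) toward the TD target.
    V[s] += alpha_v * advantage
    # Actor update: softmax policy gradient, grad of log pi(a|s) w.r.t. logits
    # is one_hot(a) - probs.
    probs = np.exp(policy_logits[s]) / np.sum(np.exp(policy_logits[s]))
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    policy_logits[s] += alpha_pi * advantage * grad_log_pi
    return V, policy_logits
```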

References

  1. Sutton, Richard S.; Barto, Andrew G. (November 13, 2018). Reinforcement Learning: An Introduction (PDF) (Second ed.). A Bradford Book. p. 552. ISBN 0262039249. Retrieved 18 February 2019.