
Reinforcement Learning (4): Actor-Critic Methods

Popularity: 14   Published: 2023-12-12 01:06:30.0

Main idea:


Policy Network (Actor)

Value Network (Critic)
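As a hedged illustration of how the two networks usually fit together, the standard formulation approximates the state-value function by replacing the policy and the action-value function with their parameterized counterparts. The symbols θ (actor parameters) and w (critic parameters) are assumptions used here for illustration.

```latex
\begin{aligned}
V_\pi(s) &= \sum_{a} \pi(a \mid s)\, Q_\pi(s, a) \\
         &\approx \sum_{a} \pi(a \mid s; \theta)\, q(s, a; w) \;=\; V(s; \theta, w)
\end{aligned}
```

Under this reading, the actor π(a | s; θ) chooses actions and the critic q(s, a; w) scores them.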

An intuitive analogy:


Train the Neural Networks

Detailed steps:

Update value network q using TD
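A minimal sketch of one TD step for the critic, assuming a tabular q[s][a] in place of a neural network (so the gradient of q with respect to its own entry is 1). The function name `td_update` and the hyperparameters `alpha` and `gamma` are hypothetical and chosen only for illustration.

```python
def td_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One TD step on a tabular critic q[s][a] (illustrative, not a network)."""
    # TD target: observed reward plus discounted estimate at the next step.
    td_target = r + gamma * q[s_next][a_next]
    # TD error: current prediction minus the target.
    td_error = q[s][a] - td_target
    # Gradient descent on 0.5 * td_error**2, treating the target as a constant.
    q[s][a] -= alpha * td_error
    return td_error


# Two states, two actions, all estimates start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
delta = td_update(q, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

With a real value network the same update would scale the gradient of q(s, a; w) with respect to w by the TD error, rather than changing a single table entry.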

Update policy network π using policy gradient
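A minimal sketch of one policy-gradient step for the actor, assuming a softmax policy over tabular logits theta[s][a] instead of a neural network. For this parameterization the gradient of log π(a | s) with respect to logit b is 1[a = b] − π(b | s). The names `policy_gradient_update` and `beta` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_update(theta, s, a, q_value, beta=0.1):
    """Gradient ascent step: theta += beta * q(s, a) * grad log pi(a | s)."""
    probs = softmax(theta[s])
    for b in range(len(probs)):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += beta * q_value * grad_log_pi
    return softmax(theta[s])


# One state, two actions; the critic scored the chosen action 0 with q = 2.0.
theta = [[0.0, 0.0]]
new_probs = policy_gradient_update(theta, s=0, a=0, q_value=2.0)
```

Because the critic's score is positive here, the step raises the probability of the action that was taken and lowers the other.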


Actor-Critic Method

Summary of Algorithm
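A minimal sketch of one common way to combine the two updates into a single training loop. It assumes tabular parameters instead of networks, a hypothetical one-state environment (`toy_env_step`) in which action 1 pays reward 1 and action 0 pays 0, and a next action that is sampled only to form the TD target and is not executed. All names and hyperparameters are illustrative.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def toy_env_step(s, a):
    """Hypothetical environment: a single state; action 1 pays 1, action 0 pays 0."""
    return 0, (1.0 if a == 1 else 0.0)


def train(steps=2000, alpha=0.1, beta=0.05, gamma=0.5, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0]]  # actor: softmax logits per state
    q = [[0.0, 0.0]]      # critic: action-value estimates per state
    s = 0
    for _ in range(steps):
        probs = softmax(theta[s])
        a = sample(probs, rng)                         # 1. sample an action from the actor
        s_next, r = toy_env_step(s, a)                 # 2. act, observe next state and reward
        a_next = sample(softmax(theta[s_next]), rng)   # 3. sample a' for the target only
        q_t = q[s][a]                                  # 4. critic's score before updating
        td_error = q_t - (r + gamma * q[s_next][a_next])  # 5. TD error
        q[s][a] -= alpha * td_error                    # 6. critic update (TD)
        for b in range(len(probs)):                    # 7. actor update (policy gradient)
            grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += beta * q_t * grad_log_pi
        s = s_next
    return theta, q


theta, q = train()
final_probs = softmax(theta[0])
```

In this sketch the actor is scaled by the critic's value q_t; a frequently used variant scales by the TD error instead, which acts as a baseline and tends to reduce variance.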


Summary

Policy Network and Value Network

Training
