Reinforcement Learning (4): Actor-Critic Methods
Popularity: 14  Published: 2023-12-12 01:06:30
Main idea:
Policy Network (Actor)
Value Network (Critic)
An intuitive analogy:
Train the Neural Networks
Detailed steps:
Update value network q using TD
Update policy network π using policy gradient
Actor-Critic Method
Summary of Algorithm
Summary
Policy Network and Value Network
Training
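
The two updates named in the outline above (a TD update for the value network q and a policy-gradient update for the policy network π) can be illustrated with a minimal sketch. This is not taken from the original article: it assumes a tabular setting in which the "networks" are simply tables `theta` (policy logits) and `w` (action values), a softmax policy, and one common formulation of the actor-critic step. The function name `actor_critic_step`, the learning rates `alpha` and `beta`, and the discount `gamma` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def actor_critic_step(theta, w, s, a, r, s_next, a_next,
                      gamma=0.9, alpha=0.1, beta=0.1):
    """One actor-critic update on tabular parameters (illustrative only).

    theta[s, a] : policy logits, pi(.|s) = softmax(theta[s])   (assumed form)
    w[s, a]     : critic's estimate q(s, a)                     (assumed form)
    a_next      : action sampled from pi(.|s_next), used only to
                  evaluate the critic at the next state.
    Returns updated copies of theta and w, plus the TD error.
    """
    theta = theta.copy()
    w = w.copy()

    # Critic evaluations at the current and next state-action pairs.
    q_t = w[s, a]
    q_next = w[s_next, a_next]

    # TD target and TD error (prediction minus target).
    td_target = r + gamma * q_next
    td_error = q_t - td_target

    # Critic: gradient descent on 0.5 * td_error**2 with respect to w[s, a].
    w[s, a] -= alpha * td_error

    # Actor: gradient ascent using grad of log pi(a|s) with respect to theta[s],
    # which for a softmax policy is onehot(a) - pi(.|s).
    pi = softmax(theta[s])
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += beta * q_t * grad_log_pi

    return theta, w, td_error
```

In this sketch the actor's gradient is scaled by the critic's value `q_t`; some formulations scale by the TD error instead, which is a design choice and not something the outline specifies.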